Copying a large file from a USB 3 disk (NTFS or EXT4) to another one (UFS/FreeBSD) is a very slow process...

Hello everyone.

I'm copying a large file from an NTFS-formatted disk to a UFS/FreeBSD disk; both are removable disks attached to USB 3 ports. The file is 200 GB, and in 4 hours only 13 GB have been transferred. Why is it so slow? I don't know where to store my virtual machines. I tried to save them on the ext4 disk because I wanted to share them easily between Linux and FreeBSD, but I've realized that when I mount that disk in FreeBSD, it gets corrupted after a while. I tried storing them on the NTFS disk, but the same thing happens. So now I'm on FreeBSD, copying them to a dedicated UFS/FreeBSD disk, but as I said, the speed is very slow. How can I increase it? At the moment I'm using this command, because I want to be able to resume the transfer if it breaks at some point:

Code:
rsync -avxHAX /mnt/da0p1/Backups/OS/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img .
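
Looking at the man page, a plain rsync like this actually removes the partially-transferred file when it's interrupted, so to really resume where it left off I think I'd also need --partial together with --append-verify, something like:

Code:
rsync -avxHAX --partial --append-verify --info=progress2 /mnt/da0p1/Backups/OS/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img .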

And: where do you save large files? What procedure do you use to copy large files at a decent speed? Unfortunately, read-write UFS access under Linux is not safe, so I'm out of solutions.
 
I'm not satisfied:

Code:
root@marietto:/mnt/da3p2/bhyve/Ubuntu # rsync -avAXEWSlHh /mnt/da0p1/Backups/OS/bhyve/Ubuntu/im* . --no-compress --info=progress2

sending incremental file list
impish-cuda-11-4-nvidia-470.img

2.13M   0%    9.49kB/s 6284:55:38
 
Yes, OK, but it is still slow. Do you know a method to increase the speed? Would using a network file system increase it, even if the file is stored on an NTFS partition?
 
The question is: What is your bottleneck? Have you done any measurements of IO and CPU usage? Like running top or iostat?
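
For instance, in a second terminal while the copy is running (da0 and da3 are taken from your own output; substitute whatever your actual devices are):

Code:
iostat -x -w 1 da0 da3   # extended per-device stats (KB/s, transactions, % busy) every second
top -S                   # CPU usage, including kernel threads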

13 GB in 4 hours = 0.9 MB per second. That is indeed laughably slow.

Let's see: Could it be USB? No, I can transfer data from an external USB-3 disk at a very consistent 100 MB per second (simple: dd if=/dev/da0 of=/dev/null bs=1048576 count=1000). Now, you might be stressing your USB subsystem twice (once reading, once writing), but (a) the bottleneck should be the individual USB port, (b) the theoretical speed of USB-3 is 600 MB/s, and (c) your speed is nowhere near half of these.

Could it be the disk itself? No way. Modern nearline disks run at 150-200 MB/s, consumer disks even on the inner tracks at 100 MB/s, and the slowest archival disks are still around 80 MB/s. SSDs are faster. You are a factor of ~100 lower. So that's not it.

The OS? No, UFS on a modern CPU (anything with more than 1 GHz) can easily handle 100 MB/s. I have no idea about the FreeBSD NTFS implementation, so that could be it, in particular if it goes through FUSE, but even then your speed is many times below anything reasonable.

So, you'll have to first simplify and then measure. First question: why are you using rsync? That's a hugely complicated and powerful program, but here it's being asked to do a trivial copy. You could just do "cp /mntX/.../infile /mntY/.../outfile", which would be much simpler. As a matter of fact, from a performance standpoint it would probably be best to use an even simpler copy program: "dd if=/mntX/.../infile of=/mntY/.../outfile bs=1048576", and let it run. One of the beautiful things about dd is that you can hit control-T while it is running and it will print a progress report. See how fast that goes; that eliminates rsync itself being the bottleneck, or rsync doing something with its IO pattern that makes the rest of the system run slow. Keep taking measurements of system behavior while this is running, with iostat (to see how much IO is actually getting done) and top (to see whether your CPU or something similar is overloaded).
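
With the paths from your own rsync commands, that would look something like this (status=progress needs a reasonably recent FreeBSD dd; on older systems just hit control-T instead):

Code:
dd if=/mnt/da0p1/Backups/OS/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img \
   of=/mnt/da3p2/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img \
   bs=1048576 status=progress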

If that doesn't give you reasonable speed (you ought to be running at dozens of MB/s, in a well-built system at around 100 MB/s), then you need to partition the problem into reading the input file and writing the output file. First run "dd if=/mntX/.../infile of=/dev/null bs=1048576" and see how fast that runs; this tests just reading the input file, then throwing the content away. Afterwards, run "dd if=/dev/zero of=/mntY/.../outfile bs=1048576 count=200000", which writes just the output file (it will write 200 GB, so you may want to stop it once you have speed measurements).
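
Concretely, with the same mount points as above (ddtest.out is just a scratch name; delete it when you're done):

Code:
# read side only: source file -> /dev/null
dd if=/mnt/da0p1/Backups/OS/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img of=/dev/null bs=1048576
# write side only: /dev/zero -> scratch file on the target disk (control-C once you have numbers)
dd if=/dev/zero of=/mnt/da3p2/bhyve/Ubuntu/ddtest.out bs=1048576 count=200000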

To your question about using a network: Yes, that is likely to work better than your current setup, if you have reasonable network hardware. For example, GigE (gigabit ethernet) can sustain about 80 to 100 MB/s, so you could attach your NTFS disk to a Windows server (because Windows' NTFS implementation is excellent, duh) and the UFS disk to a FreeBSD machine, and go over a network (like NFS or Samba) in between. This makes the setup more complex and harder to tune though.
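
A minimal sketch of the FreeBSD side, assuming the Windows machine exports the NTFS disk as an SMB share called "data" on a host called "winbox" (both names invented here; mount_smbfs will prompt for the Windows user's password):

Code:
mkdir -p /mnt/smb
mount_smbfs //user@winbox/data /mnt/smb
cp /mnt/smb/impish-cuda-11-4-nvidia-470.img /mnt/da3p2/bhyve/Ubuntu/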

Note: Why did I pick "bs=1048576" in the dd commands? I just want a large number, large compared to the 4096-byte block size of disks and the page size of the VM system, so I picked exactly 1 MiB = 1024 KiB = 256 pages or blocks = 2048 old-fashioned 512-byte disk sectors.
 
Mount the USB source disk read-only.
Make sure no other process is using the USB disk.
cat the source file to /dev/null and see what throughput you get (see the sketch below).
If performance is still bad, you may be better off attaching the USB disk to a Windows VM and using [p]scp/ftp to move the file from the Windows side to the BSD host.
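
Something along these lines, assuming the NTFS source is da0p1 as in the earlier posts and that the fusefs-ntfs port/package provides the ntfs-3g mount helper (dd here does the same as cat-to-/dev/null, but prints a transfer rate when done, or on control-T):

Code:
umount /mnt/da0p1
ntfs-3g -o ro /dev/da0p1 /mnt/da0p1   # remount the source read-only
dd if=/mnt/da0p1/Backups/OS/bhyve/Ubuntu/impish-cuda-11-4-nvidia-470.img of=/dev/null bs=1048576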
 