[Solved] A faster way to migrate files from ext4 to a UFS drive?

I am finally migrating from Ubuntu Linux to FreeBSD!
I recently removed Ubuntu 16 and installed FreeBSD 11 in its place.

I have close to 3 TB of data on an ext4 external HDD. My goal is to copy all of its files to my UFS drive.

My Attempt:
I mounted both the ext4 drive (with ext4fuse) and the UFS drive on the FreeBSD server and I am currently using rsync to copy the data over.
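Roughly like this (the mountpoints here are examples, not my exact paths):

rsync -aP /mnt/ext4/ /mnt/ufs/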

The problem is that it is taking a very long time.

A file of about 200-300 MB takes close to 10 minutes to fully copy over.

Other Attempt:
The other method I tried was mounting the ext4 external HDD on a PC running Ubuntu 16 off of a Live USB and then using rsync over the network to the FreeBSD server. That also took a long time to transfer files.
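Roughly like this, run from the Live USB session (hostname is a placeholder):

rsync -aP /mnt/ext4/ user@freebsd-server:/mnt/ufs/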


Do I just need to be patient?
Is there a faster method of doing this? I still have access to the other PC if necessary.

Edit:
When copying files locally (within my target UFS drive), it is quick.
Only copying files from the mounted ext4 drive is slow.
 
Debug this separately for reading from the source and writing to the target.

First step: Find a large file on the source and just read it:

dd if=/sourcemount/dir/file of=/dev/null bs=1048576

(I'm doing 1 MiB IOs for better performance.) Then try writing to your target:

dd if=/dev/zero of=/targetmount/dir/file bs=1048576 count=128

How fast do those run? Do this both with the local source, and over the network. Assuming that the target disk is locally mounted and connected via a reasonable interconnect (SAS or SATA), it should get reasonably close to 100 MB/s (within a factor of two), since it should be purely disk limited. The performance of your source disk connected locally is an interesting question. Going over the network, you can do that test by creating an empty file on the source side and copying that; the performance should be pretty accurately the network performance (~10 MB/s for 100BASE-T, about 100 MB/s for gigabit).
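To measure just the network path, the same kind of read can be pushed through SSH; a sketch, with the host name and path assumed:

dd if=/sourcemount/dir/file bs=1048576 | ssh user@freebsd-server 'dd of=/dev/null bs=1048576'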

If you get this far, then look at the performance of directories with many small files. But I think you'll find that the problem already shows up with streaming performance; 200 MB in 10 minutes is such ridiculous performance (about 300x slower than it should be) that there must be an obvious problem.
 
Not sure if this makes any sense...

tar -C /mnt/ext4 -cf - . | tar -C /mnt/ufs -xvf -
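(The -v on the extracting side prints every file name as it lands; on a 3 TB copy you may want to drop it to keep the terminal quiet.)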

Maybe someone could comment... Although considering you want to copy 3 TB of data, I somehow expect you would hit a problem. Maybe if you created an ext4.tgz file and copied that it would be much quicker... just speculating.
 
If the external drive is connected via USB, expect the performance to be bad on such big transfers - most (all?) of these USB/SATA controllers get really hot really fast, and performance drops to ridiculously low speeds. I had external HDD docking stations that even dropped the connection every few seconds due to overheating after a few dozen GB of transfer. The transferred data from these docks was also often useless and riddled with errors, so I completely gave up using them.
Try to connect the drive directly to the SATA/SAS controller/backplane if possible - it saves a lot of time and headaches.
 
I don't know if this can be of any help, but FreeBSD has read support for ext4. You could try mounting it with FreeBSD's own driver instead of FUSE.
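A minimal sketch of what that would look like, assuming the ext4 file system sits on partition da0s1 (adjust the device name and mountpoint to yours):

kldload -n ext2fs                      # load the driver if it is not already in the kernel
mount -t ext2fs /dev/da0s1 /mnt/ext4   # the ext2fs driver also reads ext4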
 
Note that the maximum transfer rate of USB 2.0 is 480 Mbit/s, which gives you around 48 MByte/s of theoretical maximum transfer rate. Expect to see a lot less than that. Copying 3 TB of data will probably take more than 24 hours (3 TB at ~35 MB/s already works out to roughly a day).
 
Not sure if this makes any sense...
tar -C /mnt/ext4 -cf - . | tar -C /mnt/ufs -xvf -
The problem here is not so much the mechanism of copying itself, but likely the IO speed. He is getting about 0.3 to 0.5 MB/s, and he should be getting around 48 to 100 MB/s (assuming the speed is limited by either the disk drive itself or the USB interface). As far as copying software is concerned, pretty much anything would work.

Having said that, obviously one wants the simplest possible copying program, one that does no format conversion, no memory-to-memory transfers, and no context switches. From that viewpoint, rsync is preferable to tar, because rsync can work by allocating a buffer, and then doing a sequence of read() and write() system calls within a single process (perhaps using multiple threads, perhaps using asynchronous IO, but most likely using traditional IO and leaving the prefetch and write-behind to the file system, which is better at it). Using two tars requires the data to be copied in memory to put it into tar's stream format, and then requires context switches between the two processes connected by a pipe.

But in reality, at the speed we're running at here, that makes little difference. A modern multi-core Intel machine with good PCIe buses can handle about a dozen GB/s (gigabytes), even when doing checksum and parity calculation; at about 100 MB/s we don't need to worry about CPU and memory overhead, unless we're running on a low-end embedded platform.

Maybe if you created an ext4.tgz file and copied that it would be much quicker... just speculating.
Most likely not. Creating a single tar file for the whole file system would mean reading from the source disk, writing to a temporary tar file, then re-reading that temporary file (it is so large it is guaranteed to no longer be in the buffer cache), and then writing to the target location. That's twice as many IOs as a direct copy.

Note that the maximum transfer rate of USB 2.0 is 480 Mbit/s, which gives you around 48 MByte/s of theoretical maximum transfer rate. Expect to see a lot less than that. Copying 3 TB of data will probably take more than 24 hours.
Right: USB is roughly 48 MB/s (in practice, one would expect a bit less, maybe 30 or 35). But 0.3-0.5 MB/s (which is 200-300 MB in 10 minutes) is right out; at that point the limitation is not the theoretical performance of USB. Even with a USB 1.1 interface (which is theoretically 12 Mbit/s or a little over 1 MB/s) the performance he's seeing is wrong. Something is going wrong here; I like sko's theory of USB interfaces that can't handle streaming.

Best idea: Instead of using USB (if he is, the original post isn't clear), connect the disk directly to the same computer via SATA or SAS, and then use the native kernel ext4 support. Or connect it to a second computer via SATA or SAS, and then use gigabit ethernet (which should give near 100 MB/s). But first measure what's really happening.
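If SSH encryption turns out to be the bottleneck on gigabit, one unencrypted alternative is a tar stream over nc; a sketch, with the host name and port made up, and the receiver started first:

nc -l 9999 | tar -C /mnt/ufs -xf -                  # on the FreeBSD server
tar -C /mnt/ext4 -cf - . | nc freebsd-server 9999   # on the Linux PC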

PS: Just tried this on my new Raspberry Pi: I can read large files (using tar and dd) from my SD card (which is famously slow) at 18 MB/s, and write to it at 6 MB/s. That's sort of the worst case for storage and CPU power. So the 0.3 MB/s the OP is seeing is just wrong ... something is broken.
 
Hello all. Thank you for your responses.

My apologies. I made an edit to the original post:
When copying files locally (within my target UFS drive), it is quick.
Only copying files from the mounted ext4 drive is slow.

So the problem is likely in ext4fuse.

It likely isn't the ext4 HDD itself. I recently wrote to it from a live Ubuntu 16 session and its performance was fine.

I will elaborate more below.

Debug this separately for reading from the source and writing to the target.

First step: Find a large file on the source and just read it:

dd if=/sourcemount/dir/file of=/dev/null bs=1048576

(I'm doing 1 MiB IOs for better performance.) Then try writing to your target:

dd if=/dev/zero of=/targetmount/dir/file bs=1048576 count=128

How fast do those run? Do this both with the local source, and over the network. Assuming that the target disk is locally mounted and connected via a reasonable interconnect (SAS or SATA), it should get reasonably close to 100 MB/s (within a factor of two), since it should be purely disk limited. The performance of your source disk connected locally is an interesting question. Going over the network, you can do that test by creating an empty file on the source side and copying that; the performance should be pretty accurately the network performance (~10 MB/s for 100BASE-T, about 100 MB/s for gigabit).

If you get this far, then look at the performance of directories with many small files. But I think you'll find that the problem already shows up with streaming performance; 200 MB in 10 minutes is such ridiculous performance (about 300x slower than it should be) that there must be an obvious problem.

Thank you for this suggestion. After trying it, it confirms my theory: reading a roughly 300 MB file is quick, and writing to the target was also quick.

Result of read (target):
284+1 records in
284+1 records out
298432130 bytes transferred in 2.588727 secs (115281439 bytes/sec)

Result of write (target, locally):
128+0 records in
128+0 records out
134217728 bytes transferred in 1.400968 secs (95803571 bytes/sec)
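A read straight from the ext4fuse mount would isolate the FUSE layer; a sketch, with the path assumed:

dd if=/mnt/12tb/path/to/largefile of=/dev/null bs=1048576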


I don't know if this can be of any help, but FreeBSD has read support for ext4. You could try mounting it with FreeBSD's own driver instead of FUSE.

The reason I chose FUSE was because I was unable to get the ext4 drive to mount natively, unfortunately.

My source ext4 drive is in "da0", based on the output of dmesg.

I created a directory in "/mnt" called "12tb" and did the following:
mount -t ext2fs /dev/da0 /mnt/12tb

The result was "mount: /dev/da0: Invalid argument".

Also tried mount -t ext2fs /dev/da0s1 /mnt/12tb

The result was: "mount: /dev/da0s1: No such file or directory"


The problem here is not so much the mechanism of copying itself, but likely the IO speed. He is getting about 0.3 to 0.5 MB/s, and he should be getting around 48 to 100 MB/s (assuming the speed is limited by either the disk drive itself or the USB interface). As far as copying software is concerned, pretty much anything would work.

Having said that, obviously one wants the simplest possible copying program, one that does no format conversion, no memory-to-memory transfers, and no context switches. From that viewpoint, rsync is preferable to tar, because rsync can work by allocating a buffer, and then doing a sequence of read() and write() system calls within a single process (perhaps using multiple threads, perhaps using asynchronous IO, but most likely using traditional IO and leaving the prefetch and write-behind to the file system, which is better at it). Using two tars requires the data to be copied in memory to put it into tar's stream format, and then requires context switches between the two processes connected by a pipe.

But in reality, at the speed we're running at here, that makes little difference. A modern multi-core Intel machine with good PCIe buses can handle about a dozen GB/s (gigabytes), even when doing checksum and parity calculation; at about 100 MB/s we don't need to worry about CPU and memory overhead, unless we're running on a low-end embedded platform.


Most likely not. Creating a single tar file for the whole file system would mean reading from the source disk, writing to a temporary tar file, then re-reading that temporary file (it is so large it is guaranteed to no longer be in the buffer cache), and then writing to the target location. That's twice as many IOs as a direct copy.


Right: USB is roughly 48 MB/s (in practice, one would expect a bit less, maybe 30 or 35). But 0.3-0.5 MB/s (which is 200-300 MB in 10 minutes) is right out; at that point the limitation is not the theoretical performance of USB. Even with a USB 1.1 interface (which is theoretically 12 Mbit/s or a little over 1 MB/s) the performance he's seeing is wrong. Something is going wrong here; I like sko's theory of USB interfaces that can't handle streaming.

Best idea: Instead of using USB (if he is, the original post isn't clear), connect the disk directly to the same computer via SATA or SAS, and then use the native kernel ext4 support. Or connect it to a second computer via SATA or SAS, and then use gigabit ethernet (which should give near 100 MB/s). But first measure what's really happening.

PS: Just tried this on my new Raspberry Pi: I can read large files (using tar and dd) from my SD card (which is famously slow) at 18 MB/s, and write to it at 6 MB/s. That's sort of the worst case for storage and CPU power. So the 0.3 MB/s the OP is seeing is just wrong ... something is broken.

Unfortunately, I do not have another SATA or SAS available on the computer.
It would seem that ext4fuse is my problem.

Note that the maximum transfer rate of USB 2.0 is 480 Mbit/s, which gives you around 48 MByte/s of theoretical maximum transfer rate. Expect to see a lot less than that. Copying 3 TB of data will probably take more than 24 hours.
Since I am unable to get the native ext4 reading to work, it appears that you are correct. I think at this rate, it will take several days.
 
Hello all. Thank you for your responses.


The reason I chose FUSE was because I was unable to get the ext4 drive to mount natively, unfortunately.

My source ext4 drive is in "da0", based on the output of dmesg.

I created a directory in "/mnt" called "12tb" and did the following:
mount -t ext2fs /dev/da0 /mnt/12tb

The result was "mount: /dev/da0: Invalid argument".

Also tried mount -t ext2fs /dev/da0s1 /mnt/12tb

The result was: "mount: /dev/da0s1: No such file or directory"

What does gpart show da0 produce?
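For a GPT-partitioned 3 TB disk it might look something like this (values made up for illustration):

=>        40  5860533088  da0  GPT  (2.7T)
          40  5860533080    1  linux-data  (2.7T)
  5860533120           8       - free -  (4.0K)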
 
I think the gpart answer from balanga may be the key.

You said above "My source ext4 drive is in da0". That's possible, but unlikely. It would mean that whoever created this file system on Linux created it on the whole disk, without any partitioning. The usual approach is to partition the disk, and then create file systems on partitions. This is usually done today even if the file system spans "the whole" disk, by creating a single large partition (I put "the whole" disk in quotes because you lose a tiny bit of space to the partition table and to the rounding of the partition boundaries). Even highly capable file and storage systems that have their own partitioning and labeling schemes for disks today often use standard (typically GPT) partition tables, to protect their disks against clueless mistakes (like BIOSes that assume that all disks will have two copies of a GPT table).

My suggestion: Before you try mounting the drive (natively), examine whether it has a partition table, using fdisk and gpart. If you find an ext4 (or ext2) partition, mount that instead of the whole drive.
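A quick sketch of that inspection, assuming the disk is da0:

gpart show da0    # lists the partition table and partition types, if any
fdisk da0         # the MBR view of the same disk

If an ext4 partition shows up as, say, da0p1 (GPT) or da0s1 (MBR), that is the device to hand to mount -t ext2fs.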
 
I ended up finding a faster way of transferring the files. It is a bit embarrassing that I did not realize this.

When using SSH as an option to transfer the files from the Linux PC to the FreeBSD server, I forgot that the Linux PC was on Wi-Fi (which explains the slow speeds).
I instead temporarily moved the Linux PC next to the router and wired it in.

I started to get close to 20-30 MB/s on each file transfer, so it was all a success.

I did not need to worry about mounting the ext4 drive onto my FreeBSD server.

I think the gpart answer from balanga may be the key.

You said above "My source ext4 drive is in da0". That's possible, but unlikely. It would mean that whoever created this file system on Linux created it on the whole disk, without any partitioning. The usual approach is to partition the disk, and then create file systems on partitions. This is usually done today even if the file system spans "the whole" disk, by creating a single large partition (I put "the whole" disk in quotes because you lose a tiny bit of space to the partition table and to the rounding of the partition boundaries). Even highly capable file and storage systems that have their own partitioning and labeling schemes for disks today often use standard (typically GPT) partition tables, to protect their disks against clueless mistakes (like BIOSes that assume that all disks will have two copies of a GPT table).

My suggestion: Before you try mounting the drive (natively), examine whether it has a partition table, using fdisk and gpart. If you find an ext4 (or ext2) partition, mount that instead of the whole drive.

Thank you balanga and ralphbsz. This is something I did not even think of.

Just for learning purposes (since I am very new to FreeBSD), I will likely try that out even though the problem has been solved.


To everyone - I want to thank you all for chiming in to help a noob like me out. I definitely feel very welcome to the FreeBSD community! You all gave me a great first impression of the community.
 