Other Data Migration from EXT4 to ZFS

Hi all

I am a Linux user trying to switch my home server (which mounts 4x 4TB HDDs as my NAS) from Debian 11 to FreeBSD 13.1 (I am quite new to BSD, as well as to this forum).

So my server has 1 SSD (500GB) as the system disk and 4 HDDs (4TB each, let's call them HDD1 to HDD4), and I have another spare and empty 4TB HDD (let's call it HDD0), which can be installed and mounted in my server. I just want to confirm with the experts here that it would be reliable to do the following steps to migrate all my data from the current EXT4 to ZFS:

1. Install FreeBSD on the SSD, as well as ext2fs or ext4fuse or ...? (which is better/more reliable? - I think I'd only need EXT4 READ)
2. Format HDD0 with ZFS and mount it
3. Mount HDD1 (ext4) and copy all data from HDD1 to HDD0
4. Format HDD1 to ZFS and mount it
5. Mount HDD2 (ext4) and copy all data from HDD2 to HDD1
6. Repeat this on HDD3...4...
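For reference, a rough sketch of what steps 1–4 could look like on FreeBSD. Device names (/dev/ada0, /dev/ada1p1) and pool/path names are placeholders; check your actual devices with gpart(8) first, and note that zpool create erases the target disk:

```shell
# Step 1/3: mount HDD1 (ext4) read-only with the in-kernel ext2fs driver
# kldload ext2fs
# mkdir -p /mnt/hdd1
# mount -t ext2fs -o ro /dev/ada1p1 /mnt/hdd1

# Step 2: create a single-disk ZFS pool on HDD0 (ERASES HDD0)
# zpool create tank0 /dev/ada0

# Step 3: copy the data, preserving as much metadata as possible
# rsync -aHAXS --numeric-ids /mnt/hdd1/ /tank0/hdd1-data/

# Step 4: only after verifying the copy, unmount and reuse HDD1 (ERASES HDD1)
# umount /mnt/hdd1
# zpool create tank1 /dev/ada1
```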

IMHO it should work without issues (?), but since I've never tried this myself and it involves a lot of my data, I'd just like to hear your opinions/suggestions before the actual execution. Is there anything else I should pay attention to? Thanks very much!
 
Why not put ZFS on the hard disks while still on your Linux machine? Copy all the data, export the zpools, physically move the hard disks to your FreeBSD machine (or install FreeBSD on the old one), and import the zpools.
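A sketch of that approach, assuming OpenZFS is installed on the Linux side and the disk and pool names (/dev/sdb, tank, /srv/data) are placeholders:

```shell
# On Linux (with OpenZFS installed):
# zpool create tank /dev/sdb
# rsync -aHAX --numeric-ids /srv/data/ /tank/
# zpool export tank

# After moving the disk to the FreeBSD machine (or reinstalling):
# zpool import tank
```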
 
The main question is: how important is the data on those disks to you? Do you have any cold (offline) backups of it? What happens if you lose it?
If you care about that data, make a full cold backup of it somewhere first. Otherwise this may get painful.

You didn't say how you are going to "format" those disks as ZFS. Are you going to create a new pool or expand the current one? The latter approach becomes a problem once a disk fails. I'd strongly recommend going with some raidz variant for fault tolerance.
 
4TB each, let's call them HDD1 to HDD4
How much data is actually in use? I assume not all disks are 100% full? Maybe get one big spare disk that can hold all the data. Then, as _martin suggests, I would take those 4 drives and create a single RAID-Z pool with them. That gets you around 12TB of usable space with one disk's worth of redundancy.
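Creating a single RAID-Z pool from the four drives would look something like this (device names are assumptions; this destroys everything on the four disks, so only do it after the data has been copied off):

```shell
# RAIDZ1 across 4x 4TB gives roughly 12TB usable, one disk of parity
# zpool create tank raidz ada1 ada2 ada3 ada4
# zpool status tank
```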
 
I am quite new to BSD, as well as to this forum
Welcome!

Install FreeBSD on the SSD, as well as ext2fs or ext4fuse or ...? (which is better/more reliable? - I think I'd only need EXT4 READ)
The built-in ext2fs(5) should be fine for reading. No need to install anything.
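A minimal read-only mount with the in-kernel driver might look like this (the partition name is an assumption; find yours with gpart show):

```shell
# kldload ext2fs
# mount -t ext2fs -o ro /dev/ada1p1 /mnt
# Optionally, load the driver at boot:
# echo 'ext2fs_load="YES"' >> /boot/loader.conf
```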
I've never had problems with it in my occasional usage. Others have, although I think those issues mostly occur in read-write mode.

ext4fuse is deprecated. sysutils/fusefs-ext2 is an alternative.
I haven't used it much but I don't think there's much point in using it instead of ext2fs nowadays.

Unless it is needed (e.g. with Btrfs), I personally discourage the use of sysutils/fusefs-lkl, which has serious issues (for example PR 265202).
 
IMHO it will work without issues (?), but since this has never been tried myself and as it involves lots of my data, I'd just like to hear your opinions/suggestions before the actual execution, and anything else that I should pay attention to?...Thanks very much!
It *MIGHT* work, if the tools you use are perfect enough, and if you make no mistakes.

First question here: Are the tools good enough? Ext4 on Linux is an excellent file system, highly reliable, and for normal use de facto bug-free. So you can trust your current disks to hold the data you want. But that's all you can trust. The ext4 implementation on FreeBSD is much less perfect. I think for read-only use it MIGHT be OK, but it is likely to have issues in corner cases, or be unreliable. On the receiving end, ZFS is also an excellent file system, just as reliable and bug-free as ext4.

Second question: Are you super experienced, use best practices (like do a full backup beforehand, test your procedure on a disposable system, execute your procedures from an automated system like a script so no mistakes can sneak in), and know what to do if things don't go according to plan? If no, then the most likely cause of data loss in this scenario is the human operator, meaning: you.

To make matters more complex: a file system holds much more than the content of files (streams of bytes). It also holds lots of metadata, and that is more fragile. For example:
  • Have you thought through whether all your file names can be moved to ZFS? There are questions such as maximum file name length and encoding, in particular Unicode characters.
  • Do you have owner and group configuration set up? Depending on how you perform the copy, it might copy numeric UID/GID (in which case you probably want the same passwd and group entries), copy user/group names (which can lead to interesting results), or deliberately wipe out the ownership.
  • Do you have interesting flags set (like "archive" and "do not modify"), and have you checked that those flags are handled the same way in both systems? (They probably are not.)
  • Do you use extended attributes, and if so, will the copy tool honor them?
  • Do you have sparse files (files with large holes consisting of logical zeros, causing the disk usage and logical size to be very different), and do you need to rely on them remaining sparse?
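If the copy is done with rsync, flags along these lines address several of those metadata concerns; this is a sketch, not a complete recipe, and the paths are placeholders:

```shell
# -a  archive mode (permissions, times, symlinks, owner/group, recursion)
# -H  preserve hard links (not included in -a)
# -A  preserve ACLs
# -X  preserve extended attributes
# -S  handle sparse files efficiently
# --numeric-ids  copy raw UID/GID instead of mapping by user/group name
# rsync -aHAXS --numeric-ids /mnt/ext4disk/ /tank/data/
```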

Here would be my advice:
  • As SirDice already said, before you do anything, do an incredibly good backup. Or make the conscious decision that your data is transient, and has a good chance of vanishing.
  • Do not use "foreign" file systems, such as the FreeBSD ext2/3/4 implementation. Instead perform the reading of the Linux disks on Linux. This means that you either need two OSes running (so you can write ZFS from FreeBSD), or you need to use the Linux OpenZFS implementation to perform the initial ZFS write. I have never used OpenZFS on Linux, but I hear that it works well. Doing the copy that way is probably the easiest.
    To have two OSes running, you might do the following: On your Linux machine, set up a VM running FreeBSD, and use that VM to write ZFS. Then use a network-based copy tool (for example rsync) to perform the copy between the two logical machines. After you're done, destroy the Linux install, and re-install FreeBSD natively.
  • Personally, I would leave the old disks alone, and not reformat them. That gives you a backup copy, and removes a lot of risk. Unfortunately, it implies obtaining more disk space. Most likely, going from 4x 4TB disks to a single 16TB disk is the easiest.
  • Ideally, set up a second machine running FreeBSD with a new blank disk (large enough); that is the easiest and safest option. Then leave the old Linux machine intact, and keep it around as a backup, until you are certain that the copy succeeded and the old data is no longer needed.
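In the two-machines (or Linux-host-plus-FreeBSD-VM) setup, the network copy could be done with rsync over SSH, followed by a dry-run comparison to spot-check the result. Hostname and paths here are placeholders:

```shell
# On the Linux side, push the data to the FreeBSD machine/VM:
# rsync -aHAXS --numeric-ids /mnt/hdd1/ root@freebsd-box:/tank/hdd1-data/

# Re-run with -n (dry run) and --itemize-changes: any output lines
# indicate differences that a second pass would still have to transfer
# rsync -aHAXSn --numeric-ids --itemize-changes /mnt/hdd1/ root@freebsd-box:/tank/hdd1-data/
```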
 
I agree with ralphbsz. If you value your data, keep it (or a full backup, including all the metadata) safe during the transition.

And you probably have to acquire extra storage to achieve that. But you can re-deploy those extra resources for redundancy and backup...

There is a multitude of ways to proceed. I think that the safest compatible-file-system option is to install OpenZFS on Debian 11, and get one large new disk for a ZFS tank (consider USB3 for portability). Check that the pool options set on the Debian-created tank are compatible with FreeBSD, and then copy the ext4 data across to the new ZFS tank using something like rsync(1) (but choose your options with great care; consider -SHAXax). You could then export the tank, preserve your Debian root SSD and ext4 4 TB disks, install FreeBSD (I'd get a second 500GB SSD for a ZFS root), and import the newly created tank.
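One way to guard against pool-feature mismatches between the two OSes is OpenZFS's pool "compatibility" property (available from OpenZFS 2.1 on; the disk and pool names below are placeholders):

```shell
# On Debian, restrict the new pool to features FreeBSD 13 understands:
# zpool create -o compatibility=openzfs-2.1-freebsd tank /dev/sdb
# Verify before exporting and moving the disk:
# zpool get compatibility tank
```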

When you are completely happy that it's all working, you can install all the 4 TB disks, create a new RAIDZ1 tank, and zfs-send(8) the large single-disk tank to them. Then re-deploy the large disk for off-site backups. The old Debian SSD can be added as a mirror of the existing FreeBSD ZFS root.
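That replication step could use a recursive snapshot and a replication stream; pool and dataset names are assumptions:

```shell
# Create the new RAIDZ1 pool from the four disks (destroys their contents):
# zpool create newtank raidz ada1 ada2 ada3 ada4
# Snapshot the single-disk tank and replicate it recursively:
# zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | zfs recv -u newtank/copy
```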
 
Thank you very much for all your comments and advice, especially ralphbsz. I guess I'll just do more "practice" with FreeBSD on my other machine, and get more experience, especially with ZFS, before the data transition :)
 