ZFS Plan and best practices to migrate from TrueNAS Core to pure FreeBSD

Hello everyone,

I am not planning to migrate to TrueNAS SCALE, as my whole environment (desktops, servers and laptops) runs FreeBSD, with the sole exception of my web-browsing laptop, which is a corebooted QubesOS machine.

The current setup is two servers with 8 disks each in RAIDZ2, a solid arrangement I have run on FreeNAS/TrueNAS for 10+ years. The active server zfs sends/receives to the backup server, and everything runs in jails, except my email server, which runs mailcow in an active bhyve VM on one server, using mailcow's built-in replication script to replicate to a standby VM on the second server. The whole setup is backed up to Amazon Glacier (I still haven't had the cash to get a server up at a friend's place).

I have decided to start migrating to pure FreeBSD, so I nuked the standby server and started by trying to migrate the standby mailcow bhyve VM. With the server fresh, I zfs send/received my standby VM's dataset to the FreeBSD setup, loaded the key, and spent 4 hours trying to get the VM to start, to no avail (bhyve error messages are really not that detailed).

So I thought it safer to ask for some advice. I am currently on a single TrueNAS CORE box and would like to migrate the setup to two FreeBSD boxes with HAST. I won't fight too hard migrating bhyve; I can just build a new VM for the email server.

My initial plan is to:
1. zfs send | receive all the data from CORE to the pure FreeBSD box.
2. Stop services.
3. Resend.
4. Load keys.
5. Recreate my jails and the VM on the FreeBSD box, so the services are back online.
6. Make the new box responsible for the Amazon Glacier backups.
7. With everything running fine, nuke the TrueNAS CORE installation and install FreeBSD.
8. Load the keys back (all data should be there).
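Steps 1 to 4 above can be sketched roughly like this (the pool/dataset name "tank/data", the snapshot names and the host "newbox" are all made up for illustration):

```shell
# Initial bulk copy while services are still running. A raw send (-w)
# keeps the datasets encrypted in transit and on the target:
zfs snapshot -r tank/data@initial
zfs send -w -R tank/data@initial | ssh root@newbox zfs recv -u tank/data

# After stopping services, send only what changed since @initial:
zfs snapshot -r tank/data@final
zfs send -w -R -i @initial tank/data@final | ssh root@newbox zfs recv -u tank/data

# Load the keys on the new box:
ssh root@newbox zfs load-key -r tank/data
```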

My question is: in this plan, when would I set up HAST to minimize data transfers and reduce the risk of data loss (I'd hate to restore from Glacier)? Or should I instead just nuke the CORE box, do a fresh install and load the keys? Any tips? Or is my plan awful? =)

After everything is running, I have five Dell R220 IIs that I will add to my home rack, so the storage boxes will do only storage and run no services.

Thanks in advance!
 
I’m no expert in TrueNAS (I looked into it years ago and went with FreeBSD instead).

I would recreate everything from scratch using ansible, creating playbooks (aka documentation) along the way. As you mentioned, you run FreeBSD everywhere else, so the learning curve shouldn’t be that steep.

I would set up the former passive cluster node as the new active node, which gives you the opportunity to test everything against the still-running active node, and then fail over gracefully to the new node once everything is working like a charm. Setting up the leftover former active node with ansible as the new passive (standby) node should then be a piece of cake and done in a fraction of the time.
 
Tip: in case you haven't tried it already: use something like sysutils/vm-bhyve to manage bhyve vm's instead of trying to run it "bare".
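For reference, a minimal vm-bhyve bootstrap looks something like this (the zroot/vm dataset, the NIC em0 and the VM/ISO names are examples):

```shell
pkg install -y vm-bhyve
sysrc vm_enable="YES"
sysrc vm_dir="zfs:zroot/vm"      # vm-bhyve keeps each VM in its own dataset
vm init                          # creates directories and loads kernel modules
cp /usr/local/share/examples/vm-bhyve/* /zroot/vm/.templates/
vm switch create public          # a standard bridged switch
vm switch add public em0         # attach the physical NIC to the switch
vm create -s 50G testvm          # new VM from the default template
vm install testvm FreeBSD-14.1-RELEASE-disc1.iso
```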
If this is a FreeBSD bug, which is unknown at the moment, I don't see how this would resolve anything. We don't have enough information to determine that yet.

Writing terabytes to a zpool shouldn't invalidate the pool. The OP may have tickled a bug by writing a lot of output to the zpool.

The other thing we don't know is whether they are running a 64-bit version of FreeBSD or 32-bit. It seems to me there may be an integer overrun hiding in ZFS somewhere.
 
Writing terabytes to a zpool shouldn't invalidate the pool. The OP may have tickled a bug by writing a lot of output to the zpool.
I agree. But do we know that the pool is the problem?
As far as I read the OP's post, the problem is with starting the vm. He doesn't say anything about the health of the pool / replicated dataset.
Perhaps providing more info on that part would narrow down the possible problem area.
 
I agree. But do we know that the pool is the problem?
As far as I read the OP's post, the problem is with starting the vm. He doesn't say anything about the health of the pool / replicated dataset.
Perhaps providing more info on that part would narrow down the possible problem area.
We don't know. But opening a PR will get someone with enough ZFS knowledge to look at it. Speculation here will not solve anything. Terabytes of writes corrupting a zpool has a smell of serious to me.
 
I kinda hoped I was doing something stupid. This escalated way beyond what I expected!

I did another test, though. I have a dataset for my nextcloud data, and I mount it into a jail, where I run nextcloud from.
So I did the same power combo, zfs send | receive over ssh, accounting for encryption etc., then did a quick SHA256 of a bunch of files, and everything I checked came across correctly.

My ignorance kicks in when it comes to the bhyve image. I know it is there because it is listed as a dataset, but it does not show up in the filesystem (if I ls, for example). How can I check that the data transfer went correctly?
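A zvol is exposed as a block device under /dev/zvol rather than as files, so one way to verify the copy (a sketch; the dataset paths are examples, and the VM should be powered off so the device contents are stable) is to checksum the device node on both sides:

```shell
# On the source box (TrueNAS CORE):
sha256 /dev/zvol/tank/vm/mailcow-disk0

# On the destination box (FreeBSD):
sha256 /dev/zvol/uppersas/vm/mailcow-disk0

# If the two digests match, the transfer was bit-identical.
```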

With that ruled out, I (we, with your kindness) can rule out my bhyve stupidity in a different thread. I will also try vm-bhyve.

tanis: to address one thing: I was looking for a smooth transition into a HAST active-active kind of scenario. Is this possible? If not, then my life is easy - I will move all the data over, set up the 3 physical machines I am waiting to add to my rack (3 for compute + the 2 existing ones for storage)
 
tanis: to address one thing: I was looking for a smooth transition into a HAST active-active kind of scenario. Is this possible? If not, then my life is easy - I will move all the data over, set up the 3 physical machines I am waiting to add to my rack (3 for compute + the 2 existing ones for storage)

I guess it depends on which layer you define the active-active kind of scenario. :-/

I suggest the following links:
I'm particularly intrigued by the ZFS High-Availability NAS, but it seems like dual-port SSD/NVMe drives are a specific HP product; please correct me, anyone, if I'm wrong here.

Edit: I was wrong, or perhaps just too young to know the details of SAS development. All SAS hard drives manufactured after 2012 are most likely dual-port; before 2012 there were single- and dual-port drives, and these days only dual-port is a thing. There is also NL-SAS. Lastly, it’s definitely not an HP thing.
 
I own a Sonnet SAS enclosure that supports dual port disks, but no, I am not going to go this way - expensive.

The first document sounds more like what I am trying to achieve. Perhaps the way will be to just prepare the new server, then connect my SAS enclosure to the new server, import everything, then nuke the old server (the TrueNAS one), prepare it and finish the HAST configuration.

And I have a 3rd spare enclosure that I don't use anymore (noisy fan) and I can create a backup there. I hope I have enough hard drives. =)

I might - I tend to pull them out every 45000 hours regardless of health.
 
All right, folks. Reporting back on this.

This is taking me way longer than I thought, just because I brainfarted really hard.
After moving 10TB of data three times and having to redo it because the destination storage had already become obsolete, I decided to move the services first. Here is how it is going so far:

0. Set up the network and tested lagg (active-standby across 2 Cisco SG200 switches). I also set up powerd to cap the clock at 2600 MHz (keeping the CPU at around 30 W max), because I modified my servers (a tale for another post) and my rack is very shallow.

1. It was easy to move the VMs with vm-bhyve; thanks for the suggestion. There were two VMs to move: mailcow and my HAProxy (I had been using a Linux VM for that because of keepalived). Mailcow has a built-in transfer script, so it was a piece of cake; HAProxy I just reconfigured. Setting up the VMs with the same MAC addresses spared me from having to reconfigure my network stack. Previously, I had them in the storage pool, but since I am at an intermediate phase of the setup, they went to zroot.
I had to invert my mailcow setup, making the new server primary and keeping the TrueNAS Core VM as standby.

2. With that done, I configured email so I could receive the periodic reports instead of having them litter the local mail folder. It was a piece of cake: I followed the instructions in the handbook and everything worked perfectly.

3. Then I configured smartd. I just had to follow the sample config file to reproduce the exact setup I had in TrueNAS Core. Email notification also worked great.
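In case it helps anyone, a smartd.conf along these lines (device names and the mail address are examples) covers the usual TrueNAS-style checks:

```shell
# /usr/local/etc/smartd.conf sketch: monitor all attributes (-a),
# use the ATA-via-CAM driver (-d atacam), run a short self-test
# every Sunday at 02:00 (-s), and mail warnings (-m):
/dev/ada0 -a -d atacam -s S/../../7/02 -m admin@example.org
/dev/ada1 -a -d atacam -s S/../../7/02 -m admin@example.org
```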

4. Basic cron setup: scrubs, pkg updates, etc.

I let it sit for a week.

5. Installed iocage also to zroot. I will move it to the storage pool afterwards.

6. Installed and configured the web server (freenginx, certbot, php, caching, yada yada). I thought of just sending over my previous Apache jail, but I wanted to migrate to freenginx, so I reconfigured 10 web pages in a new jail on FreeBSD. Previously, nextcloud data was just a nullfs mount, but since the data was staying on the TrueNAS box, I set up NFS. With the help of Mozilla Observatory, I got all the security details set just as I had them with Apache. Certbot also worked out of the box, including setting separate conf files in the includes folder.

I also had to go to my MariaDB install in the TrueNAS jail to create new users allowed to connect from the FreeBSD jails. There was no need to reconfigure HAProxy, because I reused the old jail's MAC address, so everything worked out of the box.

I let it sit for a week.

7. I created a new MariaDB jail. I was not managing to update the one in the TrueNAS jail, so I just dumped the 10 databases one by one and imported them into the new MariaDB jail. After creating the users, the applications automatically started working again, because I again used the same MAC address.
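The dump/import dance was nothing fancier than the following (database names are examples; credentials omitted):

```shell
# On the old TrueNAS jail, one dump per database:
for db in nextcloud wiki blog; do
    mysqldump --single-transaction --routines "$db" > "/tmp/$db.sql"
done

# On the new MariaDB jail, after creating the empty databases and users:
for db in nextcloud wiki blog; do
    mysql "$db" < "/tmp/$db.sql"
done
```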

I let it sit for a week.

8. I created the new pool: compression on, no encryption (I am going to receive an encrypted dataset anyway).
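Creating the pool boiled down to something like this (the device names are examples; the pool name matches the receive target below):

```shell
# Compression at the pool root; no encryption needed here, because
# `zfs recv` of a --raw stream recreates the datasets with their
# original encryption intact.
zpool create -O compression=lz4 -O atime=off uppersas \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7
```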

9. Last weekend, I put nextcloud (already in the new FreeBSD server) in maintenance mode and called the classic:
zfs send --raw -R -p sun-nas@auto-2024-06-30_00-00 | ssh root@new zfs recv -v -F uppersas/sun-nas
It took 25 hours and averaged circa 750 Mbps. The CPU sat at about 25% across all cores the whole time, with no change in the responsiveness of the served applications.

10. zfs load-key, zfs mount -a, zfs change-key (key saved in /boot/efi for now), enabled zfskeys service. Reboot, all good.
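Spelled out, step 10 was roughly the following (dataset and key path as in my setup, but treat them as examples):

```shell
zfs load-key -r uppersas/sun-nas          # prompts for the passphrase
zfs mount -a
# Re-wrap with a raw key file so boot can be unattended:
dd if=/dev/random of=/boot/efi/sun-nas.key bs=32 count=1
zfs change-key -o keylocation=file:///boot/efi/sun-nas.key \
    -o keyformat=raw uppersas/sun-nas
sysrc zfskeys_enable="YES"                # load keys from rc(8) at boot
```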

11. Changed the mount in my nextcloud jail from remote NFS to local nullfs using iocage. There was no need to chmod anything. The mount path changed (I didn't take the time to figure out how to specify a mount path in iocage), but that is just a quick change to config.php in nextcloud.
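For anyone else stuck on the mount path: iocage can take an arbitrary fstab entry, something like this (jail name and both paths are examples):

```shell
# Mount a host dataset into the jail via nullfs; the destination is
# the full path inside the jail's root.
iocage fstab -a nextcloud \
    "/uppersas/sun-nas/nextcloud /zroot/iocage/jails/nextcloud/root/mnt/data nullfs rw 0 0"
```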

12. Configured mail delivery in all jails, basic crontab setup with updates etc.

13. Configured NFS and SMB as I had in TrueNAS. For some reason, manipulating huge photo libraries with thousands of files via SMB is way faster than it was under TrueNAS Core. I am very glad.

Now I have my next challenge... I will let it sit for a week, and then I need to set up the Amazon Glacier backup (I'm very keen on getting rid of it) on the new server, install FreeBSD on the old server, import the ZFS keys back, and figure out my main challenge: keeping both pools in sync and serving shares in an active-active fashion.

I know this is a long post, but I hope it may help anyone else skipping TrueNAS SCALE on Linux.
 
I've just killed my TrueNAS CORE installation. After more than 10 years, it was very sad to see it go.

The crucial part was putting the email failover back up, so I had FreeBSD installed in 10 minutes, loaded the keys for the pool and installed a Debian VM with Docker and keepalived. Mailcow is brilliantly documented, so the whole process didn't take more than 10 minutes, including the data transfer.

One thing I struggled a bit with was getting both bhyve VMs and jails working together on my LAGG, so here is the setup:

In /etc/rc.conf, assuming network interfaces em0 and em1, and a failover lagg getting its IP from DHCP:
ifconfig_em0="up"
ifconfig_em1="up"
cloned_interfaces="lagg0 bridge0"
ifconfig_lagg0="laggproto failover laggport em0 laggport em1 DHCP"
ifconfig_bridge0="addm lagg0 up"

Then I had to create a manual switch using vm-bhyve, in my example it is called public:
vm switch create -t manual -b bridge0 public

And for jails using iocage, I just had to make sure iocage was set to use bridge0.
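Concretely, pointing an iocage jail at the bridge was just (the jail name is an example):

```shell
# Give the jail its own vnet stack and attach its epair to bridge0:
iocage set vnet=on myjail
iocage set interfaces="vnet0:bridge0" myjail
```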

Tomorrow is a public holiday, and I will move all the services besides nextcloud to a 3rd server, and configure HAST in the two current servers with my pools.
 
My plans are going to have to wait a bit.
I, again, failed to RTFM: yes, I need to assign devices to HAST and then put ZFS on top of HAST.

My first server has 8 bays, all full with 4 TB disks. The second server has three 16 TB disks (currently acting as primary) and room for 5 more.
I will order more disks for the 2nd server, nuke the first, set up HAST, and then slowly move my workloads over. The current pool will remain as a backup. This way, I can introduce HAST without downtime or a risky migration.

Today I put a 3rd server up. Tomorrow I will migrate email VM and web server, basically anything that is not at the moment storing data in my storage pools.

Otherwise, everything is going OK. I am having trouble with CARP, which I am addressing here:

edit: Or should I use ucarp, given its ability to call scripts? It seems unmaintained, so I got a bit worried.
 
HAST is up and the Galera Cluster is also up (but not hosting data for any applications yet).

I have storage to spare and do not want to go around reloading many TBs of data, so here is what I did:

1. Set up a 5 TB virtual block device on top of my existing pool, under /dev/zvol, via zfs create -V 5T mycurrentpool/myhast.
2. Initialize HAST.
3. Create a pool on /dev/hast/myhast on the primary.
(do tons of tests)
4. Create a snapshot, do a zfs send | receive. It took a day.
(test handover via CARP - the script found in the article recommended by tanis works perfectly)
5. Create another snapshot, do an incremental send/receive. It took 10 minutes.
6. Put nextcloud in maintenance mode.
7. Do a final incremental send/receive.
8. Swap server roles, as I want the server with more storage redundancy to be primary.
9. Export it via NFS.
10. Replace the nullfs mount in my jail with the NFS export via CARP IP.
11. Disable nextcloud maintenance mode.
The nextcloud outage lasted 5 minutes.
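For steps 2 and 3 above, the HAST side looked roughly like this (host names, addresses and the resource name are examples):

```shell
# /etc/hast.conf, identical on both nodes:
#
# resource myhast {
#         on nodeA {
#                 local /dev/zvol/mycurrentpool/myhast
#                 remote 192.0.2.2
#         }
#         on nodeB {
#                 local /dev/zvol/mycurrentpool/myhast
#                 remote 192.0.2.1
#         }
# }

# Then, on both nodes:
hastctl create myhast
service hastd onestart

# Pick roles, and build the pool on the primary only:
hastctl role primary myhast            # on nodeA
hastctl role secondary myhast          # on nodeB
zpool create hastpool /dev/hast/myhast # on the primary
```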

Galera was a pain, but here is what worked for me:
1. Spin up 3 jails.
2. Install mariadb-server in all of them.
3. Get rid of all the sample configuration files. They are split across several files, which confused me a lot. Mind that not everything here is necessary for Galera.
4. On the first server:
/usr/local/etc/mysql/my.cnf
[client-server]
port = 3306
bind-address = whatever-ip-you-need

[mysqld]
user = mysql
basedir = /usr/local
datadir = /var/db/mysql
net_retry_count = 16384
log_error = /var/log/mysql/mysqld.err

!includedir /usr/local/etc/mysql/conf.d/

/usr/local/etc/mysql/galera.cnf
binlog_format=ROW
default-storage-engine=innodb
wsrep_on=ON
wsrep_provider=/usr/local/lib/libgalera_smm.so
wsrep_cluster_address="gcomm://ip1,ip2,ip3"
wsrep_sst_method=rsync
wsrep_cluster_name="clustername"
wsrep_node_address="thisnodeip"
wsrep_node_name="localnodename"

But I started the node manually:
sudo -u mysql /usr/local/libexec/mariadbd --wsrep-new-cluster --wsrep-cluster-address="gcomm://ip1" &
Then I went about monitoring the logs at /var/log/mysql/mysqld.err.

For the second node, I changed wsrep_cluster_address to include ip of node1 and itself, and started the service normally.
For the third node, I added all three nodes and started the service normally.

Then I killed the process on the first node and started it normally. All good; it survived multiple reboots in different orders.
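A quick sanity check that all three nodes actually joined, runnable on any node (it will prompt for the root password):

```shell
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
# wsrep_cluster_size should report 3 once the cluster is complete.
```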

My next stop will be to spin up 3 nginx jails and put them behind HAProxy, but that will be a piece of cake, so I will skip documenting it here. This concludes the saga.

Thanks for all the suggestions!
 
PS: Two adjustments I had to do:
In the carp_up scripts, I had to give the ZFS device more time to appear, so I changed the sleep multiplier from sleep 0.1 to sleep 0.5.

I also had to increase the advskew values, as a quick flap in the LAGG interface was firing the up/down devd trigger, after which carp_up would fail.
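For reference, the devd hook driving carp_up looks something like this (the script path is an example; I'm only sketching the event match, not my full scripts):

```shell
# /etc/devd.conf fragment: react to CARP state transitions.
# The subsystem is "vhid@interface", e.g. "1@lagg0", and is passed
# to the script so it knows which VHID changed state.
#
# notify 0 {
#         match "system"          "CARP";
#         match "subsystem"       "[0-9]+@[a-z0-9]+";
#         match "type"            "MASTER";
#         action "/usr/local/sbin/carp_up $subsystem";
# };

# Reload devd after editing:
service devd restart
```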
 