ZFS migrate a pool from HDD to SSD?

Hi,
running a zfs pool on SSDs is not new anymore, so my question is: is there anything to take into consideration besides replacing one HDD at a time with a new SSD and letting zfs resilver, like we would do to replace a faulty drive?

Will zfs take care of TRIM and other SSD-related stuff?

It is a raidz2 pool. It is running 24*7.
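For reference, the one-disk-at-a-time cycle I have in mind would look roughly like this (pool and device names are made up):

  zpool replace tank da0 da8   # swap one HDD for its SSD replacement
  zpool status tank            # wait until the resilver has finished
  # then repeat for the next disk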
 
I have some experience with replacing HDDs with SSDs, and it was both successful and unsuccessful. The unsuccessful case: a lot of ZFS read/write checksum errors on a pool with heavy read/write activity. Why? I'm not 100% sure, but I suppose it was because of different block sizes: the old HDDs had 512-byte sectors, but the new SSDs had 4k. The only way out in that case was to make another pool and use zfs send/receive with snapshots, then again with incremental (diff) snapshots.
But if you don't have heavy I/O, you can try.
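A rough sketch of that send/receive path, with hypothetical pool names oldpool/newpool:

  zfs snapshot -r oldpool@migrate1
  zfs send -R oldpool@migrate1 | zfs receive -F newpool
  # later, after stopping writes, send only the differences:
  zfs snapshot -r oldpool@migrate2
  zfs send -R -I oldpool@migrate1 oldpool@migrate2 | zfs receive -F newpool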
 
but I suppose it was because of different block sizes: the old HDDs had 512-byte sectors, but the new SSDs had 4k

skeletor, sorry to deviate from the main topic of the thread here; does this have to do with disk partitioning? I used to give the zpool a single GPT partition aligned to 1M on SSD/NVMe drives, but recently I started giving the whole disk to the zpool, without any partitioning.
 
skeletor, sorry to deviate from the main topic of the thread here; does this have to do with disk partitioning? I used to give the zpool a single GPT partition aligned to 1M on SSD/NVMe drives, but recently I started giving the whole disk to the zpool, without any partitioning.
No. It depends on the disk model (512-byte or 4k block size), and it is related to the ZFS pool's ashift.
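If you want to check, both the drive's reported sector sizes and the existing pool's ashift are easy to look up (device and pool names here are just examples):

  smartctl -i /dev/da0    # reports logical/physical sector sizes, e.g. 512/4096
  zpool get ashift tank   # 9 = 512-byte blocks, 12 = 4k; 0 means auto-detected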
 
I don't know a better way to find the proper block size than testing, but I would set it to 4k (ashift=12) unless testing shows a need for something different. SSDs often use a larger size behind the scenes, but you don't usually see a performance increase from matching it, and a higher block size can have other negative impacts: less free space, because the tail of each file wastes more of its last block than it would with a smaller block size; all reads and writes move more data even when the smallest block would suffice; and some ZFS metadata holds fewer entries as the block size goes up.
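To pin that down at pool creation (hypothetical pool and disk names):

  zpool create -o ashift=12 newpool raidz2 da8 da9 da10 da11 da12 da13 da14 da15
  # ashift=12 means 2^12 = 4096-byte blocks; ashift=9 would be 512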

ZFS does not automatically take care of TRIM until you tell it to (the autotrim pool property), and some drives + use patterns are better served by automatic trim while others benefit more from manual trim. Consider retesting what is best for the new drives, and if you don't, file it away in your mind in case you think you hit TRIM performance issues later.
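For example (pool name assumed):

  zpool set autotrim=on tank   # trim freed blocks as the pool goes
  zpool trim tank              # or run a manual/scheduled trim instead
  zpool status -t tank         # shows per-vdev trim state and progress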

Most SSDs have overprovisioning built into them. You can further expand that to improve drive durability by leaving some partition space unallocated/unused. With ZFS, and with enough trimming, you can get a similar effect by setting a refreservation that stops the pool from filling by an equivalent amount. A full-size partition plus a refreservation that is later removed doesn't have the negative issues a partition resize has (though those are minor for such a small change). The SSD has wear leveling and ZFS has Copy On Write (COW), which as a side effect is also a kind of wear leveling; I'm not sure when they complement vs. hinder each other, as SSD manufacturers don't share with me how their wear leveling works.
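A sketch of the refreservation trick, with a made-up dataset name and size:

  zfs create tank/reserved
  zfs set refreservation=200G tank/reserved   # the pool can never fill past this margin
  zfs set refreservation=none tank/reserved   # drop it later if you need the space back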

ZFS 2.3 has direct I/O, which helps high-performance drives avoid bottlenecking on ARC operations, and if I recall it also has a fix that stops precached data from always being decompressed, so look forward to those in a future ZFS update if you don't have them yet.

If you have the space to attach all the new drives at once then you may want to do so and do the whole replacement in one step, or better yet make a new pool and replicate (zfs send+recv) to recreate it fresh.

If unsure, consider creating a ZFS checkpoint if possible, so a mistake in the executed commands isn't permanent. Similarly, you can execute many ZFS commands with -n to see what would be done, then rerun without the -n if you like the result. That being said, backups are still the more reliable, but slower, way to recover.
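Concretely, that safety net looks something like this (pool and dataset names are just examples):

  zpool checkpoint tank            # take a pool-wide checkpoint before the surgery
  zfs destroy -nv tank/old@snap    # -n/-v: show what would be destroyed without doing it
  zpool checkpoint -d tank         # discard the checkpoint once you're happy
  # zpool import --rewind-to-checkpoint tank   # the escape hatch if something went wrong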
 
running a zfs pool on SSDs is not new anymore, so my question is: is there anything to take into consideration besides replacing one HDD at a time with a new SSD and letting zfs resilver, like we would do to replace a faulty drive?
I have done exactly that several times. No problems. Some claim that it is not a good idea to run HDDs and SSDs in parallel for a long time, but I haven't seen any issues. I have also mirrored a laptop's HDD with an SSD connected over USB, later removed the HDD from the laptop and inserted this fresh SSD. All good.
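For anyone wanting to copy that trick, it's basically (pool and device names are just an example):

  zpool attach zroot ada0p3 da0p3   # mirror the existing disk onto the USB SSD
  zpool status zroot                # wait for the resilver to complete
  zpool detach zroot ada0p3         # then drop the old HDD out of the mirror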
 
I am slowly replacing a pool I sporadically have access to with SSDs. It takes months, during which it runs with a mix of mechanical drives and SSDs. I see no problems, nor would I know why there should be any.
 
IMO, rather than replacing drives one at a time, I'd create a new pool entirely. Use the correct ashift parameter when you zpool create the new pool, and any other settings changes that you wish you had known about when you created the first pool. Then zfs send the old pool to the new pool, etc.
 
IMO, rather than replacing drives one at a time, I'd create a new pool entirely. Use the correct ashift parameter when you zpool create the new pool, and any other settings changes that you wish you had known about when you created the first pool. Then zfs send the old pool to the new pool, etc.

It's an 8-disk pool. It would have been a lot of trouble to extend the machine with 8 more SAS ports. Also, when I started I didn't know how long it would actually take.
 
It's an 8-disk pool. It would have been a lot of trouble to extend the machine with 8 more SAS ports. Also, when I started I didn't know how long it would actually take.
I was thinking of the OP, but your experience is valid.

IANAE, but if the OP does indeed have differing block sizes between the old and new drives, I would highly recommend a new pool and ensuring that drive block sizes, ashift values and partition alignment are solid. My limited experience suggests that ashift=9 on 4k-block SSDs has the potential to create a lot of unnecessary writes. Maybe those will get mitigated by smarts in the SSD, maybe not. The easiest and most certain solution for the OP is probably just to avoid the problem altogether, rather than migrating a bunch of write-limited 4k-block drives into a busy 512-byte pool.
 