Solved: FreeBSD installation with multiple NVMe drives and one SSD, with ZFS.

Looking for some advice on how to set up my workstation/server with ZFS. I have four 512 GB NVMe drives inside a RAID card (PCIe is x4/x4/x4/x4) and one server SSD (480 GB) on a SATA connection. The most important part for me is to use that SSD for an SQL database, so I need it to be a separate partition, kept apart from the rest of the NVMe pool (non-mirrored, non-RAID, etc., as I have only one such SSD). Backups of the SQL data will go to a portable SSD (and I will perform them manually for the time being). My workstation is equipped with 32 GB of ECC memory.
My experience with file systems is limited; on something like Arch or Debian I would just mount my SSD as a separate partition, but in FreeBSD with ZFS it's a bit different, and I got lost at the Pool Type/Disk selection. It also needs to be fully encrypted (especially the SSD, for "just in case" reasons). And do I need swap with ZFS and 32 GB of RAM, or is swap irrelevant at this point?
Thank You.
 
Whether or not you need swap, and how much you need, depends on your workload. On balance, it's always better to have some, because it may save your situation if you get a memory leak, a runaway process, or need a core dump.

The disposition of your NVMe SSDs is unclear. The phrase "inside a RAID card" needs clarification. Does the operating system have unfettered access to each raw device? If not, what?

Why do you not want the database on NVMe? It's faster than SATA SSD, and more amenable to redundant configuration in your case.

If you care about your database, then consider using redundant storage. SSDs of all kinds are known to fail suddenly and completely (especially when subject to the sorts of loads that databases can exert).

ZFS may require tuning for databases. There's plenty on that if you search this site.
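As an illustration only, a few commonly cited starting points for a database dataset look like the sketch below; the pool/dataset names are placeholders I've made up, and every value should be benchmarked for your own workload rather than taken as given:

    # Hypothetical pool/dataset names; commonly cited starting points, not settled values.
    zfs create tank/db
    zfs set recordsize=16k tank/db     # often matched to the DB page size (e.g. 8k-16k for PostgreSQL/InnoDB)
    zfs set compression=lz4 tank/db    # cheap to compute, often a net win on text-heavy data
    zfs set atime=off tank/db          # avoid a metadata write on every read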
 
My personal setup was:
- A big zpool on a spinning disk
- On the NVMe: the swap partition plus the special device, log device, and (read) cache device of that zpool.
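As a rough sketch of that layout, with placeholder device names I'm assuming (ada0 for the spinning disk, nvd0pN for partitions on the NVMe), it might look something like:

    # Placeholder device names; the NVMe is pre-partitioned into one slice per role.
    zpool create tank ada0             # big pool on the spinning disk
    zpool add tank log nvd0p2          # separate intent log (SLOG) on NVMe
    zpool add tank cache nvd0p3        # L2ARC read cache on NVMe
    zpool add tank special nvd0p4      # special (metadata) vdev -- unmirrored here, so losing it loses the pool
    swapon /dev/nvd0p1                 # swap partition on the NVMe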
 
Yes, I do have access to each NVMe device via BIOS bifurcation to x4/x4/x4/x4, so I can set each NVMe up separately if I want to (/dev/nvme0 for rootfs, /dev/nvme1 for /home, etc.).
Why no database on NVMe? The first and major issue was with the BIOS settings and the RAID card (the BIOS had only an x16 option, and I'm not able to replicate the modified BIOS for the other PCIe slots). I also need a longer lifespan and more robustness: if I recall correctly, my NVMe drives would be worn down to zero much, much quicker than the server SSD due to the high volume of reads/writes, and I can get better consistency out of the server SSD over a long period. That's the impression I got after reading around the internet while deciding what to buy. I do care about my database, but at the moment I have only one of these SSDs; I will get a few more, but right now I just have one. :)
Yes, I do know about tuning for databases; I've already seen a few columns on it.
I'm still grasping the special device, log device, cache device, metadata, etc.
If this were my Linux setup, I would use something like this:
/dev/nvme0 ---> /dev/nvme0p1 = /efi, /dev/nvme0p2 = /
/dev/nvme1 ---> /dev/nvme1p1 = /home
/dev/nvme2 ---> /dev/nvme2p1 = /programming
/dev/nvme3 ---> /dev/nvme3p1 = /vhost
/dev/sda ---> /dev/sda1 = /database, /dev/sda2 = /var
All this would be with an encrypted boot, booting from UEFI.
I'm looking to replicate something like this, but knowing it's going to be mainly for work/server use, I would need around 250 GB for personal home and programming-related stuff.
As I've never used swap before, I'm thinking of utilizing my 3x 110 GB Kingston SSDs as something like a 3-way mirror, with swap at twice my RAM, so 64 GB, and the rest of the space... I have no idea.
I also have 2x 250 GB SSDs, so in total that makes 4x NVMe, 3x 110 GB, 2x 250 GB, and 1x 480 GB.
As I'm trying to understand things, I have somewhat concluded to use: 2x NVMe as a 2-way mirror, another 2x NVMe as a 2-way mirror, the 3x 110 GB SSDs as a 3-way mirror, the 2x 250 GB as a 2-way mirror, and the 1x 480 GB SSD as a stripe. The idea being that if one SSD in a 2-way mirror goes tits up, it effectively becomes a stripe, but I can add another SSD and make it a 2-way mirror again without losing data? Or should I not use my 480 GB SSD at all and just use the 2x 250 GB SSDs for the database in a 2-way mirror?
But as I mentioned before, I'm still grasping metadata, special vdevs, etc.; I have zero understanding of what is what.
P.S. I do not want to use my NVMe drives for the database. It's also less convenient to work on them, as my PC is fully water-cooled and I'm a bit clumsy; I could break something and then I'd be in big trouble. :) So the SATA SSDs are the way to go for me. :)
P.P.S. I want my system to be encrypted. I know how to do it with UFS from the shell, but not with ZFS; as I see it, I will need to use the shell for all of that work. :)
P.P.P.S. For the time being only I will be accessing the database, but within a few months around 50 people will be using it 24/7. The data will be mostly text, no images.
Thank You.
 
50 people using the system 24x7 suggests to me that you need a highly robust system which is fault tolerant.

I don't understand your arguments for using the SATA SSD for the database. NVMe is simply superior. Whichever medium you use for the database, it will wear out faster, assuming it's the more heavily used storage. I don't believe that wear rates on NVMe SSDs are significantly different to SATA SSDs -- there's a lot of variation, and you get what you pay for.

Since you can address the NVMe SSDs individually, your sensible choices for redundancy with ZFS would be to create two mirrors. You could choose to stripe those mirrors for one big pool, or deploy them as two separate pools. Alternatively, configure the four NVMe SSDs in a single RAIDZ1 pool. RAIDZ1 gives you more capacity, but is slower at writing. My choice would probably be to stripe two mirrors (best performance, maximum shared headroom). But I would want to have the backup plan written before making the decision.
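As a sketch, with assumed device names (nvd0-nvd3 for the four NVMe SSDs), the two layouts would be created roughly like this:

    # Option 1: two striped mirrors (~1 TB usable) -- faster, half the raw capacity
    zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3

    # Option 2: single RAIDZ1 vdev (~1.5 TB usable) -- more capacity, slower writes
    zpool create tank raidz1 nvd0 nvd1 nvd2 nvd3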

One important aspect of FreeBSD that is worth remembering is that any GEOM provider can be used to furnish a ZFS vdev. So you could create a GEOM concat with two 250 GB SSDs to form a virtual 500 GB vdev. You could then create a ZFS mirror with that 500 GB GEOM concat and the 480 GB SSD. I would not suggest that you do this sort of thing a lot (because it increases complexity), but it's a good way to extract the best from the hardware that you have.
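A minimal sketch of that idea, with assumed device names (ada1/ada2 for the 250 GB SSDs, ada0 for the 480 GB drive):

    gconcat load                                 # make sure geom_concat is loaded
    gconcat label bigssd ada1 ada2               # ~500 GB concatenated provider at /dev/concat/bigssd
    zpool create dbpool mirror /dev/concat/bigssd ada0
    # add geom_concat_load="YES" to /boot/loader.conf so the concat comes back after reboot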

You have two choices to encrypt with ZFS. You can use a GEOM provider to do it independently from ZFS, or you can get ZFS to do it. I don't have experience with either, so will leave others to comment. [GEOM providers can be stacked, so there is no issue having a GEOM encrypted concat.]
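Hedged sketches of both routes, with device and dataset names I'm assuming (neither comes from the thread):

    # Option A: GELI underneath ZFS -- encrypt the provider, build the pool on the .eli device
    geli init -s 4096 /dev/ada0        # prompts for a passphrase
    geli attach /dev/ada0
    zpool create dbpool /dev/ada0.eli

    # Option B: native OpenZFS encryption (FreeBSD 13 or later) -- encrypt per dataset
    zfs create -o encryption=on -o keyformat=passphrase tank/db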

Special vdevs contain the metadata (and optionally small files) for the pool that they serve. They add most value when placed on an SSD when you have a pool with (slow) spinning disks. So not particularly relevant to your needs.
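For completeness, since the terminology keeps coming up: a special vdev is added like any other vdev, and should itself be redundant because losing it loses the pool (device names assumed):

    zpool add tank special mirror ada1 ada2    # mirrored special vdev for pool metadata
    zfs set special_small_blocks=16K tank      # optional: also steer small blocks onto it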

There's lots of interesting stuff in this thread. The Lucas eBooks on ZFS are worth reading cover-to-cover.
 
I know that NVMe is superior in many aspects, but if I compare my current NVMe and SSD, longevity and robustness look better for the SATA SSD going by the stats:
Samsung 970 EVO 500 GB: power consumption 5.7 W read / 5.8 W write,
MTBF 1.5 million hours, TBW 300 TB

Samsung PM897 TLC (MZ7L3480HBLT-00A07): power consumption 2.1 W read / 2.6 W write,
MTBF 2 million hours, TBW 2,628 TB

That TBW is basically 9 times higher, and the power consumption is half. Yes, it's about 6 times slower at reading and writing, but for me that is acceptable.
Or am I looking at the wrong stats to determine which drive to use for my database?
But to my general understanding, 300 TB is way lower and will be used up much quicker than 2,700 TB. Both are TLC.
At the moment I don't know how much of the difference the end user would notice, SATA SSD vs NVMe, for the data they see on a web page.
So these points are why I'm treating my SATA SSD as the priority; that was my reason for getting it.
GEOM is MBR, right?

My choice would probably be to stripe two mirrors (best performance, maximum shared headroom). But I would want to have the backup plan written before making the decision.
If I understand it correctly, this way two of my NVMes would be combined into one 1 TB pool, and the other two NVMes would combine into a 1 TB mirror (for redundancy)?
So if this is the case, could I take the 480 GB SSD and mirror it with a 2-way stripe of the 250 GB drives? What would happen if one of my 250 GB SSDs goes tits up? Would I need to change both of them, or only one?

Another question: if I understood it correctly, resilvering applies only if you do RAIDZ?

Thank You.
 
How big is your database, and how many IOPS do you need for it?
At the moment I don't know how big it will be, but I'm sure 500 GB will be more than enough. I looked at the raw data files in PDF form: files varied from 800 KB to a few megabytes, but they contained pictures and such, and the majority of that data is not needed. An Excel file with 10k records is about 1.5 MB on disk, so multiplying that by a bit to account for extras, say 10k records takes 10 MB; it would be around 10-20 GB. And it would take them a while to get to that stage, because entering 10k records takes time even when exporting Excel to CSV and uploading it, and adding the extras manually is going to take a long time. By that stage I will have the extra drives ready.
 
So you are not talking about an SQL database, but about a file server?

If you don't need much space, I would suggest creating RAID10 (a striped mirror vdev); you will lose 50% of the total raw space.
 
It will be SQL - PostgreSQL, to be more precise.

I will need more space than I have, as I will try to get more people, friends, etc. on board.
 
But to my general understanding, 300 TB is way lower and will be used up much quicker than 2,700 TB. Both are TLC.
You get what you pay for. The PM897 has a superior specification for durability.
GEOM is MBR, right?
GEOM is a Modular Disk Transformation Framework, and has nothing to do with boot methods. It's one of the really significant features of FreeBSD storage management. [Edit: mirroring entire GPT disks with gmirror(8) is not recommended because GPT and gmirror both store metadata at the end of the disk -- the fix is to use gmirror on partitions, not whole disks.]
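To illustrate the partitions-not-whole-disks point, a gmirror sketch with assumed partition names:

    # Mirror a partition, not the whole disk, so gmirror's last-sector metadata
    # cannot collide with the backup GPT header at the end of the disk.
    gmirror label -v swap ada1p3 ada2p3
    # the mirrored provider then appears as /dev/mirror/swap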
If I understand it correctly, this way two of my NVMes would be combined into one 1 TB pool, and the other two NVMes would combine into a 1 TB mirror (for redundancy)?
There are multiple options. The one that most people would choose, and also suggested by VladiBG above, would be to create a single pool consisting of two striped mirrors, so 1TB total usable storage from four 500 GB NVMe SSDs. The other plausible option would be four 500 GB NVMe SSDs in RAIDZ1 configuration with 1500 GB total usable storage.
So if this is the case, could I take the 480 GB SSD and mirror it with a 2-way stripe of the 250 GB drives? What would happen if one of my 250 GB SSDs goes tits up? Would I need to change both of them, or only one?
There is no stripe. You have one physical SSD of 480 GB. You make a GEOM concat of two 250 GB SSDs, creating a concatenated 500 GB vdev. You create a ZFS mirror from the physical SSD and the concat.
What would happen if one of my 250 GB SSDs goes tits up? Would I need to change both of them, or only one?
You would need to replace the dead SSD, re-create the concat, and re-silver the entire concat.
Another question: if I understood it correctly, resilvering applies only if you do RAIDZ?
Re-silvering refers to the process of replacing a failed vdev by re-constructing the contents from redundant data held in the rest of the pool. It applies to all forms of ZFS pools that have some form of redundancy (RAIDZ, mirrors).
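In practice the replacement is a single zpool replace, and the resilver runs automatically (pool and device names assumed):

    zpool status dbpool                # identify the failed device
    zpool replace dbpool ada2 ada3     # swap in the new disk; ZFS rebuilds it from the surviving side
    zpool status dbpool                # watch resilver progress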

Edit: I'd install FreeBSD boot, root and swap on a standard ZFS mirror of two of your 110 GB SSDs. 60 GB should be plenty for the root, leaving ~50 GB for swap.
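One possible manual layout for each of those 110 GB SSDs, shown only as a sketch of what the installer's guided root-on-ZFS does for you (device names assumed; the EFI partition still needs the loader copied onto it):

    gpart create -s gpt ada1
    gpart add -t efi -s 260M ada1             # p1: EFI system partition
    gpart add -t freebsd-zfs -s 60G ada1      # p2: root pool
    gpart add -t freebsd-swap ada1            # p3: remaining ~50 GB as swap
    # repeat for the second SSD (ada2), then:
    zpool create zroot mirror ada1p2 ada2p2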

Edit 2: Devices used for mirrors should not have wildly dissimilar performance and durability. So you need to look at the specs of the 250 GB SSDs before making a decision to use them to construct a mirror with the PM897.
 
Thanks, I sorted my way through it... I did it a bit differently.
I used the FreeBSD installer to create a 2-way mirror for my boot, OS and swap with the 2x 250 GB SSDs (encrypted), then I used my server SSD plus 2 more NVMes to create a 3-way mirror (for extra redundancy, just in case), and the last 2 NVMes went into a 2-way mirror for my other usage.
I think my post-installation pools are not encrypted, as there is no ".eli" at the end of the device names, but I'm OK with that for now; in time I'll probably go through it again and maybe redo the whole system from the command line instead of using the FreeBSD installer. I did not expect it to be this easy: I had read a lot about how to create pools and so on, but it was a few lines and that was about it. I think all the challenges are waiting in the future, when I need to expand or repair a pool.
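A quick way to check what actually got encrypted (pool names assumed):

    geli status                 # lists active .eli providers (GELI-encrypted devices)
    zfs get -r encryption zroot # shows native ZFS encryption per dataset, if any
    gpart show                  # confirms the partition layout the installer created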

Thanks again.
 