BSD-based fw/router: ZFS on SSD RAID10 efficiency

Hi FreeBSD Gurus!

In the case of using a BSD-based bare-metal rack server as a gateway firewall/router:

1.
How effective would the ZFS file system be (in comparison to UFS on GPT, etc.) on a hardware RAID10 (4 TLC/MLC SSD drives)?

2.
Does separating /var and /tmp (and possibly /usr) onto another logical volume on the disk (as we do in the case of HDD drives) impact overall I/O performance?

3.
How should FreeBSD be tuned to decrease write cycles on the hardware SSD RAID10?

(Mostly because this is a network-oriented appliance, we need to write a lot of logs to local disks in addition to sending logs to a remote server.)
Note that the RAID controller has a backup battery and the server is, of course, powered by an online-interactive UPS.

Thank you for detailed suggestions! Have a nice day! ;)
 
You should never combine hardware RAID with ZFS because you get two completely independent smart algorithms (RAID controller and ZFS) defeating each other.

If you must use hardware RAID, use UFS.

For a dedicated router (and not too much else), the FreeBSD installation should easily fit into 50GB (plus extra for swap, plus additional space for logs). 50GB would leave ample space for snapshots to be left hanging around as a security blanket when doing upgrades.

If you are using a FreeBSD system as a router, the disk I/O load generated by the operating system itself (with adequate memory) will be trivial. The only appreciable disk I/O load will be from writing the logs.

If you are sending logs in real time to a log server, then the I/O bottleneck will very likely be the network link to the log server, not the local disk traffic.

I would create a separate file system (and pool, if using ZFS) to isolate the logs, just so full logs could not compromise the operating system.
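For concreteness, a minimal sketch of what that could look like (the pool name, dataset name, and device names are just examples, and assume the controller hands you individual disks or partitions to dedicate to logs):
Code:
# a small dedicated pool just for logs, mirrored across two devices
zpool create zlog mirror da2 da3
# a dataset for the logs, with atime updates off to save a few writes
zfs create -o atime=off -o mountpoint=/var/log zlog/log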

ZFS has so many advantages I would use it if your RAID controller permits JBOD (and you are not memory constrained). But UFS would also do the job perfectly well.

I'll leave others to comment on tuning to reduce I/O cycles, as I don't have any experience doing that.
 
You should never combine hardware RAID with ZFS because you get two completely independent smart algorithms (RAID controller and ZFS) defeating each other.

If you must use hardware RAID, use UFS.
Thank you for the suggestions.

Why are you so sure that ZFS and the hardware RAID would be defeating each other in exactly the RAID10 (or even RAID0) case?
IMHO, in exactly this case (4 SAS SSDs, 2 backplanes with 2 SSDs on each backplane/channel), from the point of view of FreeBSD's disk driver it is simply operating with one very fast disk. (Of course, this “fastest” speed costs more than 50% of the overall disk capacity.)
The hardware RAID controller (in the RAID10 case) only performs parallel reads/writes to the two RAID1s.

You would be absolutely right about them defeating each other in the case where, for example, hardware RAID5 or RAID6 is used at the same time as ZFS.
(More on this can be read in the TrueNAS Core Hardware Guide https://www.truenas.com/docs/core/gettingstarted/corehardwareguide/, and in the TrueNAS community forum https://www.truenas.com/community/threads/freenas-hardware-guide-up-to-date.80891/)

For a dedicated router (and not too much else), the FreeBSD installation should easily fit into 50GB (plus extra for swap, plus additional space for logs). 50GB would leave ample space for snapshots to be left hanging around as a security blanket when doing upgrades.

If you are using a FreeBSD system as a router, the disk I/O load generated by the operating system itself (with adequate memory) will be trivial. The only appreciable disk I/O load will be from writing the logs.

If you are sending logs in real time to a log server, then the I/O bottleneck will very likely be the network link to the log server, not the local disk traffic.
Totally agree with all of this.

I would create a separate file system (and pool, if using ZFS) to isolate the logs, just so full logs could not compromise the operating system.
What exactly do you mean by the term “compromise”?
What is the difference in IOPS when using SSDs, whether /var and /tmp are on the same zpool or a separate one?

ZFS has so many advantages I would use it if your RAID controller permits JBOD (and you are not memory constrained). But UFS would also do the job perfectly well.

I'll leave others to comment on tuning to reduce I/O cycles, as I don't have any experience doing that.
 
Why are you so sure that ZFS and the hardware RAID would be defeating each other in exactly the RAID10 (or even RAID0) case?
There are a variety of reasons that stacking one RAID layer on top of another RAID layer is not a good idea, unless you build them to be aware of each other. For a simple RAID-1 or RAID-0, some of the reasons don't apply. But the single most important reason is that a RAID implementation at the file system layer can be faster (or more efficient, same thing) when doing a rebuild. Why? When a rebuild is required, a standalone (hardware) RAID system needs to rebuild all the data, every byte on the (virtual) disk. A file-system RAID only needs to rebuild data that is actually used to store files. If the file system is, say, 50% full, the rebuild will (a) finish twice as fast, and (b) only read half as much data. That means it is about half as vulnerable to an additional I/O error occurring or being found during the rebuild, and this immediately improves the durability of the data by a factor of 2. Even better: if the file system layer has internal redundancy (like copies=2 has been configured, or metadata like superblocks are internally replicated), it can overcome RAID reconstruction errors.

There are a few other things that help too, but to me this is the big one.
 
the FreeBSD installation should easily fit into 50GB
1GB is sufficient for the base filesystem and the kernel.

I would create a separate file system (and pool, if using ZFS) to isolate the logs, just so full logs could not compromise the operating system.
I would recommend UFS for a router just to keep things simple. ZFS may be underutilized on a gateway.

we need to write a lot of logs to local disks in addition to sending logs to a remote server
One source of truth is easier to manage than two or more. However, I would recommend saving a single/few condensed logs locally if you still insist on keeping them in both places. I would also recommend tunneling your syslog traffic over SSH to protect its trip to the loghost.
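A rough sketch of the tunneling idea (hostname, port, and user are made-up examples; note that the stock FreeBSD syslogd only forwards over UDP, so this assumes a TCP-capable syslog daemon such as sysutils/syslog-ng on both ends):
Code:
# on the router: carry local TCP port 5140 to the loghost over SSH
ssh -f -N -L 127.0.0.1:5140:127.0.0.1:5140 logger@loghost.example.com

# syslog-ng.conf fragment on the router: push everything into the tunnel
# (s_local is whatever local log source you already have defined)
destination d_tunnel { tcp("127.0.0.1" port(5140)); };
log { source(s_local); destination(d_tunnel); };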
 
Why You so sure that ZFS would be defeating the hardware exactly RAID10 (or even RAID0)?
The approach is one of risk minimisation. Would you plug one hardware RAID controller into another hardware RAID controller and expect it to work? You have two "black boxes", both doing their own thing, making assumptions about optimal sector size, write clustering, and striping. If you have an intimate understanding of how each black box behaves, then I expect things might be made to work.

But, if you put all the intelligence in a single box (or, in your case, ZFS) you are guaranteed an optimal outcome. No loss of performance. No risk of confusion. Complete understanding of the actual hardware. Why would you not go down the no-risk path?

If your RAID controller is not capable of JBOD, and you must use its RAID functions, then it's worth canvassing the issue of whether it makes sense to use ZFS. But I would not go there unless you have to.
What exactly do you mean by the term “compromise”?
In the case of a router writing logs, the major risk is filling up the file system containing the logs. When this happens, processes writing into that file system will have their activity compromised, and will probably hang. So you want the file system containing the logs isolated from all other activity. With UFS, this means a separate file system. With ZFS this means a separate file system in a separate (not zroot) pool -- because all file systems in a ZFS pool share a common pool of available spare disk space -- and filling one file system fills them all.
What is the difference in IOPS when using SSDs, whether /var and /tmp are on the same zpool or a separate one?
I would not expect any difference in IOPS.
 
In my experience, consumer-level hardware RAID is junk. It will actually introduce corruption, and may never successfully rebuild a volume, probably for the reasons Ralphbzs lists.
 
Note that compared to UFS, ZFS would do many more writes due to the way it works (Merkle hash tree). On the other hand, most hardware RAIDs usually suck. If I was building such a system I'd use 2x2 ZFS mirrors (equivalent to RAID10) and turn atime off (to reduce access-time updates).
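For reference, a minimal sketch of that layout (the pool name and device names are just examples, assuming the controller hands the four SSDs to FreeBSD individually):
Code:
# RAID10 equivalent: a stripe of two 2-way mirrors, with atime off pool-wide
zpool create -O atime=off tank mirror da0 da1 mirror da2 da3

# or, to turn atime off on an existing pool's datasets later:
zfs set atime=off tank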
 
If you’re looking to maximize time to failure with your four disk config, you could put them into a mirrored configuration with two hot spares. You’ll still have redundancy, and when the first drive pops, you’ll have a drive with ~0 writes to it ready to go — and still have one in reserve. I would guess at that point you’re getting close to expecting something else to fail first. (Fan or power supply in my experience.)
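If it helps, here is roughly what that looks like in ZFS terms (pool and device names are hypothetical):
Code:
# one 2-way mirror plus two hot spares; autoreplace lets a spare take
# over automatically as soon as a mirror member fails
zpool create tank mirror da0 da1 spare da2 da3
zpool set autoreplace=on tank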
 
In my experience, consumer-level hardware RAID is junk. It will actually introduce corruption, and may never successfully rebuild a volume, probably for the reasons Ralphbzs lists.
In most cases you are right.

Sorry, I did not indicate in the first message that we are talking about an IBM rack server with their MegaRAID/LSI controller.
 
Note that compared to UFS, ZFS would do many more writes due to the way it works (Merkle hash tree).
Thank you for the suggestions.

Does this mean that in practice ZFS is not so good for building a small (4-6-8 disk) SSD (MLC/TLC) array, and it is better to stick with traditional HDDs?

(The price of a small (60-128 GB) consumer-grade SATA SSD (MLC/TLC) on Micron/Samsung chips is $7-10.
For this price it is possible to simply replace the SSDs EACH YEAR. Just $40 + shipping per year for ONE server.
OK, we lose the wide management features that SAS provides, BUT the OVERALL SPEED of the disk subsystem INCREASES RAPIDLY.
Does this look reasonable?)

Note that we are discussing a RACK server for networking operations here, not anything related to a DB.

On the other hand, most hardware RAIDs usually suck. If I was building such a system I'd use 2x2 ZFS mirrors (equivalent to RAID10) and turn atime off (to reduce access-time updates).
Sorry for my ignorance, but I still do not understand WHY your scheme is faster/safer than a hardware RAID10 (2 channels, RAID1 on each channel, both RAID1s combined into a single RAID0, 4 SSDs total) with two zpools (zroot for the whole system, zlog for /var, /tmp…) on that one (hardware-formed) logical disk?
 
I see no problem running ZFS on top of the LSI RAID, but it has only disadvantages compared to running ZFS on top of the LSI controller in JBOD mode.

I don't know whether the battery on the controller still does its job in JBOD mode. But it is probably pointless if your SSDs don't have power-loss protection.

I would create a bunch of ZFS datasets to keep data filling up one filesystem from affecting the others.
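A quick sketch of that idea (the dataset names follow the default zroot layout; the quota sizes are arbitrary examples). Within a single pool the datasets share free space, so quotas are what actually stop one of them from starving the rest:
Code:
# cap the usual suspects so a runaway log or tmp cannot fill the pool
zfs set quota=10G zroot/var/log
zfs set quota=5G  zroot/var/tmp
zfs set quota=2G  zroot/tmp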
 
Does this mean that in practice ZFS is not so good for building a small (4-6-8 disk) SSD (MLC/TLC) array, and it is better to stick with traditional HDDs?
It depends. SSDs offer enough very attractive benefits that they are a better choice in most cases. With SSD prices coming down it probably doesn't matter much if you have to replace them in N years instead of N+M years. On the plus side you will get far superior speeds, no seek latency, and much lower power use. On the con side, if you are doing lots and lots of writes, they will wear out faster. If you can, estimate the amount of writes you do on average per day to check how long they will last. And if you want the fastest possible throughput, FFS will behave better than ZFS, but this shouldn't be an issue for most people - unless you are serving gigabytes of data per second!
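A back-of-the-envelope way to do that estimate (every number below is a made-up example; check your drive's datasheet for its rated TBW):
Code:
# rated endurance: say 70 TBW for a 128 GB TLC drive
# measured write load: say ~20 GB/day of logs
#   70 TB / 20 GB per day = 3500 days, i.e. roughly 9.5 years per drive
# many drives also report lifetime writes and wear via SMART,
# e.g. with smartctl from sysutils/smartmontools:
smartctl -a /dev/ada0 | grep -i -e 'written' -e 'wear' -e 'percent'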
Sorry for my ignorance, but I still do not understand WHY your scheme is faster/safer than a hardware RAID10 (2 channels, RAID1 on each channel, both RAID1s combined into a single RAID0, 4 SSDs total) with two zpools (zroot for the whole system, zlog for /var, /tmp…) on that one (hardware-formed) logical disk?
There is just one zpool, composed of two mirror "devices" (I forget the ZFS nomenclature for it). You can then divide this pool into various ZFS filesystems the way you want. The issue with hardware RAID is that if you move to a controller from a different vendor, it may or may not work (maybe things have improved, but this was the case in 2005 => which is why I switched to ZFS then!).
 
I see no problem running ZFS on top of the LSI RAID, but it has only disadvantages compared to running ZFS on top of the LSI controller in JBOD mode.
Thank you for answering.

Let’s note that IN THIS PARTICULAR CASE we are discussing only a RAID10 scheme where 2 SSDs form a RAID1 on each channel, and both RAID1s form a RAID0. On top of this RAID0, 2 zpools would be created (zroot for the system, zlog for /var, /log,…).

What exactly do you mean by “disadvantage”?

I don't know whether the battery on the controller still does its job in JBOD mode. But it is probably pointless if your SSDs don't have power-loss protection.
From my knowledge of IBM MegaRAID, the LSI battery constantly keeps the cache powered, independent of RAID or IT mode…

I would create a bunch of ZFS datasets to keep data filling up one filesystem from affecting the others.
 
Not to stray too far from your ZFS question but I will relay my UFS RAID10 findings.

I could not find a single advantage to RAID10. I thought that by striping my RAID1 I would get double the speed.
No, I saw no speed advantage at all, and more points of failure.

I used my EFI version for RAID10 experiments.

Code:
root@x9srl:/home/firewall # gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  ada0 (ACTIVE)
                      ada1 (ACTIVE)
root@x9srl:/home/firewall # gpart show /dev/mirror/gm0
=>      40  31276976  mirror/gm0  GPT  (15G)
        40    409600           1  efi  (200M)
    409640  30867376           2  freebsd-ufs  (15G)

I am using very old industrial SSDs, 16 GB in size: SATA2 drives in a Supermicro 2-bay tray.
Code:
root@x9srl:/home/firewall # camcontrol devlist
<FiD 2.5 SATA10000 081210>         at scbus0 target 0 lun 0 (pass0,ada0)
<FiD 2.5 SATA10000 081210>         at scbus1 target 0 lun 0 (pass1,ada1)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass2)
Code:
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 512bytes)
ada1: <FiD 2.5 SATA10000 081210> ATA8-ACS SATA 2.x device

Innodisk SLC drives will last a long time...
 
Now I will give my opinion, which may start a big fight.

MegaRAID is unwanted. We all flash to IT mode. The LSI batteries are too expensive.

So buy drives with PLP if you care or invest in a UPS.

Why do you need an LSI controller in a router? They go in storage machines.

we need to write a lot of logs to local disks
For that a dedicated small drive is what I would consider.
 
So buy drives with PLP if you care or invest in a UPS.
I'll ruminate on that :)

I have worked in some quite large enterprises. All had big/many UPSs.

Three of those large sites had power failures. Two were due to a faulty UPS, and one was due to a faulty switch (that selected the power source).

Uninterruptible power supplies can be interrupted!

If you don't want to lose data, buy enterprise-class SSDs with full data-path and power-loss protection.
 
Let’s note that IN THIS PARTICULAR CASE we are discussing only a RAID10 scheme where 2 SSDs form a RAID1 on each channel, and both RAID1s form a RAID0. On top of this RAID0, 2 zpools would be created (zroot for the system, zlog for /var, /log,…).

What exactly do you mean by “disadvantage”?
I'm wondering what your motivation is to deploy the RAID controller in anything but JBOD mode.

Your drives and buses will be the same, so I don't see any additional bandwidth being available. Certainly the RAID controller can shoulder some of the I/O interrupt and processing load. Are CPU cycles so scarce that you can't afford to let ZFS do all the work? If they are, then your router functions are at risk of running slow.

Longer term, today's very controlled circumstances with your identical SSDs may change as hardware fails and gets replaced. Do you want to re-consider your design every time an SSD fails?

On the matter of tiny FreeBSD instances... I suggest you have enough disk space in the root for long term snapshots, and to run utilities like tcpdump, and wireshark. Packet traces can get real big, real quick. If you are building appliances for profit, and every dollar counts, then skimping the disk may make sense. But for a one-off build, the last thing you want to discover is that your operating system needs more disk space to hold the packages required for an upgrade, or the packet traces for diagnosing the latest "routing problem". Disk space is the cheapest component of your system. Back to risk management. There is no reason to skimp.

On the choice of UFS or ZFS... UFS now allows snapshots on filesystems using journaled soft updates. That's a major improvement, because you can perform background dumps on a live journaled filesystem. But the work to allow on-line fsck on such file systems is still in progress (scheduled for this year). So you still have to pause the boot to run fsck on a UFS file system with soft updates journaling after a crash. Nevertheless, UFS now allows soft update journaling and snapshots together. That's a good thing, and matches one of the major advantages of ZFS. It makes UFS a genuine contender, especially if you are set on using the RAID controller.
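For the curious, this is roughly what that workflow looks like (the device name is just an example; tunefs wants the filesystem unmounted or mounted read-only):
Code:
# enable soft updates journaling on an existing UFS filesystem
tunefs -j enable /dev/mirror/gm0p2

# level-0 dump of the live filesystem; -L makes dump work from a snapshot
dump -0LaC32 -f /backup/root.dump /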

However, I would still use ZFS because of its other numerous advantages. Just one example: if your mirror is not big enough, just attach bigger "disks" to the mirror and resilver them in whatever order is most prudent and operationally convenient. Offline and detach the old disks. Problem solved. No outage!
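In case it's useful, a sketch of that grow-the-mirror dance (pool and device names are hypothetical):
Code:
zpool set autoexpand=on zroot      # let the pool grow once all members are bigger
zpool attach zroot ada0p3 ada2p3   # attach the first bigger disk
zpool status zroot                 # wait for the resilver to complete
zpool attach zroot ada0p3 ada3p3   # attach the second bigger disk, wait again
zpool detach zroot ada0p3          # then retire the old, smaller disks
zpool detach zroot ada1p3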
 
I agree, enterprise or industrial drives are preferred.
Maybe put /var and any /usr logs on one. The OS can run from an SD card....

I tried /var on a memory disk to minimize writes. It did not work well; the directories under /var need fine tuning for a memory disk.
pkg keeps stuff that needs a persistent /var.
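A more selective variant worth trying (the fstab line and size are just an example): leave /var itself on disk so pkg and the logs keep their state, and memory-back only /tmp, either via fstab or with the tmpmfs/tmpsize knobs in rc.conf:
Code:
# /etc/fstab fragment: /tmp in RAM, /var stays persistent
tmpfs   /tmp    tmpfs   rw,mode=1777,size=512m    0   0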
 