Mass storage server [ZFS - FreeBSD]

Hey guys,

I am about to set up a mass storage server for various audio/video streaming services, including file downloads. Before this I worked with a number of smaller servers (DL-120 machines with 8 TB of storage and XFS), so I would like to know if anyone can share some experience about what to watch out for with a bigger server.

For the hardware I was thinking of:

  • Supermicro 846TQ Model
  • 2x Intel Quad-Core Xeon E5620
  • 64 GB DDR3 RAM
  • 24x 2TB SATA2

I would also add a 120 GB SSD and put the operating system (FreeBSD 9) on that drive. As a file system I am thinking about using ZFS. The 24 hard drives would be configured as one RAID-Z3 to provide some fault tolerance, as 42 TB is quite a lot of data that should be available 99% of the time. According to some online sources, ZFS's software RAID should perform faster and better than most hardware RAID controllers, so I would not use a hardware RAID controller and would let ZFS handle it.
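
Roughly, the pool I have in mind would be created like this (just a sketch; the device names are placeholders for however the 24 disks show up on the system):

Code:
# single 24-disk RAID-Z3 vdev, as described above (placeholder device names)
zpool create tank raidz3 \
    da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 \
    da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22 da23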

As a web server I am thinking about using the latest stable release of NGINX, in this case with 16 worker processes, asynchronous I/O, the MP4 streaming module, and PHP5-FPM.

Now the following questions arise: how will all of this work out in this configuration? On average the bandwidth in use will be 4-5 Gbit/s at peak times, with about 50 to 100 file requests per second. The files will be served entirely by NGINX; my software is PHP based and only refers NGINX to the right file (including limits, etc.) via a header. Most of the load will therefore be produced by NGINX doing random reads of the data. How can this be optimized, and what do I have to watch out for? Do I have the option of adding an SSD cache for ZFS in addition to the RAM? How much load would be taken off the system if such a cache were in use?

Thanks in advance!
 
For the best I/O throughput, don't use a single RAID-Z3 vdev across all 24 drives.

If you need the most I/O: 12x mirror vdevs
If you need more space: 4x 6-drive RAID-Z2 vdevs
If you need more redundancy: 3x 8-drive RAID-Z3 vdevs
If you need the most space: 2x 12-drive RAID-Z3 vdevs

If you have a mix of large file storage and streaming throughput needs, consider doing multiple pools. For example, one pool of 12 drives using mirrors (fast), and another pool of 12 drives in 2x RAID-Z2 vdevs (bulk).
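
A rough sketch of what that two-pool layout could look like (pool and device names are only examples):

Code:
# "fast" pool: 6x 2-way mirror vdevs (12 disks)
zpool create fast \
    mirror da0 da1  mirror da2 da3  mirror da4 da5 \
    mirror da6 da7  mirror da8 da9  mirror da10 da11

# "bulk" pool: 2x 6-disk RAID-Z2 vdevs (12 disks)
zpool create bulk \
    raidz2 da12 da13 da14 da15 da16 da17 \
    raidz2 da18 da19 da20 da21 da22 da23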

Using SSDs for the OS (mirrored) and L2ARC cache is good.
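
For example, adding an SSD (or a partition on one) as L2ARC is a one-liner, and a cache device can be removed again at any time without risk to the pool (pool and device names are placeholders):

Code:
zpool add tank cache gpt/l2arc0       # attach the SSD partition as L2ARC
zpool remove tank gpt/l2arc0          # cache devices can be detached later if needed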
 
If you need the most space: 2x 12-drive RAID-Z2 vdevs are good enough, unless you're paranoid. If it had been 4 TB drives, I would seriously consider 3x 8-drive RAID-Z2 vdevs.

Backblaze, the online cloud backup provider, uses 3x 15-drive RAID 6 in their storage pods, and they've never had an issue where the second parity drive went offline during a rebuild. That's not to say it could never happen. But if you need the best possible data protection and run RAID-Z3, you will still have the possibility to self-heal data blocks if data corruption is detected while resilvering with two drives offline.
 
shady said:
For the hardware I was thinking of:

  • Supermicro 846TQ Model
  • 2x Intel Quad-Core Xeon E5620
  • 64 GB DDR3 RAM
  • 24x 2TB SATA2
I've been running something similar for a little over 3 years now. Link (draft article, not yet published)

I would also add a 120 GB SSD and put the operating system (FreeBSD 9) on that drive.
An SSD is overkill for the operating system if your data is stored elsewhere. I'm using a gmirror(8) pair of WD Black notebook drives on the RAIDzilla II.
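
For reference, the general shape of a gmirror setup is roughly the following; this is only a sketch with placeholder device names, and the Handbook covers the extra steps needed when the mirror holds the running OS:

Code:
gmirror load                                   # load the geom_mirror module
gmirror label -v -b round-robin gm0 ada0 ada1  # build mirror gm0 from the two drives
echo 'geom_mirror_load="YES"' >> /boot/loader.conf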

According to some online sources, ZFS's software RAID should perform faster and better than most hardware RAID controllers, so I would not use a hardware RAID controller and would let ZFS handle it.
Your choice of "dumb" controllers that give you 24 ports on a card without expanders is somewhat limited. The RAIDzilla II uses a 3Ware (LSI Logic) 9650 with BBU which exports each drive to the OS as an individual volume (note - not a "raw" drive), which lets the controller do its caching magic. Those drives then go into a ZFS pool (see my link above for more info). Putting all 16 drives on one controller saves slots and also means the server only needs 1 BBU card (vs. 2 in the original 'zilla design).

Now the following questions arise: how will all of this work out in this configuration? On average the bandwidth in use will be 4-5 Gbit/s at peak times, with about 50 to 100 file requests per second.
My servers will do 500 MByte/sec continuously (days) for local applications, with burst write speed well over 1 GByte/sec. See graphic. Note that the read speed is without any SSD acceleration (the SSD is only used as a ZIL).
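
Attaching an SSD as a dedicated ZIL is just a log vdev on the pool, roughly like this (pool and partition names are placeholders; mirror the log device if you want it to survive an SSD failure):

Code:
zpool add tank log gpt/slog0
# or, mirrored:
zpool add tank log mirror gpt/slog0 gpt/slog1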

How will you be getting more than 1 GBit/sec out of the box? Will you be using 10G Ethernet? Right now I'm using the onboard Gigabit controllers, but I expect to do some testing within the next month using 10G Intel X540-T1 adapters.
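
One way to sanity-check the raw link speed once the 10G cards are in (assuming iperf from ports on both ends; the address and options are only examples):

Code:
iperf -s                        # on the receiving box
iperf -c 10.0.0.1 -P 4 -t 30    # on the sender: 4 parallel streams for 30 seconds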

Be careful with your ZFS options. I found that enabling deduplication slowed scrubs / resilvers by an order of magnitude. With deduplication off, scrubs run at over 650 MByte/sec on my servers.
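
Checking and disabling it, and watching a scrub, is straightforward (pool name is a placeholder):

Code:
zfs get dedup tank        # verify the current setting (off is the default)
zfs set dedup=off tank    # note: blocks that were already deduplicated stay that way
zpool scrub tank
zpool status tank         # shows scrub progress and throughput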
 
shady said:
Do I have the option of a cache for ZFS with a SSD in addition to the RAM? How much load would be taken off the system if this cache was in use?

You will get a significant performance boost. The graph below is from a very busy web server; I usually get a 60-70% hit ratio. The system was rebooted on the 10th of May, so look how quickly the L2ARC catches up.

(attached graph: zfs_stats_l2efficiency-week.png)
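
If you want to check the hit ratio by hand, the raw counters behind that graph should be available via sysctl on FreeBSD (hit ratio = l2_hits / (l2_hits + l2_misses)):

Code:
sysctl kstat.zfs.misc.arcstats.l2_hits \
       kstat.zfs.misc.arcstats.l2_misses \
       kstat.zfs.misc.arcstats.l2_size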
 
Do you have any other redundancy besides RAID? I mean, if your PSU/memory/motherboard breaks, your RAID level is moot.
 
kgatan said:
Do you have any other redundancy besides RAID? I mean, if your PSU/memory/motherboard breaks, your RAID level is moot.

I replicate daily incremental snapshots to a different server (a standby).
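
Roughly like this (pool, snapshot, and host names are placeholders):

Code:
zfs snapshot -r tank@today
zfs send -R -i tank@yesterday tank@today | ssh standby zfs receive -d backup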
 
I guess the question was whether you use, for example, ECC memory. Without ECC memory your system isn't very robust against silent data corruption: you could be snapshotting corrupt data to your backup server while everything looks fine as far as ZFS checksumming is concerned.
 