ZFS - Best partitioning scheme between SSD and Caviar Red

I'm building a home NAS for storage and hosting. The more I learn about ZFS, the more I think I've proverbially shot myself in the foot on my initial partitioning scheme.

My drives -
  • Samsung 840 120GB SSD
  • WD Caviar Red 3TB (eventually will expand to 5 of these)

My initial plan was to use the SSD for all of the main mount points on UFS (boot, /, /usr, /var, etc.) and then just mount the WD Red as a ZFS pool, growing it as I acquire more of the WD Reds down the road. I have read quite a few threads and can't find anybody doing this, so I'm thinking that it probably won't use ZFS's memory features well. I want the fastest and best overall setup, even if I have to reformat and start over.

It looks like the reported sector size of the Red drive is 512 bytes with a stripe size of 4096. Does anybody have any suggestions for my situation? I'm now leaning towards putting some of the partitions on the SSD (boot, swap, L2ARC cache) and putting the rest on the ZFS pool.

Please let me know if I can provide any more detail. I'd appreciate any suggestions on the partition scheme between the two drives as well as filesystem types (UFS vs. ZFS for each partition). Good partition offsets for my situation would also be very helpful to know.

Thanks in advance!
 
Hi,

I am also quite interested in such a scheme. In my case, I have a 120 GB SSD and already have three 3TB Caviar Reds. I would like to set up two of them as a mirrored pool, and the third as a hot spare.

Thank you!
 
Glad to see I'm not the only person with this question. :)

I was looking at this thread which sounds sort of similar to what we're doing - http://forums.freebsd.org/showthread.php?t=38740

I am currently planning on going with @usdmatt's solution with bootcode and cache on the SSD:
I personally would just create a root pool on the SSD and a separate data pool (I don't know where the 3rd you mention comes from?) or do as suggested above - put bootcode on the SSD, a single-disk pool on the HDD, and use the remaining SSD space as a cache. Obviously the cache is always empty on boot, but I don't think disk performance has much of an effect on boot time anyway; most of it is spent in the BIOS, boot loader or device discovery.
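From what I gather, that layout would look roughly like this on the SSD (ada0 is just my guess at the device name; the single-disk pool would then go on the Red):

Code:
# gpart create -s gpt ada0
# gpart add -t freebsd-boot -s 64k ada0 (bootcode partition)
# gpart add -t freebsd-zfs -a 4k -l cache0 ada0 (rest of the SSD for L2ARC)
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0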

I'm not sure what's optimal on the cache side, but since the boot partition would only be around 512 kB, that would leave nearly 120GB free for cache. Does anyone know if all of this cache would actually be used, or if it would be smarter to put some part of the file system on the SSD besides the cache?

I'll probably go ahead with this, but if anyone can chime in with experience, that would be great.
 
I'd personally create a pool using the spinning disk as your storage and use the SSD as L2ARC.
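Roughly something like this (pool and device names are just placeholders):

Code:
# zpool create tank ada1 (spinning disk as the storage pool)
# zpool add tank cache ada0 (SSD, or an SSD partition, as L2ARC)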

It's far more important to cache data for performance than the boot files, which you typically read only once per boot (how often do you reboot?).

Bear in mind that you won't be able to expand your ZFS pool by adding single drives if you want any sort of redundancy.


What are you trying to achieve, and what is your intended workload? This will determine which trade-offs are "best" for your circumstances, because every storage setup is a trade-off in one way or another.
 
Thanks for the reply @throAU. That is starting to sound like the best bet.

So basically, if I start off with one ZFS storage drive, I can expand the pool, but since it wasn't initially set up with redundancy, the expanded pool wouldn't have redundancy either, correct? Would I be able to dump the file system to a separate computer, repartition and set the pools up with redundancy, and re-import the file system onto the new pool down the line when I get more drives?

Primary usage will be serving media over Samba to my home network, but I have a Xeon E3-1245 processor and 16GB RAM, so I will likely set up Apache and run usenet apps (SABnzbd, CouchPotato, SickBeard), Newznab, and other applications along these lines.
 
cchamberlain said:
  • Samsung 840 120GB SSD
  • WD Caviar Red 3TB (eventually will expand to 5 of these)
Better to start with at least 2 of them (ZFS mirror); for 5 or 6 I would use RAIDZ2 (RAID6).

Use that SSD as L2ARC cache device.

You can use another 2 SSDs (ZFS mirror) for ZIL.
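For reference, a mirrored log vdev would be added roughly like this (pool and device names are only placeholders):

Code:
# zpool add tank log mirror da1 da2 (two SSDs as a mirrored ZIL)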
 
If you start with a single disk in your pool you can add a second to make a mirror. From this point onwards, the recommended action would be to continue adding mirrored pairs - i.e. go from 1 mirror to 2 mirrors, to 3 mirrors (effectively RAID10).

You will not be able to convert a single disk pool (or a mirror) into a raidz without destroying the pool and re-creating it.

Obviously you can start with 1 disk and just keep adding 1 more in a stripe setup but I would generally advise against a non-redundant config.

Remember that adding bootcode to the SSD and using whole disks for the pool makes managing disks in the pool slightly easier, but it also means you won't be able to boot if the SSD fails. There's a fair argument for adding bootcode to all the pool disks and using the SSD purely as cache (or cache and ZIL, though you may not benefit that much from a ZIL depending on your application).
 
Just to add, the 3TB Red disks are definitely Advanced Format (4k), so you'll want to look up the posts on here about getting alignment right and using gnop(8) to get the right ZFS setup.

I don't know when we're going to see a simple way to override the sector size for new pools, they've been discussing various methods on the mailing lists for ages now.

As with pretty much all disks, the Reds still report a 512-byte sector to the OS, although they also show a 4k stripe size (I don't think this is universal though, as the devs would have already jumped on it as a simple way to identify AF drives).
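The gnop trick itself goes roughly like this (gpt/disk0 is just a placeholder for your ZFS partition):

Code:
# gnop create -S 4096 /dev/gpt/disk0 (temporary provider with a 4k sector size)
# zpool create zroot /dev/gpt/disk0.nop (pool is created with ashift=12)
# zpool export zroot
# gnop destroy /dev/gpt/disk0.nop
# zpool import zroot (pool keeps its 4k alignment on the real device)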
 
Thanks for all the great advice! I ended up just going for it last night and set up the pool with the intention that I will be doing it again soon (everything is not going to be set up optimally the first time I try it anyway, right?).

I'm glad to know they have the 4k sector size; I was getting thrown off by what the drive was reporting. I'm planning on ordering some more WD Reds today and will read up on gnop. I had planned to offset the partitions for 4k alignment, but after hours of trying to destroy partitions and getting "Device busy" last night, I was just happy when I finally got a ZFS partition on there.

FYI to anyone having this issue - I was initially going off these instructions on setting up ZFS on a GPT partition. They appear to be outdated for the current USB installer (as far as I can tell), since I could not find a way to get into Fixit. I tried entering sysinstall from the shell and navigating to it that way, but none of the Fixit menu entries would actually let me in. Instead, I used the updated instructions here; the author calls out the issue with the current installer in the first paragraph.

usdmatt said:
If you start with a single disk in your pool you can add a second to make a mirror. From this point onwards, the recommended action would be to continue adding mirrored pairs - i.e. go from 1 mirror to 2 mirrors, to 3 mirrors (effectively RAID10).

So this sounds to me like I should buy drives in even numbers while I'm adding mirrors, then wipe out everything if I make the jump to RAIDZ when I get 5 or 6 disks, right? Do I need to set up the first mirror before the file systems are created on the pool?
 
You can start with a single disk, create file systems, put data on it and then convert to a mirror without any problem.

Code:
# zpool create pool disk1 (single disk)
(you can start adding filesystems/data now)
# zpool attach pool disk1 disk2 (mirror)
# zpool add pool mirror disk3 disk4 (2 mirrors)
# zpool add pool mirror disk5 disk6 (3 mirrors)

*Make sure you learn the difference between the attach and add subcommands* - they get a lot of people into trouble.
attach creates a mirror (or adds another disk to an existing mirror), while add adds a new vdev to the pool (i.e. add-ing disk2 instead of attach-ing it would stripe your data across the 2 disks rather than make a mirror, and you can't undo it).

You can even have a single mirror, add another disk (giving one mirror + one single disk), then later on make that single disk a mirror (giving 2 * mirror) and so on but this isn't really advisable. When you have mirrors + a standalone disk in the pool, you lose everything if that standalone disk fails.
Just for interest that would go something like this:

Code:
# zpool create pool disk1 (single disk)
# zpool attach pool disk1 disk2 (single mirror)
# zpool add pool disk3 (mirror + single)
# zpool attach pool disk3 disk4 (2 mirrors)
# zpool add pool disk5 (2 mirrors + single)
# zpool attach pool disk5 disk6 (3 mirrors)

I seem to say this about once every few days now, but I follow the method used by @vermaden in the forum post below. Do not use sysinstall. Whenever I've tried to use bsdinstall it seems to give me problems when I drop back to the console after the install to sort out the zpool.cache stuff. The method that works for me without any issue is to just boot into the live CD and do the whole thing by hand:

http://forums.freebsd.org/showthread.php?t=31662

Last time I installed a 4k drive I started the first real partition at 1m (using the -b 1m option) and aligned the rest to 4k (with the -a 4k option). I don't know if there's a preferred method.
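As a rough example of what I mean (the sizes and the ada1 device name are just placeholders):

Code:
# gpart create -s gpt ada1
# gpart add -t freebsd-boot -s 64k ada1 (bootcode)
# gpart add -t freebsd-swap -b 1m -s 8g -l swap1 ada1 (first real partition starts at 1m)
# gpart add -t freebsd-zfs -a 4k -l disk1 ada1 (rest of the disk, aligned to 4k)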

I too quite often run into "Device busy" errors when messing with disks. I might be misremembering (it's been a month or two), but I seem to recall the gpart commands complaining about this; I can usually get around it by dd(1)ing over the start and end of the disk manually and then starting again.
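Something along these lines (ada2 is a placeholder; the second command clears the backup GPT that lives in the last sectors of the disk):

Code:
# dd if=/dev/zero of=/dev/ada2 bs=512 count=34 (wipe the protective MBR and primary GPT)
# dd if=/dev/zero of=/dev/ada2 bs=512 oseek=$(( $(diskinfo ada2 | awk '{print $4}') - 34 )) (wipe the backup GPT)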
 
Thanks @usdmatt, this was exactly what I was looking for. Just ordered a second WD Red, should be here tomorrow and I'll stick it in as a mirror, then down the road when I get a third I will likely switch to RAIDZ for its space benefits.

I had a similar issue with bsdinstall, got through the step of deleting/creating the zfs partitions, then on committing it said that one of the partitions could not be deleted. I will definitely be following @vermaden's advice on the base install when I switch over to RAIDZ down the line.

Out of curiosity, do you leave any space free at the end of the disk, given that ZFS requires replacement disks to be greater than or equal to the current disk size? And have you run into any issues with not being able to add additional drives because of the random variance in drive sizes? I see some people saying not to use the full size and other people just saying to let it use the whole disk.
 
Hi,

You are by no means alone. It's been something of a wait for FreeBSD to get SSD support, and the WD Reds have a lot to offer at the value end of spinning disks.

I have a setup very similar to yours, and hope that this thread develops well.

Some observations from reading a lot of Sun stuff on ZFS regarding the ZFS intent log (ZIL) are:

  • the ZIL never needs to be larger than 50% of main memory (so quite small);
  • the ZIL turns random writes into sequential writes (performance-wise); and
  • the ZIL must be on lower-latency storage than the tank (e.g. an SSD).
My plan has been to put the ZFS cache and ZIL onto my Samsung 840 Pro. It's SATA 3 connected, so I'm hoping that there is enough bandwidth for both.
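If it helps, splitting the SSD between the two would look something like this (the partition sizes, the tank pool name, and ada0 are just assumptions on my part):

Code:
# gpart add -t freebsd-zfs -a 4k -s 8g -l zil0 ada0 (small slice for the ZIL)
# gpart add -t freebsd-zfs -a 4k -l cache0 ada0 (rest of the SSD for L2ARC)
# zpool add tank log gpt/zil0
# zpool add tank cache gpt/cache0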

Also, I have noticed that, provided you keep the block size right (I'm using NFS), ZFS RAIDZ1 behaves like a stripe (i.e. very fast).

Cheers,
 
cchamberlain said:
My initial plan was to use the SSD for all of the main mount points on UFS (boot, /, /usr, /var, etc.) and then just mount WD Red as a ZFS pool, and grow it as I acquire more of the WD Reds down the road. I have read quite a few threads and can't find anybody doing this so I'm thinking that it probably will not use ZFS' memory features well and I do want the fastest and best overall setup regardless if I have to reformat and start over.
What are your goals for this NAS? One important factor is the percentage of writes vs. reads. If you expect to be doing a fair number of writes, you might investigate using the SSD for a ZIL device and using something else for the base operating system storage. Remember, SSD's have a finite capacity for writes and it may not be the best use of the drive to store things that have only a relatively short useful life, like most of the files in /var/log. You also don't need ultimate speed for operating system files - the user experience will be based on how fast your NAS can get data from the ZFS pool to the user (or vice versa).

And there's something else to consider regarding that last sentence - you'll only get 125Mbyte/sec (best case) on a Gigabit Ethernet, so if you aren't doing local processing such as media transcoding, your money may be better spent on slower disks with more capacity for the same price.

On my RAIDzilla II systems I'm using a pair of WD Blue notebook-class drives for the operating system (mirrored, UFS format), 16 x 2TB WD RE4 drives for the ZFS pool, and a 300GB (way oversize) PCIe SSD as a ZIL device. This config will read or write at > 600Mbyte/sec continuously, with burst writes above 4GB/sec (see JPEG here).

To answer your question about adding drives, you can add drives / vdevs to many ZFS configurations, but ZFS will not move pre-existing data to balance space between the existing and added devices. So, unless you will be adding lots of data, you'll wind up doing lots more I/O to the older vdev(s) than the new ones. To balance things you'd need to back up the pool data somewhere, re-create the pool, and restore. But that doesn't seem like it will work for you, as adding drives one at a time implies there isn't any place with enough space to back up the existing data. At pools of the size I'm using, it also gets challenging to move 13TB or so of data off so I can re-initialize the pool.
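For what it's worth, the usual way to do that shuffle is with ZFS replication - roughly like this, assuming a second machine with a pool named backup (all the names here are placeholders):

Code:
# zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | ssh otherhost zfs receive -F backup/tank (copy everything off)
(destroy and re-create tank with the layout you want, then send it back)
# ssh otherhost zfs send -R backup/tank@migrate | zfs receive -F tank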

One point raised in a later reply was hot spares. There's no "auto" in ZFS autoreplace on FreeBSD - you'll need to manually tell ZFS to start using the hot spare. There was some discussion about adding this to devd(8), but I haven't kept track of where that went.
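In practice the manual step is just a replace onto the spare, roughly like this (device names are placeholders):

Code:
# zpool add tank spare ada7 (register the hot spare)
# zpool replace tank ada3 ada7 (after ada3 fails, swap the spare in by hand)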
 
cchamberlain said:
Thanks @usdmatt, this was exactly what I was looking for. Just ordered a second WD Red, should be here tomorrow and I'll stick it in as a mirror, then down the road when I get a third I will likely switch to RAIDZ for its space benefits.

That will require a backup and reformat of the existing drives. ZFS can't morph from a mirror to RAIDZ1 on its own.

Out of curiosity, do you leave any space free at the end of the disk, given that ZFS requires replacement disks to be greater than or equal to the current disk size? And have you run into any issues with not being able to add additional drives because of the random variance in drive sizes? I see some people saying not to use the full size and other people just saying to let it use the whole disk.

Later versions of ZFS are reported to leave unused space at the end of the drive for just that reason. Exactly which versions, and how much unused space, I have not found.
 
gpw928 said:
Hi,

You are by no means alone. It's been something of a wait for FreeBSD to get SSD support,

Please expand on that--do you mean SSD support in the installer, or TRIM support in ZFS, or something else?

Also, I have noticed that, provided you keep the block size right (I'm using NFS), ZFS RAIDZ1 behaves like a stripe (i.e. very fast).

My 3-disk RAIDZ1 seems to be about twice as fast as a single disk (180 Mbyte/sec, AFAIR). It's enough, but not as fast as I'd hoped.
 
cchamberlain said:
Out of curiosity, do you leave any space free at the end of the disk, given that ZFS requires replacement disks to be greater than or equal to the current disk size? And have you run into any issues with not being able to add additional drives because of the random variance in drive sizes? I see some people saying not to use the full size and other people just saying to let it use the whole disk.

Personally, I like to partition - better safe than sorry.
It would be a minor annoyance with a 2-way mirror; anything beyond that would be really irritating.
 
Great responses, let me see if I can respond to everything in one go.

gpw928 said:
My plan has been to put the ZFS cache and ZIL onto my Samsung 840 Pro. It's SATA 3 connected, so I'm hoping that there is enough bandwidth for both.

I was thinking about doing this but read that having them both on the same SSD might hurt performance. In my case I have a Samsung 840 120GB (picked up a cheap one for this; I have the Pro in my desktop/laptop), which I think will be just about the perfect L2ARC size to go with my 16GB RAM and 15TB of HDD end state (3TB x 5). As of right now I'm not planning on putting in multiple SSDs, since I'm fairly limited with 6 SATA connections (using a P8H77-I mini-ITX motherboard) - my goal here was to pack as much power into a mini-ITX form factor as possible. I have a single PCIe slot on the board that could have been used for another SATA controller, but since I cannot seem to get the onboard video to work, I had to resort to using a graphics card. Still no idea why it won't work, since my processor (E3-1245) has support for integrated video, but that's off topic.

In terms of read vs. write, I'd like to favor read speed, since the primary usage will be streaming to my LAN. Also, I went with the cheapest Samsung SSD I could find, so as long as my data doesn't get corrupted (my understanding is that if the L2ARC dies, everything just runs a little slower), I don't mind throwing the SSD in the garbage in 6 months if it comes to that.

Terry_Kennedy said:
To answer your question about adding drives, you can add drives / vdevs to many ZFS configurations, but ZFS will not move pre-existing data to balance space between the existing and added devices. So, unless you will be adding lots of data, you'll wind up doing lots more I/O to the older vdev(s) than the new ones. To balance things you'd need to back up the pool data somewhere, re-create the pool, and restore. But that doesn't seem like it will work for you, as adding drives one at a time implies there isn't any place with enough space to back up the existing data. At pools of the size I'm using, it also gets challenging to move 13TB or so of data off so I can re-initialize the pool.

On that question, I was mostly just curious whether it was possible to add mirrors without reformatting. I don't have any data on the server yet; right now I'm still tuning and don't mind reformatting if need be. I have a beast of a desktop with a couple of 2TB WD Blacks alongside an 840 Pro, which holds all my data at the moment, so I have time to get it right. Given that the zpool hasn't had all of my data moved to it (pretty much just the standard root file system currently), would it be beneficial to start over with the mirror I'm going to add today, or should I just attach the mirror?

wblock@ said:
That will require a backup and reformat of the existing drives. ZFS can't morph from a mirror to RAIDZ1 on its own.

Yes, this was my understanding; I'll plan on backing up the data to my desktop when I switch over to RAIDZ1. It's always better to be explicit though. :)

wblock@ said:
Later versions of ZFS are reported to leave unused space at the end of the drive for just that reason. Exactly which versions, and how much unused space, I have not found.

Thanks for clearing that up!

Will be moving forward with adding in the mirror tonight.
 
cchamberlain said:
Out of curiosity, do you leave any space free at the end of the disk, given that ZFS requires replacement disks to be greater than or equal to the current disk size? And have you run into any issues with not being able to add additional drives because of the random variance in drive sizes? I see some people saying not to use the full size and other people just saying to let it use the whole disk.
In many cases this is an over-emphasized concern. First, I don't know of any current drive manufacturer which will warranty-replace an "xGB" drive with one with fewer sectors. That means that any RMA replacements you get for the next 3 (or 5, etc.) years will be the same size or bigger. I've had experiences with both Seagate and WD where they've replaced a drive under warranty with a larger, newer model because they no longer stocked the older model. [For that matter, I've had the same thing with my PCIe SSD's - the manufacturer replaced my 256GB ones with 320GB ones because they no longer had any of the older ones to meet warranty requirements.]

The second concern is that drives in the "far" future might have slightly different capacities, and might be a few blocks smaller than previous models. I was surprised that I've never run into this - in fact, I recently replaced some Seagate Cheetah 15K.4 drives with Cheetah 10K.6 drives, which are from a completely different family and 2 generations apart, and they both have exactly 286749488 sectors.

One thing to be aware of is that sizes may vary between drive manufacturers. Custom firmware for OEM's like Dell, HP, etc. often reports a different number of sectors than the generic version of the same drive model. This is so the OEM's can customize the capacity so all of their xGB drives have the same number of sectors, regardless of manufacturer. That allows them to ship out any brand of, say, 300GB 7200 RPM SATA drive as a spare part regardless of whether they were made by Seagate, WD, Samsung, etc. without needing to worry about customers having problems re-adding them to an array.

In closing, if you haven't created a pool yet, you might want to consider sizing down the drive capacity a little "just in case". But if you have an existing pool utilizing the entire capacity of the drives, don't panic - it will probably make no difference.
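The simplest way to size down is to give gpart an explicit partition size rather than letting the ZFS partition run to the end of the disk - for example (the size here is only an illustration for a 3TB drive):

Code:
# gpart add -t freebsd-zfs -a 4k -s 2790g -l disk1 ada1 (leaves a few GB of slack at the end)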
 
If you're storing media files (or other large files that are streamed), and accessing them via 1 GbE, the benefit of a ZIL or L2ARC will probably be limited unless you have a large number of users (to randomize your IO), as your spinning disks will saturate 1 GbE already.

However, I'd wager L2ARC will be more useful (workload more read-biased) and won't require an SSD mirror, as an SSD failure won't impact data integrity.

From what I've read, if you're wanting an SSD based ZIL, you really should have multiple SSDs set up as a mirror, otherwise a failure in an SSD can potentially cause pool corruption, etc.
 
throAU said:
From what I've read, if you're wanting an SSD based ZIL, you really should have multiple SSDs set up as a mirror, otherwise a failure in an SSD can potentially cause pool corruption, etc.
I have confirmed that this was fixed as of the ZFS v28 import (and subsequent changes over the next month or so after that). I had a ZFS v15 pool where the SSD-based ZIL failed (PCIe SSD which used flash chips on SODIMM-like modules, where a connector problem was common). The pool could be read, but any attempt to write it would cause the system to panic immediately.

Recently (a month or two ago) I built another RAIDzilla II on a bet that I couldn't do it for $3000 or less (not counting the 16 2TB drives). It ended up costing $3001.70. Anyway, that 'zilla had another one of those same SSD's, and experienced the same problem with bad flash connectors. However, this system was using 8-STABLE with the latest ZFS, and it was quite easy for me to remove the ZIL from the pool and install a replacement PCIe SSD (which, fortunately, no longer uses connectors) and add it to the pool.

Yes, if you lose the ZIL you may have some uncommitted writes. However, even with mirrored SSD's you can have this happen unless the SSD's DRAM memory is backed up by either a battery or a supercap. And there's also the issue of whether the "real" disk drives and controller report synchronous command completion correctly. If the drive or controller says the data was written, the OS (any OS) has no way of knowing otherwise.

I use a non-mirrored PCIe SSD ZIL with supercaps, a 3Ware 9650SE-16ML with battery backup, and WD RE4 'enterprise' drives. All four 'zillas (128TB total) are on a UPS which provides at least 4 hours of runtime if there's a power failure. Based on that, I believe I have taken reasonable steps against my pools getting irrecoverably damaged. Any data being written to the pool can be re-created, and I've never had ZFS lose or corrupt data once it has been committed to the pool.

I also do regular backups to LTO-4 media which is stored offsite, as well as replicating the data on other 'zillas at an offsite location via a dual 1Gbit/sec Ethernet link.

I could split the PCIe SSD and do mirroring - the underlying architecture is a LSI SAS2004 controller running the Integrated RAID firmware with 4 independent flash controllers and chips. So I can run it as a stripe (which is how I have it configured), or half-capacity as a mirror. At some point providing redundancy in the ZIL actually reduces overall performance - the ZFS pool without the SSD can do 600 to 700 Mbyte/sec reads or writes.

By the way, resilver performance when replacing a fast PCIe ZIL is amazing. This is actual output from the resilver I did after doing a replace on the ZIL:
Code:
(0:29) rz3:/sysprog/terry# zpool status
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Feb 27 14:23:19 2013
        5.36T scanned out of 17.8T at 3.56G/s, 0h59m to go
        0 resilvered, 30.18% done
 
More good info, thanks guys.

So I'm messing around with the mirror and having some trouble. If I understand correctly, I need to first partition the new hard drive the same way as my other drive. On my first drive, I have the following partitions (please chime in if any of these look unnecessary or non-optimal in size) -

Code:
$ gpart show ada1
=>        34  5860533101  ada1  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         128     1  freebsd-boot  (64k)
         168    16777216     2  freebsd-swap  (8.0G)
    16777384  5843755744     3  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

I set up the second drive in the same fashion with gpart -

Code:
$ gpart show ada2
=>        34  5860533101  ada2  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         128     1  freebsd-boot  (64k)
         168    16777216     2  freebsd-swap  (8.0G)
    16777384  5843755744     3  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

I tried writing the bootcode to the second drive, then labeling it as disk1 using glabel label -v disk1 /dev/ada2, and got an instant "Corrupt or Invalid GPT detected. GPT rejected -- may not be recoverable". I then deleted the partitions, destroyed the partition table, and tried again; this time I skipped the bootcode and did it from the live CD, and got the same issue. Any idea what I'm doing wrong?

My understanding is that I need to have the same partition layout on the new disk and a label to use with the zpool attach command. Here is some info on the zpool in case that helps -

Code:
$ zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        ONLINE       0     0     0
          gpt/disk0  ONLINE       0     0     0
        cache
          ada0       ONLINE       0     0     0

errors: No known data errors

Code:
$ zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zroot  2.72T  2.53G  2.72T     0%  1.00x  ONLINE  -
 
Terry_Kennedy said:
Recently (a month or two ago) I built another RAIDzilla II on a bet that I couldn't do it for $3000 or less (not counting the 16 2TB drives). It ended up costing $3001.70. Anyway, that 'zilla had another one of those same SSD's, and experienced the same problem with bad flash connectors. However, this system was using 8-STABLE with the latest ZFS, and it was quite easy for me to remove the ZIL from the pool and install a replacement PCIe SSD (which, fortunately, no longer uses connectors) and add it to the pool.

Wow, all I can say about that is holy crap. May I ask what your primary usage is? Website?

By PCIe SSD, do you mean mSATA or something else?
 
Do not use glabel(8) for labeling disks or partitions on GPT-partitioned disks. GPT has its own labeling system that is superior in many ways. Also, labeling the whole disk does not make sense if you want to identify individual partitions by easy names.

After creating the partitions as you did above:

# gpart modify -l swap1 -i 2 ada1
# gpart modify -l swap2 -i 2 ada2

# gpart modify -l disk1 -i 3 ada1
# gpart modify -l disk2 -i 3 ada2

Do these to force GEOM "retasting" to make the labels visible in /dev/gpt immediately:

# true >/dev/ada1
# true >/dev/ada2

You can see the labels in the output of

# gpart show -l

Then you can use the names gpt/swap1 gpt/swap2 for building a gmirror(8) for swap.

# gmirror label myswap gpt/swap1 gpt/swap2

And build the ZFS pool using the names gpt/disk1 and gpt/disk2

# zpool create mypool mirror gpt/disk1 gpt/disk2

The bootcode is written with these, on both disks:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2
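If you want to keep the existing zroot (currently on gpt/disk0) rather than create a new pool, just label the new drive's ZFS partition and attach it - something like this, assuming the new disk is ada2 and its ZFS partition gets the label disk1:

# gpart modify -l disk1 -i 3 ada2
# zpool attach zroot gpt/disk0 gpt/disk1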
 
throAU said:
If you're storing media files (or other large files that are streamed), and accessing them via 1 GbE, the benefit of a ZIL or L2ARC will probably be limited unless you have a large number of users (to randomize your IO), as your spinning disks will saturate 1 GbE already.
Very true. The systems I build do a large amount of local processing, so it's worth getting the highest possible performance from the pool. Serving files over a 1Gbit/sec LAN will not tax most ZFS configurations unless there is a lot of thrashing going on (as you point out). In fact, most of the better 2TB+ drives (7200 RPM, etc.) can probably saturate a 1GbE link with only a single drive, and a 2-drive stripeset definitely will.

Where things get interesting is way at the high end of the scale. Here is a benchmarks/iozone graph on a pool with neither a ZIL nor an L2ARC device, running FreeBSD 8.4-PRERELEASE. I can't say a lot about the hardware (work system vs. my hobby 'zillas). Peak performance is 7GB/sec while in the processor cache, then 4.5GB/sec from main memory, finally dropping "down" to 1.5GB/sec when it actually has to hit the disks in the pool. The main problem I have is how long it takes to do a maximum iozone run, despite the fast storage in use.
 