New fileserver: ZFS pool layout

Not currently. The "block pointer rewrite" feature is what's needed for this (online defragmentation) and a host of other features (migrating between vdev types, removing vdevs from the pool, etc.).

It's under development and people are actively working on it, but it's not yet available as part of any ZFS release.
 
Update on server build...

Okay, so the hardware is in place, and it's time to play with the setup!
I went for a new server instead of upgrading the old one, and found a case which fits 15 3.5" drives, all accessible from the front through hot-plug docking bays.

The zroot is a mirror of 2x 1.5TB drives for the OS and jails, while DataStore consists of two raidz2 vdevs (6x 1.5TB each) for user-generated data.
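For reference, creating a zroot mirror like that looks roughly like this (the GPT labels here are placeholders; the real ones follow the Bay/serial naming scheme used further down):

# zpool create zroot mirror /dev/gpt/Bay1.1-Seagate-9XW#####-zfsroot /dev/gpt/Bay1.2-Seagate-9XW#####-zfsroot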

Bay 1 is on the mainboard's SATA3 controller, Bays 2+3 are on the mainboard's ICH10R controller, and Bays 4+5 are on the Highpoint RocketRAID 2320 controller. (I tried setting the drives to JBOD mode, but that didn't work too well. Ended up just plugging them in and letting the controller handle them automatically, without configuring the drives in any way in the controller's BIOS.)
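To double-check which controller each bay actually ends up on, camcontrol(8) is handy:

# camcontrol devlist

It lists every drive together with the scbus it sits on, so you can map bays to controllers before labelling anything.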

Drive Partitioning
# gpart show ada1
Code:
=>        34  2930277101  ada1  GPT  (1.4T)
          34         128     1  freebsd-boot  (64K)
         162        1886        - free -  (943K)
        2048  2930275087     2  freebsd-zfs  (1.4T)
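The partitioning itself is just a few gpart(8) commands per drive; roughly like this (the label is an example following the Bay/serial scheme used below, and the bootcode step only matters for the drives you boot from):
Code:
# gpart create -s gpt ada1
# gpart add -t freebsd-boot -s 64k ada1
# gpart add -t freebsd-zfs -b 2048 -l Bay2.1-Seagate-9VS#####-zfsdata ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1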

Pool Creation
# zpool create DataStore raidz2 /dev/gpt/Bay2.*-zfsdata /dev/gpt/Bay3.*-zfsdata
# zpool add DataStore raidz2 /dev/gpt/Bay4.*-zfsdata /dev/gpt/Bay5.*-zfsdata
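The Bay*.* parts are just shell globs which expand to the three labelled partitions in each bay, so the first command is equivalent to:
Code:
# zpool create DataStore raidz2 \
    /dev/gpt/Bay2.1-Seagate-9VS#####-zfsdata \
    /dev/gpt/Bay2.2-Seagate-9VS#####-zfsdata \
    /dev/gpt/Bay2.3-Seagate-9VS#####-zfsdata \
    /dev/gpt/Bay3.1-WDCGP-WMAZA07#####-zfsdata \
    /dev/gpt/Bay3.2-WDCGP-WMAZA03#####-zfsdata \
    /dev/gpt/Bay3.3-WDCGP-WMAZA07#####-zfsdata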

Note how I set the same drive as spare for both pools:
# zpool add DataStore spare /dev/gpt/Bay1.3-Seagate-9XW#####-zfsdata
# zpool add zroot spare /dev/gpt/Bay1.3-Seagate-9XW#####-zfsdata

# zpool set autoreplace=on DataStore
# zpool set autoreplace=on zroot
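You can verify that the property took on both pools with:

# zpool get autoreplace DataStore zroot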

Result
# zpool status
Code:
  pool: DataStore
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        DataStore                                  ONLINE       0     0     0
          raidz2                                   ONLINE       0     0     0
            gpt/Bay2.1-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay2.2-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay2.3-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay3.1-WDCGP-WMAZA07#####-zfsdata  ONLINE       0     0     0
            gpt/Bay3.2-WDCGP-WMAZA03#####-zfsdata  ONLINE       0     0     0
            gpt/Bay3.3-WDCGP-WMAZA07#####-zfsdata  ONLINE       0     0     0
          raidz2                                   ONLINE       0     0     0
            gpt/Bay4.1-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay4.2-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay4.3-Seagate-9VS#####-zfsdata    ONLINE       0     0     0
            gpt/Bay5.1-WDCGP-WMAZA06#####-zfsdata  ONLINE       0     0     0
            gpt/Bay5.2-WDCGP-WMAZA07#####-zfsdata  ONLINE       0     0     0
            gpt/Bay5.3-WDCGP-WMAZA03#####-zfsdata  ONLINE       0     0     0
# zpool list
Code:
NAME        SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
DataStore  16.2T   202K  16.2T     0%  ONLINE  -

# df -h
Code:
Filesystem                               Size    Used   Avail Capacity  Mounted on
DataStore                                 11T     36K     11T     0%    /DataStore
I omitted showing zroot as it would just clutter the post.
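(The gap between zpool list's 16.2T and df's 11T is expected: zpool list reports raw capacity including parity, roughly 12 x 1.36TiB = ~16.3TiB, while df reports usable space after the two parity drives per raidz2 vdev are subtracted, roughly 8 x 1.36TiB = ~10.9TiB.)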
Now I'm going to look for ways of benchmarking this thing, so that I know if my tuning works or not. :)
 
Savagedlight said:
Okay, so the hardware is in place, and it's time to play with the setup!

Now I'm going to look for ways of benchmarking this thing, so that I know if my tuning works or not. :)
This can be difficult on larger systems, as you wind up measuring the cache and not the underlying disks.
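One way to keep the cache from dominating the numbers is to cap the ARC for the duration of the test, e.g. via a loader tunable in /boot/loader.conf (the value is just an example, and it needs a reboot to take effect):
Code:
vfs.zfs.arc_max="4G"
The other option is simply to use test files several times larger than RAM.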

Here's the pool I'm working with:

Code:
[0] new-rz2:~> df -h /data
Filesystem    Size    Used   Avail Capacity  Mounted on
data           21T    6.9T     14T    32%    /data
[0] new-rz2:~> zpool status
  pool: data
 state: ONLINE
 scrub: scrub completed after 21h14m with 0 errors on Wed Sep 15 20:53:48 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da13    ONLINE       0     0     0
            da14    ONLINE       0     0     0
            da15    ONLINE       0     0     0
        logs        ONLINE       0     0     0
          da0       ONLINE       0     0     0
        spares
          da16      AVAIL   

errors: No known data errors

That's three raidz1 vdevs of five 2TB drives each, plus a hot spare and a 256GB PCI Express SSD as the ZIL. The scrub shown is from a test I did after replacing one drive and resilvering the pool.
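For anyone wanting to build a similar layout, the pool above would be created along these lines (device names as in the status output):
Code:
# zpool create data \
    raidz1 da1 da2 da3 da4 da5 \
    raidz1 da6 da7 da8 da9 da10 \
    raidz1 da11 da12 da13 da14 da15 \
    log da0 \
    spare da16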

For benchmarking, I use benchmarks/iozone. Once the test file outgrows the buffer cache (which benchmarks at around 1.5GB/sec), things settle down to a steady-state 500MB/sec that is sustainable for days:

[Image: rz2-iozone.jpg, iozone throughput graph]


(The nifty graphs are a paid add-on to iozone - consult the source site in the port's Makefile for more info).
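A typical invocation for this kind of test looks something like this (sizes are examples; pick a file size well beyond what the buffer cache can hold):
Code:
# iozone -i 0 -i 1 -s 64g -r 128k -e -f /data/iozone.tmp
That runs the sequential write/rewrite and read/reread tests on a 64GB file with 128kB records, with flush times included (-e).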

 
You will want to attach a second ZIL device to the existing one, to create a mirrored log device.

Otherwise, if that ZIL dies, your entire pool is unrecoverable.
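Assuming the second SSD shows up as da17 (hypothetical device name), mirroring the existing log device is a single attach:

# zpool attach data da0 da17

After that, zpool status shows the log vdev as a two-way mirror.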
 