ZFS: Need help setting up a new pool

Hi,
I need help finding the best setup for a new ZFS pool.

I have 5x WD Caviar Green 2TB (Head Parking disabled with wdidle3.exe)
I want to access them over NFS and Samba.

At the moment I have a poor setup: all five disks are in one big raidz2 without any special tuning options.
I now want to use a faster RAID level (raidz?) and make use of the 4K sectors.


What can I do to make use of the 4K sectors, and what would be the best RAID configuration?


Regards
 
bsus said:
Hi,

What can I do to make use of the 4K sectors, and what would be the best RAID configuration?

Regards

# zdb <poolname> |grep ashift

If ashift=9, the pool was created with 512-byte sectors; if ashift=12, it uses 4096-byte sectors. You can force ashift=12 by using the method from http://lists.freebsd.org/pipermail/freebsd-fs/2011-January/010478.html:
# gnop create -S 4096 ${DEV0}
# zpool create tank ${DEV0}.nop
# zpool export tank
# gnop destroy ${DEV0}.nop
# zpool import tank
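Afterwards the same check as above should confirm the new alignment:

# zdb tank | grep ashift

which should now report ashift=12.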

My understanding is that for 5 drives you should use raidz, i.e. 4 data and 1 parity. The reasoning comes from this post http://forums.servethehome.com/showthread.php?30-4K-Green-5200-7200-...-Questions:
As I understand it, the performance issue with 4K disks isn't just partition alignment, but also RAID-Z's variable stripe size.
RAID-Z basically spreads the 128KiB recordsize across its data disks. That leads to a formula like:
128KiB / (nr_of_drives - parity_drives) = maximum (default) variable stripe size
Let’s do some examples:
3-disk RAID-Z = 128KiB / 2 = 64KiB = good
4-disk RAID-Z = 128KiB / 3 = ~43KiB = BAD!
5-disk RAID-Z = 128KiB / 4 = 32KiB = good
9-disk RAID-Z = 128KiB / 8 = 16KiB = good
4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = BAD!
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
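
To check another layout quickly, the same arithmetic can be done in a few lines of shell (just a sketch; set drives and parity to whatever vdev you are planning):
Code:
#!/bin/sh
# How much of a 128 KiB record lands on each data disk of a raidz vdev,
# and whether that is a whole number of 4 KiB sectors.
drives=5                    # total disks in the vdev (example value)
parity=1                    # 1 for raidz, 2 for raidz2
data=$((drives - parity))
stripe=$((131072 / data))   # bytes per data disk for one 128 KiB record
if [ $((stripe % 4096)) -eq 0 ]; then
        echo "${drives}-disk raidz${parity}: ${stripe} bytes per disk - good"
else
        echo "${drives}-disk raidz${parity}: ~${stripe} bytes per disk - BAD"
fi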


I followed these instructions for my 5 disk raidz.

regards
Malan
 
One simple question:

What does that gnop thing do? <--- never mind, Google told me :D

But is this still necessary on FreeBSD 9.0 BETA?

regards
Johan
 
Hi Malan,
thanks for the quick and detailed answer.

My old pool had, as I thought, only 512-byte sectors (ashift=9):
Code:
freebsd admin # zdb
storage
    version=15
    name='storage'
    state=0
    txg=228627
    pool_guid=2663825039003801292
    hostid=2451903120
    hostname='freebsd.fritz.box'
    vdev_tree
        type='root'
        id=0
        guid=2663825039003801292
        children[0]
                type='raidz'
                id=0
                guid=7473656509418404889
                nparity=2
                metaslab_array=23
                metaslab_shift=36
                ashift=9
                asize=10001970626560
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=15209446662985752008
                        path='/dev/ad10'
                        whole_disk=0
                        DTL=79
                children[1]
                        type='disk'
                        id=1
                        guid=5480073870470617926
                        path='/dev/ad6'
                        whole_disk=0
                        DTL=78
                children[2]
                        type='disk'
                        id=2
                        guid=2888928990342718420
                        path='/dev/ad8'
                        whole_disk=0
                        DTL=77
                children[3]
                        type='disk'
                        id=3
                        guid=15502033663295987383
                        path='/dev/ad12'
                        whole_disk=0
                        DTL=76
                children[4]
                        type='disk'
                        id=4
                        guid=3188101342759932169
                        path='/dev/ad14'
                        whole_disk=0
                        DTL=75
freebsd admin # zdb storage | grep ashift
                ashift=9

I now did:
Code:
gnop create -S 4096 /dev/ad6
umount -f /usr/home && zpool destroy storage # if an older pool exists
zpool create storage raidz ad6.nop ad8 ad10 ad12 ad14
The rest:
Code:
 zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        storage      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            ad6.nop  ONLINE       0     0     0
            ad8      ONLINE       0     0     0
            ad10     ONLINE       0     0     0
            ad12     ONLINE       0     0     0
            ad14     ONLINE       0     0     0

errors: No known data errors
freebsd admin # zdb storage | grep ashift
                ashift=12
freebsd admin # ^C
freebsd admin # zpool export storage
freebsd admin # gnop destroy /dev/ad6.nop
freebsd admin # zpool import storage
freebsd admin # zdb storage | grep ashift
                ashift=12
freebsd admin #
freebsd admin # zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0

errors: No known data errors
freebsd admin # zfs create storage/home

I think gnop creates a virtual(?) device with special options, and when the first device of the raidz uses 4K sectors then the other ones will also be set to 4K.
Code:
man gnop
Formatting page, please wait...Done.
GNOP(8)                 FreeBSD System Manager's Manual                GNOP(8)

NAME
     gnop -- control utility for NOP GEOM class

SYNOPSIS
     gnop create [-v] [-e error] [-o offset] [-r rfailprob] [-s size]
          [-S secsize] [-w wfailprob] dev ...
     gnop configure [-v] [-e error] [-r rfailprob] [-w wfailprob] prov ...
     gnop destroy [-fv] prov ...
     gnop reset [-v] prov ...
     gnop list
     gnop status
     gnop load
     gnop unload

DESCRIPTION
     The gnop utility is used for setting up transparent providers on existing
     ones.  Its main purpose is testing other GEOM classes, as it allows
     forced provider removal and I/O error simulation with a given probabil-
     ity.  It also gathers the following statistics: number of read requests,
     number of write requests, number of bytes read and number of bytes writ-
     ten.  In addition, it can be used as a good starting point for implement-
     ing new GEOM classes.
...skipping...
     /dev/da0 with 50% write failure probability, and how to destroy it.

           gnop create -v -w 50 da0
           gnop destroy -v da0.nop

     The traffic statistics for the given transparent providers can be
     obtained with the list command.  The example below shows the number of
     bytes written with newfs(8):

           gnop create da0
           newfs /dev/da0.nop
           gnop list

SEE ALSO
     geom(4), geom(8)

HISTORY
     The gnop utility appeared in FreeBSD 5.3.

AUTHORS
     Pawel Jakub Dawidek <pjd@FreeBSD.org>

FreeBSD 8.2                   September 17, 2009                   FreeBSD 8.2
 
bsus said:
I think gnop creates a virtual(?) device with special options, and when the first device of the raidz uses 4K sectors then the other ones will also be set to 4K.
hi Johan

Yes, it seems to create a "filter" on top of the normal device to test features. In this case, by forcing the device to report the larger sector size, it forces the zpool to align to a 4K sector boundary.

I actually have Seagate disks with 512-byte sectors in my server, but at the time I was not sure and went for the 4K sectors.
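
By the way, if you want to see what ZFS sees while the .nop device exists, diskinfo should report the faked sector size (ad6 is just the first disk from the example above, and I have not double-checked the exact output format):

# diskinfo -v /dev/ad6.nop

Among other fields it prints a sectorsize line, which should read 4096 for the .nop provider.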

regards
Malan
 
Could somebody tell me what the "copies" option is about?

I found this site here but I don't understand its exact purpose.

Regards
 
copies means copies :D

If you write a file to disk, ZFS will also write a copy of it to the disk.
It will write your file twice.
If you set copies to 3, it will write that one single file three times to disk.

You could use this if you have only one disk and still want some sort of failure protection.
If a block gets destroyed for whatever reason, your file is also stored somewhere else on the disk,
so it is still accessible.

If you have multiple disks in your vdev, then ZFS tries to write the copies to other disks than the original.

I hope I made it clear.
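
For example, something like this should do it (storage/home is just the dataset created earlier in this thread; note that the extra copies only apply to newly written blocks, existing data is not rewritten):
Code:
# keep two copies of every block in this dataset
zfs set copies=2 storage/home
# check the current value
zfs get copies storage/home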

And thanks all for the gnop explanation.
I will get two WD20EARX drives tomorrow for my home server.
I want to make sure I really need it; it looks like a hackish way! :D

regards
Johan
 
OK thanks, now I understood ;)

The WD20EARX are a problem in themselves.
You have to change the firmware setting from DOS with wdidle3.exe (it isn't that hard, but I didn't succeed on another server).

On my server they all have head parking deactivated, but on the other one they are still parking themselves to death. I will see whether just setting the timer higher works, but I don't know. Also, I think 4TB drives will appear in a few weeks (there are already some), so the 2TB drives will get cheaper ;)
 
Goose997 said:
# gnop create -S 4096 ${DEV0}
# zpool create tank ${DEV0}.nop
# zpool export tank
# gnop destroy ${DEV0}.nop
# zpool import tank

If you do this on an already working pool you will lose all the data. (or am I missing something)
 
Epikurean said:
If you do this on an already working pool you will lose all the data. (or am I missing something)

Hi,

I haven't tried it on a working pool ;)

However, it should wipe out all data, because to my understanding it changes the first sector on which ZFS starts writing.

regards
Malan
 
Yes, working pools will lose data,

because all their data would have to be written again, newly ordered for 4K.
Goose explains it better ;)

Regards
 
hi

Let me try to explain it properly: if your disk has 512-byte sectors, the controller is able to write to the disk with a granularity of 512 bytes. If it has 4K sectors, the smallest granularity is 4K. If a partition is created and does not fall on a 4K boundary, the controller always has to write two sectors (the parts below and above the boundary). This tends to make I/O slow :(. The same applies to other partitions you create, not only ZFS. By aligning the first pool sector of one disk in the ZFS pool on a 4K boundary, you are automatically aligning all the disks on a 4K boundary. And of course, you are shifting the point where data gets written, so make a backup first :e
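
If you ever put ZFS on partitions instead of whole disks, you can check the alignment with gpart (just an illustration; ad6 stands for whichever disk you have, and it only makes sense if the disk actually carries a partition table):

# gpart show ad6

On a 512-byte-sector disk the start offsets are shown in 512-byte sectors, so a partition whose start is divisible by 8 sits on a 4K boundary.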

regards
Malan
 