raidz strange capacity

Hello,

I made a raidz out of 6x500GB disks. As far as I know, raidz is very much like a RAID5 configuration in terms of capacity, performance, and all that. It is also supposed to tolerate the failure of one disk.

What is really strange about this array is its reported size. Check this out:
Code:
root@datacore:/root # zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 1h15m with 0 errors on Wed Mar  6 16:42:42 2013
config:

	NAME          STATE     READ WRITE CKSUM
	zroot         ONLINE       0     0     0
	  raidz1-0    ONLINE       0     0     0
	    gpt/disk  ONLINE       0     0     0
	    ada0      ONLINE       0     0     0
	    ada2      ONLINE       0     0     0
	    ada3      ONLINE       0     0     0
	    ada4      ONLINE       0     0     0
	    ada5      ONLINE       0     0     0

errors: No known data errors

6 disks of 500GB each. The gpt/disk entry looks like that because the boot loader is on that disk: it has a GPT table and a very small boot partition.


The hard disks:
Code:
root@datacore:/root # 
root@datacore:/root # camcontrol devlist
<Hitachi HUA721050KLA330 GK6OA74A>  at scbus3 target 0 lun 0 (pass0,ada0)
<ST3500320AS SD15>                 at scbus3 target 1 lun 0 (pass1,ada1)
<Hitachi HUA721050KLA330 GK6OA74A>  at scbus4 target 0 lun 0 (pass2,ada2)
<Hitachi HDS721050DLE630 MS1OA600>  at scbus4 target 1 lun 0 (pass3,ada3)
<Hitachi HUA721050KLA330 GK6OA74A>  at scbus5 target 0 lun 0 (ada4,pass4)
<Hitachi HUA721050KLA330 GK6OA74A>  at scbus6 target 0 lun 0 (pass5,ada5)
root@datacore:/root # 
root@datacore:/root # diskinfo /dev/ada*
/dev/ada0	512	500107862016	976773168	0	0	969021	16	63
/dev/ada1	512	500107862016	976773168	0	0	969021	16	63
/dev/ada1p1	512	48128	94	0	17408	0	16	63
/dev/ada1p2	512	500107779584	976773007	0	65536	969020	16	63
/dev/ada2	512	500107862016	976773168	0	0	969021	16	63
/dev/ada3	512	500107862016	976773168	4096	0	969021	16	63
/dev/ada4	512	500107862016	976773168	0	0	969021	16	63
/dev/ada5	512	500107862016	976773168	0	0	969021	16	63
root@datacore:/root #

And finally, the zpool size:
Code:
root@datacore:/root # zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zroot  2.72T   280G  2.45T    10%  1.00x  ONLINE  -
root@datacore:/root # 
root@datacore:/root # zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
zroot             238G  1.99T  2.57G  /
zroot/DataCore   9.91G  1.99T  9.91G  -
zroot/DataCore2   221G  1.99T   221G  -
zroot/swap       4.13G  1.99T  9.65M  -
root@datacore:/root #

Well, I am not very good at the math, but with a fault tolerance of one disk: 3TB - 0.5TB = 2.5TB (and this doesn't even account for the parity data needed to rebuild a failed disk).
And my question is: how can the pool be 2.72TB, and can it really sustain one disk failure without losing data or damaging the array?

Thank you.
 
Don't ask me for the exact reasoning, but # zpool list shows the 'raw' capacity of the pool before redundancy, which should be about the space of 6x500GB drives (at least with raidz).

# zfs list shows the usable space, which is 5x500GB. 1.99TB + 238GB = 2.228TB, which is about right for five 500GB drives.
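To make the arithmetic concrete, here is a quick back-of-the-envelope check using the per-disk byte count from the diskinfo output above. The bc one-liners are just a sketch of the unit conversion; the point is that zpool and zfs report sizes in binary units (1 "T" = 1024^4 bytes):
Code:
# Raw capacity of all six disks -- what zpool list reports (minus a little
# for ZFS labels and alignment):
echo "6 * 500107862016 / 1024^4" | bc -l    # ~2.73, shown as 2.72T
# The same minus one disk's worth of parity -- roughly what zfs list shows
# as USED + AVAIL (the remaining gap is raidz and metadata overhead):
echo "5 * 500107862016 / 1024^4" | bc -l    # ~2.27, vs. 1.99T + 238G = ~2.22T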
 
I would give a slightly more nuanced answer.

ZFS can no longer protect your files when it loses its redundancy. If bad sectors were to occur on any of the remaining disks, this could lead to data loss.

However, because ZFS keeps ditto copies - aka copies=2 - of all its metadata, it still has protection against bad sectors that affect metadata. Unless you are extremely unlucky and a bad sector hits both copies, you should be fine without any corruption of the ZFS filesystem itself. ZFS will then list any files that have become inaccessible due to a bad sector and the loss of redundancy in the zpool status -v output.

If you still had redundancy (i.e. a complete mirror, complete RAID-Z, single-degraded RAID-Z2 or double-degraded RAID-Z3) - ZFS would still be able to fix virtually all bad sectors on the fly.

Therefore, in my opinion, you could say that:

RAID0: only protection against bad sectors in metadata; files at mercy of bad sectors.
RAID-Z/mirror: protected as long as no disks are missing or malfunctioning - same as RAID0 when one drive fails.
RAID-Z2/triple-mirror: protected even after a single disk failure - same as RAID0 when two drives fail.

In other words, RAID-Z2 has the benefit of still protecting your files against bad sectors - even with a single drive missing. This means a much safer rebuild procedure. Should ZFS encounter a bad sector, it can fix it on the fly using the redundant data sources, writing the reconstructed data back to the affected sector. This is done without the user or application ever noticing there even was a bad sector.
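As a side note, the same ditto-copy mechanism can be extended to file data if you want it, since copies is an ordinary dataset property. A minimal sketch using one of the datasets from the zfs list output above (only data written after the property is set gets the extra copy, and it costs the corresponding disk space):
Code:
# Store two copies of every data block in this dataset; metadata already
# gets extra copies by default. Applies to newly written data only.
zfs set copies=2 zroot/DataCore
zfs get copies zroot/DataCore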
 
Since others have provided excellent answers and comments to the rest of your post, I'd like to point out one thing with your setup.
gnoma said:
6 disks of 500GB each. The gpt/disk entry looks like that because the boot loader is on that disk: it has a GPT table and a very small boot partition.

Since you only have the boot code on one drive, you will be unable to boot from the pool if that drive fails. Your data will still be safe, but you will have to do quite a bit of legwork to boot the system.

You should really put the boot code on all involved drives when you boot from a zpool.
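For reference, on FreeBSD the GPT boot code is written with gpart, along these lines for every disk that has a freebsd-boot partition at index 1 (as ada1 does here); the other disks in this pool are raw whole-disk members, so they would first need a similar partition layout:
Code:
# Write the protective MBR and the ZFS-aware GPT boot code into the
# freebsd-boot partition (index 1) of the given disk.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1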
 
This answers my question.
Don't ask me for the exact reasoning, but # zpool list shows the 'raw' capacity of the pool before redundancy, which should be about the space of 6x500GB drives (at least with raidz).

# zfs list shows the usable space, which is 5x500GB. 1.99TB + 238GB = 2.228TB, which is about right for five 500GB drives.
I know that RAIDZ/RAID5 should be OK with one disk failure, but I was wondering how there could be 2.72TB of space on 2.5TB worth of disks. Now I know that the 2.72TB figure is the raw capacity, not the usable space.

Since you only have the boot code on one drive, you will be unable to boot from the pool if that drive fails. Your data will still be safe, but you will have to do quite a bit of legwork to boot the system.

You should really put the boot code on all involved drives when you boot from a zpool.
You are perfectly right. But even if exactly that one disk out of the six fails, I keep a flash drive with GRUB2 + ZFS modules handy right in the server room. And that is only if I am extremely unlucky and the disk fails right at reboot, because if it fails while the system is running, I'll just insert a new one and add the boot code to it before rebuilding the array.
Still, you are right; I guess I was just too lazy to do it by the book with all six drives.

Thank you all, this really answers my questions.
 