Hi,
Sorry for the late reply, I was a little bit busy. I'll try to reply point by point.
usdmatt said:
The following blog post from Sun/Oracle suggests that by default ZFS stores one copy of data and two copies of metadata. With copies=n where n > 1, metadata is stored 3 times and the actual data is stored n times.
https://blogs.oracle.com/relling/entry/zfs_copies_and_data_protection
I can't say for certain, but I see no reason why this should change with deduplication. The first time a record is stored, 2 copies of data will be stored and 3 metadata (with copies=2). The next time you store the same record, it will increase the ref count but still have 2 full copies on disk. I can't see why or how it would work any other way.
You're right. I just misunderstood it the first time I read about it. I found the article I was talking about; it's from OpenSolaris:
http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup. The relevant part is "How does the dedup property interact with the copies property?"
Reading it again made me understand how it works. English is not my native language, so sometimes it's better to reread things again and again.
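For anyone who wants to see the interaction on their own system, a quick sketch (the pool and dataset names here are just examples):

```shell
# Enable both properties on a dataset (tank/data is hypothetical)
zfs set copies=2 tank/data
zfs set dedup=on tank/data

# Verify the settings, then check the dedup ratio for the whole pool
zfs get copies,dedup tank/data
zpool get dedupratio tank
```

With both set, each unique record is still written copies=2 times; dedup only collapses records that are identical to one already stored.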
usdmatt said:
The whole point of copies is to provide redundancy on single disk pools (such as laptops or HW raid arrays as mentioned in the blog). Without copies you can scrub a single disk pool but not fix any records that fail checksum. With copies>1 ZFS can fix the corrupt data.
I agree with that.
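As a side note, this is exactly the case where a scrub pays off: on a single-disk pool with copies=2, something like the following should detect bad records and self-heal them from the second copy (pool name is an example):

```shell
zpool scrub tank        # re-read every record and verify its checksum
zpool status -v tank    # shows scrub progress and any repaired or unrecoverable errors
```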
usdmatt said:
In your case I don't see much advantage to copies. You already have double redundancy in the vdevs. Obviously you immediately halve the amount of data you can store with copies=2, bringing the raw space of your pool down to 8TB (((6-2)x2TB + (6-2)x2TB) / 2 copies). If you're OK with that amount of space you may as well just set up a stripe of 3-way mirrors, which provides great resilience, probably higher performance, and I think mirrors are much simpler and easier to manage. It may rebuild quicker as well, as it only has to read from another disk in the same mirror rather than from all the other disks in the vdev.
Yes, in my configuration up to 4 disks can fail (2 disks in each vdev) and the pool will still be available, but without copies there is then no redundancy left for the data.
Imagine a case where one disk fails; I replace it, and during the resilver another one fails, and before the resilvering completes a third and a fourth fail in the same vdev (yes, it's hypothetical, but these are 2TB drives from the same series and resilvering can take some time). That leaves me with 3 good drives and 3 bad ones in the same vdev, so that vdev is down and the pool with it. With a lot of luck (yes, I said it's hypothetical) I could use dd or dd_rescue to clone one of the failing drives, reinsert the copy into the vdev and continue the resilver. Without copies I am not sure I could get all my data back; with copies the chances are greater.
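If it ever came to that, the clone step could look roughly like this (device names are made up; GNU ddrescue keeps a map file, so the copy can be interrupted and resumed):

```shell
# Best-effort clone of a failing disk onto a fresh one of equal size
ddrescue -f /dev/ada3 /dev/ada7 /root/ada3.map

# Plain dd alternative: skip unreadable sectors and pad them with zeros
dd if=/dev/ada3 of=/dev/ada7 bs=1m conv=noerror,sync
```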
You said that with copies=2 I'm left with only 8TB of space. That's not a problem for me because it's 8TB of non-deduped data. Once deduped it can hold much more.
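Just to double-check usdmatt's 8TB figure, the arithmetic (disk sizes in whole TB):

```shell
# Two raidz2 vdevs of six 2TB disks each, then copies=2 on top
data_disks_per_vdev=$((6 - 2))            # raidz2 uses 2 disks per vdev for parity
raw_tb=$((2 * data_disks_per_vdev * 2))   # 2 vdevs x 4 data disks x 2TB = 16
usable_tb=$((raw_tb / 2))                 # copies=2 halves the usable space
echo "${usable_tb}TB"                     # prints 8TB
```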
Your point about the 3-way mirrors is something I hadn't thought about. Put like that it is interesting: management and rebuild time are better, OK; performance is greater, OK, though that is not an issue for me since I don't need extra performance, just something reasonable. What stops me is resilience: if we reuse the scenario above and the 3 failing drives are in the same mirror vdev, everything is down. I don't know why, but in this situation having copies reassures me. And if I use copies, the space left is 4TB of non-deduped data, which is not enough for me.
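For reference, the stripe of 3-way mirrors usdmatt describes would be created along these lines (device names are invented; 12 x 2TB disks give 4 x 2TB = 8TB usable):

```shell
# Four 3-way mirrors striped together; any 2 disks per mirror can fail
# without losing that vdev
zpool create tank \
  mirror ada0 ada1 ada2 \
  mirror ada3 ada4 ada5 \
  mirror ada6 ada7 ada8 \
  mirror ada9 ada10 ada11
```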
usdmatt said:
Of course we've all seen or read about ZFS pools going corrupt, even with redundancy, quite often ending with a rebuild. Unfortunately, the complexity of the inner workings of ZFS means that one of its downsides is that recovery from errors is near impossible if ZFS can't fix the problem itself (there are blogs on the net about recovering data from the few people who really know ZFS, but it's a hell of a lot of work just to find and pull out one <=128k record). The ideal solution in any scenario (especially with critical data) is to have 2 independent backups so that the loss of any system still leaves you with a backup. If you're considering copies=2, you could also just create 2 separate pools with one vdev each and zfs send data from one to the other. A corrupt, unfixable vdev will bring down a pool with 2 vdevs, whereas with separate pools, the second should hopefully survive unless something catastrophic has gone wrong with the system.
Having two independent pools in the same machine. Interesting. I hadn't thought about that either. In that case, which configuration would be better: 2 raidz vdevs (10TB non-deduped) or 2 raidz2 vdevs (8TB non-deduped)?
With the configuration I have now ("raid60"), performance is acceptable for me, but I'm not sure what it would be with raidz or raidz2. I don't need something monstrous, just something acceptable.
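usdmatt's two-pool idea could be sketched like this (pool names, dataset names and devices are all examples):

```shell
# Two independent single-vdev pools instead of one pool with two vdevs
zpool create tank   raidz2 ada0 ada1 ada2 ada3 ada4 ada5
zpool create backup raidz2 ada6 ada7 ada8 ada9 ada10 ada11

# Replicate a snapshot from the primary pool to the second one
zfs snapshot tank/data@daily1
zfs send tank/data@daily1 | zfs receive backup/data

# Later, send only the increments between snapshots
zfs snapshot tank/data@daily2
zfs send -i tank/data@daily1 tank/data@daily2 | zfs receive backup/data
```

If tank ever becomes unimportable, backup is still a complete, independent pool.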
usdmatt said:
Just going back to dedup for a second: if you ever plan to destroy a large file system it may be worth clearing it out manually first. The posts below suggest it can be a real problem, and that one company ended up getting a loan system from Oracle with 128GB of RAM just so the destroy could finish. (This may have been fixed; I've not had any involvement with dedup or very large systems, but it seems to just be a side effect of lots of deduped data and its well-documented tendency to use lots of RAM.)
http://lists.freebsd.org/pipermail/freebsd-fs/2012-August/014904.html
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg47526.html
Thanks for the advice. I did some tests with dedup a while ago and yes, I had to recreate the pool after destroying a file system. I plan to add 2 x 128GB SSDs as cache devices to this configuration; I hope that will be enough.
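Based on usdmatt's warning, the "clear it out manually first" approach might look like this, deleting in small steps so the dedup table updates are spread out instead of happening in one huge destroy (dataset and paths are examples):

```shell
# Drop snapshots one at a time instead of one recursive destroy
zfs list -H -t snapshot -o name -r tank/data | while read snap; do
    zfs destroy "$snap"
done

# Remove the file data gradually, then destroy the now-empty filesystem
rm -rf /tank/data/*
zfs destroy tank/data
```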
usdmatt said:
I'm also not sure how wise it is to patch a system with v28 support. Would it not make more sense to just upgrade to 8.3?
It's just that for the moment I can't update the entire system to 8.3, but it will be updated soon.
The patch I found is from the FreeBSD lists; I can't find the thread, but the patch is available here:
http://people.freebsd.org/~mm/patches/zfs/v28/.
Sebulon said:
Sigh...dedup
Yes, and let that be a cautionary example to funking always have a backup! It's so silly that something I thought was a simple maintenance command left the entire system of about 10-14TB completely fubar, just because of dedup. If you read on in that thread, I also tried hooking up just the disks to a server with even double that RAM, 64GB, and it was still fubar. I'm so pissed off at dedup right now, words can't even describe it. I abandoned the rescue operation after well over a month. There was 2TB worth of data that was really important (and not backed up yet), but... And I really did try everything imaginable.
Rookie mistake. But I thought to myself, "Yeah, I'll just delete these old filesystems (and underlying snapshots) first and back up the important stuff right after that, that's just ordinary maintenance," but "right after that" never had the chance to come.

And I call myself a "Storage technician", gah!
/Sebulon
Sorry to hear that, but thanks for the advice. I try to take care of as many things as possible so I end up with something without problems.
Slurp said:
http://arc.opensolaris.org/caselog/PSARC/2009/571/mail
Unless BSD changed the behaviour (doubtful), it does what I said: it keeps at least N copies regardless of how many blocks are deduped. If you have a mail saying otherwise, please share the link.
As an aside, like usdmatt I'm surprised that you prefer to patch ZFS, whose codebase you don't know, rather than just updating the distro.
Like I said before, it was just my misunderstanding of what I read. Rereading the link I provided above, it says what you guys told me.
