Future of filesystems

I don't think you're understanding exactly how raidz (or parity based raid) works.

If you lose a disk, you have literally lost that data. Having the 'metadata' makes no sense. The only way to get that data back is to read the rest of each block from the remaining disks, along with the parity, and recalculate it. This is why you would end up reading close to 160TB if you had a full array of 40x4TB disks, rather than just 4TB. The data for that 4TB disk no longer exists; it has to be recalculated.
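To make the recalculation concrete, here is a minimal sketch of single-parity reconstruction using plain XOR. It is a simplification: the block contents and the helper name xor_blocks are made up for illustration, real raidz uses variable-width stripes, and raidz2/raidz3 need more than simple XOR for their extra parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]. Nothing of it survives,
# so every remaining block in the stripe has to be read to rebuild it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

Note that the rebuild touches every surviving block in the stripe, which is why the read volume grows with the width of the vdev.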

As far as I'm aware this is the same for raid5/6 and is one of the main reasons why it's strongly discouraged to have any more than 8-10 disks in one vdev. It's probably the biggest mistake anyone can make when choosing pool configuration.

As for not 'scaling out', I don't see the problem. A pool with 1024 disks in 128 8-disk vdevs should rebuild no slower than a pool with a single 8-disk vdev. If you lose any disk, ZFS only has to read from the other 7 disks in the same vdev. In fact, ZFS only needs to read the actual data; many RAID arrays will rebuild the entire disk, including the empty space.
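A rough back-of-the-envelope version of that argument, as a sketch. The function name and the fill parameter are my own, and it assumes allocated data is spread evenly across the disks in the vdev:

```python
def rebuild_read_tb(disks_in_vdev, disk_tb, fill=1.0):
    """Approximate TB read from the surviving disks of ONE vdev to rebuild one disk.

    fill is the fraction of the vdev that holds data (assumption: evenly spread),
    since only allocated blocks need to be read.
    """
    return (disks_in_vdev - 1) * disk_tb * fill

print(rebuild_read_tb(40, 4))      # 156.0 -> one full 40-disk vdev
print(rebuild_read_tb(8, 4))       # 28.0  -> one full 8-disk vdev, however many vdevs the pool has
print(rebuild_read_tb(8, 4, 0.5))  # 14.0  -> same vdev, half full
```

The number of vdevs in the pool never appears in the formula, which is the point: only the width of the affected vdev matters.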
 
olav said:
That's cool! Though adding a compression algorithm is a walk in the park compared to adding block point rewrite :)
I hope someone will add support for LZMA soon! Yeah, I know the compression speed is überslow, but it compresses data amazingly well. Decompression speed is usable though.
A large part of LZMA's strength is its support for big dictionaries. Try it with a 128 KB dictionary and no solid mode, and the strength will drop by a lot. It would still be stronger than deflate, but not by much, and the speed would be hugely lower.
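A small experiment along these lines, as a sketch using Python's standard lzma module. The input is deliberately artificial (random bytes repeated once, 512 KiB apart), the 128 KiB record size is just the figure from the post, and the helper names are made up; real data will show a smaller gap.

```python
import lzma
import random

RECORD = 128 * 1024  # assumed record size, taken from the 128 KB figure above

def lzma_size(data, dict_size):
    """Compressed size of data using LZMA2 with an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

def size_whole(data):
    """Compress everything as one stream with an 8 MiB dictionary."""
    return lzma_size(data, 8 * 1024 * 1024)

def size_per_record(data, record=RECORD):
    """Compress each record independently (no solid mode), 128 KiB dictionary."""
    return sum(lzma_size(data[i:i + record], record)
               for i in range(0, len(data), record))

# 512 KiB of incompressible bytes, then the same bytes again.
rng = random.Random(0)
chunk = rng.getrandbits(8 * 4 * RECORD).to_bytes(4 * RECORD, "little")
data = chunk * 2

whole = size_whole(data)
per_record = size_per_record(data)
print(whole, per_record)
```

With the big dictionary the second copy can be encoded as references to the first, while the per-record version cannot see past its own 128 KiB, so it ends up close to twice the size on this input.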
 
usdmatt, recalculating is exactly what I mean should be avoided. Why can't this be stored along with the rest of the metadata?

If you have a setup of 1024 4TB disks in 128 8-disk vdevs, you lose 512TB of capacity to parity.
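For reference, the arithmetic behind that figure, as a sketch. The post does not say which raidz level is meant; 512TB corresponds to one parity disk per vdev, and the function name is made up:

```python
def parity_overhead_tb(vdevs, parity_disks_per_vdev, disk_tb):
    """Raw capacity (TB) given up to parity across the whole pool."""
    return vdevs * parity_disks_per_vdev * disk_tb

# 1024 disks of 4TB arranged as 128 vdevs of 8 disks each.
for level, parity_disks in (("raidz1", 1), ("raidz2", 2), ("raidz3", 3)):
    print(level, parity_overhead_tb(128, parity_disks, 4), "TB")
# raidz1 512 TB
# raidz2 1024 TB
# raidz3 1536 TB
```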

EDIT:
Oh, I see it now; I've been doing the math wrong. Only now do I understand how XORing works. Hmmm, I still believe it's possible to improve this, but I have to think more about it.
 
I always do mirrored setups, not RaidX. However, once they get fragmented, ZFS becomes painfully slow. Also, adding more vdevs doesn't rebalance your existing data. Only send | receive helps, which is usually not possible because of the downtime and the doubled space requirement.

Hammer2 seems to be superior (based on its promises). Don't get me wrong, I like FreeBSD: I already have it on all my servers and desktops, and it is really the best OS I have ever touched. But I think that Hammer2 would be better for us than ZFS. Maybe if the evil Oracle hadn't bought Sun, ZFS would have had some development beyond fixes; however, one of the original ZFS developers said that BPR is a pain to implement. Face the fact: we will never have that. As far as I know, Hammer already has something like BPR (rebalancing data). I also read that Hammer is hard to port to FreeBSD. I wish we somehow had Hammer2 next year.
 
Mage said:
Hammer2 seems to be superior (based on its promises). Don't get me wrong, I like FreeBSD: I already have it on all my servers and desktops, and it is really the best OS I have ever touched. But I think that Hammer2 would be better for us than ZFS. Maybe if the evil Oracle hadn't bought Sun, ZFS would have had some development beyond fixes; however, one of the original ZFS developers said that BPR is a pain to implement. Face the fact: we will never have that. As far as I know, Hammer already has something like BPR (rebalancing data). I also read that Hammer is hard to port to FreeBSD. I wish we somehow had Hammer2 next year.

Yes. It was suggested as a Google Summer of Code project, and Matt Dillon stated:

Personally I think it might be too much for a GSOC project.

http://wiki.freebsd.org/PortingHAMMERFS

The FreeBSD developers' wiki, on the other hand, claims it should be comparatively simple:
http://wiki.freebsd.org/IdeasPage#Port_DragonflyBSD.27s_HAMMER_file_system_to_FreeBSD

Port DragonflyBSD's HAMMER file system to FreeBSD

Suggested Summer of Code 2012 project idea
Contact Info

Technical Contact: ivoras@

Description

The HAMMER file system is a new file system created as a response to new ideas in file systems initiated by the likes of ZFS and BTRFS. It introduces innovative features and is considered production-ready in DragonflyBSD. It should be easier to port HAMMER to FreeBSD than any other file system because of the shared ancestry between FreeBSD and DragonflyBSD; however, though the project will bring eternal fame to the student brave enough to tackle it, the task is not for the weak of heart.

Requirements

Strong knowledge of C.
Understanding of file systems and VFS kernel interfaces
Understanding of kernel debugging.

Note

The project will be considered completed only if it passes most of the fsx and fstest file system tests (which are a part of FreeBSD's code).

I'm with you, Mage. I feel it would be nice to have more choices when it comes to BSD filesystems. Having UFS, ZFS, and HAMMER in FreeBSD would round out this server OS nicely.
 
There is no CDDL release of ZFS sources beyond v28. Any "public" sources are not actually public, and anyone reading them could be considered "tainted" and should not touch the open-source ZFS code.

The open-source ZFS devs (Illumos, Delphix, FreeBSD, NetBSD, Spectra Logic, various others) have moved ZFS beyond Oracle's versioning using feature flags, and have added several features that Oracle ZFS doesn't have.
 
phoenix said:
(Illumos, Delphix, FreeBSD, NetBSD, Spectra Logic, various others) have moved ZFS beyond Oracle's versioning using feature flags, and have added several features that Oracle ZFS doesn't have.

Do you have a link to the new open source features?
 
olav said:
Block pointer rewrite is absolutely needed for ZFS. I really like ZFS, but it really doesn't scale big very well. For example, if you have a 40x4TB disk raidz3 setup which is almost 100% full and lose one disk, you have to scan almost all of the 160TB of data to rebuild a new disk. It would take months...

Sounds like whoever created that pool was "doing it wrong"...


ZFS scales just fine, if you build your pools properly. And yes I'd like to see hammer on FreeBSD as an additional choice.
 
I lack the knowledge to port Hammer2; however, I would donate some additional money if I knew they were porting it. (I am not talking about hundreds of dollars, but I would do my part as a member of the community.)
 