Hi all. First time poster, long time lurker.
I'm after the wisdom of the crowd. I've been a big tape user for large HSM-based fileservers for a long time. I use high-end HSM filesystems such as DMF and SAM-QFS for peta-scale storage and data integrity requirements, with around 3 PB of data currently under management. Here's the thing, though: 3 PB isn't that much any more (it fits in a couple of racks of disk, given the many 84-drive-bay solutions from the likes of Dell, EMC and DDN), and filesystem data integrity validation has come a long, long way, as has filesystem recoverability. As a consequence of some of the obvious usability issues that disk --> slower disk --> tape HSM solutions create, I'm considering moving to an all-disk world for my primary file server technology and keeping monsters such as SAM-FS/QFS on the side for "deep archive" vaulting of massive, massive data.
I work in an industry where a few terabytes a day of user-generated data between friends is kind of normal, so scale is important. So, if money were of no consequence here, and in an ideal world, I'd be considering something like this:
- Build a whomping massive IB- or FC-connected (or maybe SAS Gen3!) array with a FreeBSD/Solaris 11.1/OmniOS box controlling it from a ZFS perspective. Share it out, quota it, CIFS, NFS, whatever (rough sketch after this list).
- Build a secondary whomping massive IB-, FC-, or SAS Gen3-connected array running an identical distribution, and "send sideways" ZFS snapshots daily as a data integrity and protection mechanism (again, sketch below).
- Cron zpool scrubs weekly during off-peak windows to try and maintain on-disk consistency (example cron entry below).
- Considering RAID-Z2 or RAID-Z3 given the significant number of spindles involved. I dare say I'll also consider a dedicated LOG/ZIL device, since the workload commonly mixes sync and async NFS reads and writes.
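To make that concrete, here's roughly what I'm picturing. Everything below is an illustrative sketch: the pool name (tank), device names, dataset, and host names are all placeholders, and the exact syntax differs a bit between FreeBSD and the Solaris-derived platforms.

The pool build and the quota'd, shared-out dataset might look something like:

    # Two 6-disk RAID-Z2 vdevs plus a mirrored SLOG for the sync NFS traffic.
    # Device names (da0..da13) and the pool name are hypothetical.
    zpool create tank \
        raidz2 da0 da1 da2 da3 da4 da5 \
        raidz2 da6 da7 da8 da9 da10 da11 \
        log mirror da12 da13

    # A quota'd dataset, shared over NFS (sharesmb would cover the CIFS
    # side on the Solaris-derived platforms).
    zfs create -o quota=50T -o sharenfs=on tank/projects

The daily "send sideways" would be an incremental zfs send piped to the secondary box, along these lines (backup-host is a placeholder, and the date arithmetic is GNU-style; FreeBSD would want date -v-1d):

    #!/bin/sh
    # Assumes an initial full send has already seeded the secondary pool.
    # Snapshot everything, then replicate yesterday -> today incrementally.
    TODAY=$(date +%Y%m%d)
    YESTERDAY=$(date -d yesterday +%Y%m%d)

    zfs snapshot -r tank@"$TODAY"
    zfs send -R -i tank@"$YESTERDAY" tank@"$TODAY" | \
        ssh backup-host zfs receive -F -d tank

And the weekly scrub is just a crontab entry parked in a quiet window:

    # Scrub every Sunday at 02:00, which is off-peak for us.
    0 2 * * 0 /sbin/zpool scrub tank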
But I have a niggling feeling that I should use ZFS where its strength lies (massive, massive raw file serving) and things like SAM where their strength lies (behaving like a giant vault), rather than trying to shoehorn HSM solutions like SAM and DMF into a come-one-come-all generic fileserver, with usability woes as a result.

I can already hear the obvious objection: "What are you, bats#$#t insane? Where is your tape DR?"
I'd love to hear thoughts/considerations from any and all of you.
z