ZFS or Object Storage?

I'm doing some research for a website I am working on. It'll be a media-heavy website (audio, video and images) and will need to be able to stream lots of music and videos.

I don't have much experience on the storage front, so I'm curious whether I should use ZFS zraid3 or an open-source S3-compatible object storage system. I'm tempted to use ZFS since, from what I have read, zraid3 is good about handling errors due to checksum verification. I'm a bit concerned about performance, though. This thread, for example:


Should I worry about this? ZFS zraid3 seems to be able to detect errors and fix them with parity data.
 
If speed is an issue, a mirror is also an option.
If error fixing is an issue, raidz2 can survive two drive failures.
Thank you.

I've been reading this article quickly:


Which says mirroring is better than a raidz configuration but I'm somewhat confused. It talks about having each vdev in a mirror. So you have 4 drives with each one mirrored to exactly one other drive. So would the mirrored drives all be in the same zpool or would you need different zpools for each mirrored set?

Sorry for the stupid question but I've never used ZFS before.
 
You really only need one zpool in all cases.
But for a mirror you always add drives to the pool in pairs, e.g.,
for data striped between drivea & driveb, with drivec and drived as their mirrors:
Code:
zpool create myzpool mirror /dev/drivea /dev/drivec mirror /dev/driveb /dev/drived
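
To put rough numbers on that layout, here is a small Python sketch comparing the usable space of two striped 2-way mirrors with raidz variants built from the same disks. The 4 TB disk size and the function name `usable_tb` are assumptions for illustration only; real pools lose a bit more to metadata and padding.

```python
def usable_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity, ignoring ZFS metadata and padding overhead."""
    if layout == "striped-mirrors":          # 2-way mirrors striped together
        if disks % 2:
            raise ValueError("2-way mirrors need an even number of disks")
        return disks // 2 * disk_tb
    parity = {"raidz1": 1, "raidz2": 2, "raidz3": 3}[layout]
    if disks <= parity:
        raise ValueError("need more disks than the parity level")
    return (disks - parity) * disk_tb

# Hypothetical pool: four disks of 4 TB each.
for layout in ("striped-mirrors", "raidz1", "raidz2"):
    print(layout, usable_tb(layout, 4, 4.0), "TB usable")
```

With four disks, striped mirrors and raidz2 end up with the same usable space, so the choice between them comes down to performance and which failures each one survives.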
 
I see. Thank you very much for your help!
 
Thank you.

I've been reading this article quickly:


Which says mirroring is better than a raidz configuration but I'm somewhat confused. It talks about having each vdev in a mirror. So you have 4 drives with each one mirrored to exactly one other drive. So would the mirrored drives all be in the same zpool or would you need different zpools for each mirrored set?

Sorry for the stupid question but I've never used ZFS before.
My suggestion: use mirroring everywhere, and do NOT use raidz or anything else unless you really need to.
If something is simple, MAYBE it will work.

RAID started back in the day with very small and unreliable drives, where you had to "stack" a pile of them to get a decent volume size.
Today 2TB solid-state drives are used even in smart TVs (!), so if you DO NOT need bigger volumes, stay safe.
Make your life much easier: go with "mirror" and nothing more (3 drives, if you are really, really concerned about HW failure. Yes, it is possible to make a mirror with more than 2 drives).

Why not use raidz (or whatever raid <> 1), if you don't have to?

Because it is complex, and therefore fragile.
It is harder to maintain after a failure than a "simple" mirror system.
Make multiple independent ("lonely") mirror-based volumes instead.
For larger spaces, simply mirror, and shard across the mirrored volumes at the application level.

able to stream lots of music and videos.
This sounds like a job for solid-state storage.
DO NOT add some kind of "slaughter" or mixed configuration, with ZIL drives or whatever
Complex, fragile, hard to maintain

In some extreme cases I deploy RAM-disk-based servers (!) where the data sits directly in RAM, to get even lower latency than NVMe drives.
Obviously "read only", i.e. movies in my case, and costly.

Short version: the simpler, the better
 
It'll be a media-heavy website (audio, video and images) and will need to be able to stream lots of music and videos.
Split this up: the website itself can be hosted by a single webserver, while the media comes from a different webserver that gets its data via a Varnish caching server. Visitors get their media streams from the Varnish server. The Varnish server will make sure your often-requested media is nicely cached and doesn't get served from a relatively slow storage backend.
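
To illustrate why a cache in front of slow storage helps when a few files get most of the traffic, here is a toy Python simulation. It assumes a plain LRU eviction policy, a hypothetical catalog of 1000 files and a made-up popularity curve; it is not a model of Varnish's internals or of any real traffic.

```python
import random
from collections import OrderedDict

def hit_ratio(requests, cache_slots: int) -> float:
    """Replay requests through an LRU cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)          # mark as most recently used
        else:
            cache[item] = True               # miss: fetch from backend, then cache
            if len(cache) > cache_slots:
                cache.popitem(last=False)    # evict the least recently used item
    return hits / len(requests)

random.seed(0)
catalog = list(range(1000))                        # hypothetical media files
weights = [1 / (rank + 1) for rank in catalog]     # assumed: popular files dominate
requests = random.choices(catalog, weights=weights, k=20_000)

for slots in (10, 100, 500):
    ratio = hit_ratio(requests, slots)
    print(f"cache holds {slots:>3} files -> hit ratio {ratio:.2f}")
```

Every hit is a request the storage backend never sees, which is why the backend can be comparatively slow as long as the popular content fits in the cache.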

(That's how a large pr*n website was hosted I had to maintain some years ago)
 
"...had to..." 😜
Yeah, funny story. That's what you get being a contractor, you sometimes end up in the weirdest places :)

It was fun to see how they set it all up though, you can imagine their websites heavily depended on "media" being easily and quickly accessible (humongous number of concurrent users too).
 
[raidz] Because it is complex, and therefore fragile
It's (somewhat!) complex, but where do you get your conclusion from? Any evidence?

It is difficult to maintain it in case of failure, compared to a "simple" mirror system
Uhm, replace disk, watch resilver in action? How is that difficult?

edit, personal experience, I use raidz on 4 disks for many years now, had to replace one disk so far and resilver was automatic and painless.

Yes, with raidz, a second disk failure before a successful resilver is catastrophic; that's one reason (of many) why you also want to have backups. And just using several mirrors doesn't significantly improve things: have both mirrored disks fail and that's it. If you need to protect against that, the only way is more redundancy, and whether you pick "complex" raidz2/raidz3, "complex" mirrored stripes, or even a more "complex" combination of all of that mostly depends on the number and kind of disks you'll use.
 
the problem with complex raid schemes is when you get in the rare situation when enough drives fail and cause the whole array to fail
i once manually rebuilt / reverse engineered an adaptec raid5 and it was no fun
mirrors are far easier to deal with
 
covacat my stance is, when your pool fails, you better have a backup to re-create it from scratch. Sure, "reverse-engineering" is not a sane option. And proprietary RAID implementations in the controller firmware are yet another topic :rolleyes: – I guess that's (hopefully) a thing from the past.

With that, I don't see how a set of mirrors would be "better". If you're lucky, the second disk failing is not the mirror of the first one, ok ;)
 
Blimey. You have all given me stuff to think about. Let me go over a couple of points.

I like the idea of using Varnish to serve media files. That would undoubtedly solve streaming media files to many people, but I'm not sure how much RAM would be a good match on that front. I've never used Varnish before.

As to mirrors versus zraid I'm still confused. My understanding is that with a mirrored config if one disk fails, you replace the disk, and ZFS will automatically copy the data onto the new drive, which happens pretty fast. If the second disk fails before the first disk has had the data copied over, the whole zpool is dead.

When it comes to zraid, you can lose up to three disks, but resilvering (is that the correct term?) takes much longer. I've heard some pretty crazy numbers thrown around on this front.

From my novice perspective, it seems that if I used zraid3 for storage and Varnish to serve the data, that would be the best option. Or am I completely wrong?
 
if you have say 2 disks in a mirror and both develop bad sectors the array will be dead
but it's pretty easy to use just one as single disk setup and recover a lot of the data or even all
the meta data is either at the start or the end of the disk so you can use gnop or similar to mount the original fs-s
with raidXXX it may be much more difficult
 
As to mirrors versus zraid I'm still confused. My understanding is that with a mirrored config if one disk fails, you replace the disk, and ZFS will automatically copy the data onto the new drive, which happens pretty fast. If the second disk fails before the first disk has had the data copied over, the whole zpool is dead.
You can for example stripe mirrors. Then "any" number of disks can fail, as long as it's not exactly the two disks forming one mirror. And sure, as mirroring is a pretty "simple" thing to do, it's also simple to restore the data on a new disk.
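
A quick way to see this is to enumerate every possible two-disk failure in a four-disk pool. The sketch below is only an illustration: the disk names and the mirror pairing (a with c, b with d) are assumptions, and raidz is modelled simply as "survives up to its parity count".

```python
from itertools import combinations

# Hypothetical 4-disk pool laid out as two 2-way mirrors: (a, c) and (b, d).
MIRRORS = [{"a", "c"}, {"b", "d"}]
DISKS = sorted(set().union(*MIRRORS))

def mirrors_survive(failed: set) -> bool:
    """Striped mirrors survive while every mirror keeps at least one disk."""
    return all(mirror - failed for mirror in MIRRORS)

def raidz_survives(failed: set, parity: int) -> bool:
    """A raidz vdev survives as long as failures do not exceed its parity."""
    return len(failed) <= parity

pairs = [set(p) for p in combinations(DISKS, 2)]
ok_mirrors = sum(mirrors_survive(p) for p in pairs)
ok_raidz1 = sum(raidz_survives(p, 1) for p in pairs)
ok_raidz2 = sum(raidz_survives(p, 2) for p in pairs)

print(f"two-disk failures survived, out of {len(pairs)} combinations:")
print(f"  striped mirrors: {ok_mirrors}, raidz1: {ok_raidz1}, raidz2: {ok_raidz2}")
```

So with four disks, striped mirrors survive some double failures, raidz1 survives none, and raidz2 survives all of them.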

When it comes to zraid, you can lose up to three disks, but resilvering (is that the correct term?) takes much longer. I've heard some pretty crazy numbers thrown around on this front.
So you mean raidz3 here. With "simple" raidz, a second disk failure would already be catastrophic. Can't tell much about resilver performance except in the case I once needed it (on a "simple" raidz), it finished in a sane time (around 2 hours IIRC) on a 4 x 4TB pool.

When using raidz/raidz2/raidz3, the "rule of thumb" is not to put an excessive number of disks in it.
 
I like the idea of using Varnish to serve media files. That would undoubtedly solve streaming media files to many people, but I'm not sure how much RAM would be a good match on that front. I've never used Varnish before.
You don't have to serve everything from memory, though memory is preferred as it's the fastest solution. The Varnish server could have its cache stored on a RAID0. It's not a big deal if that breaks; it would simply serve the content from the storage backend directly without caching it. But the upshot of having your media content sourced from a fast cache is more important here, so most of it will just get served from there. So: Varnish's cache stored on a couple of fast SSDs in RAID0, while the storage backend itself could use the relatively slow RAIDZ{,2,3} on spinning rust, for example.

The setup they used relied heavily on HAProxy, with the different kinds of content spread over a number of backend clusters (a static content cluster, a PHP cluster and a media cluster). Even access to the MySQL database was load-balanced across several read-only slaves, with updates/inserts being done on a master database. All in all they managed to choke a 4 Gbit upstream just serving content.
 
this is 10 years old but still ...
the conclusion (at the end of page 2) rules :)
 
When it comes to zraid, you can lose up to three disks, but resilvering (is that the correct term?) takes much longer. I've heard some pretty crazy numbers thrown around on this front.
The resilvering time is generally proportional to the size of the disk you had to replace, pretty much regardless of the RAID type. This is because the replacement disk will be 100% busy writing data. Resilvering will generally slow down other disk access, e.g. by applications, a lot. This is why the size of the "disks" you use in any RAID set needs to be carefully considered: size determines how long it takes to recover from a lost disk, and while it resilvers your applications will work, but slooooooowly. It can take days to resilver a large (many terabytes) disk.
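
As a back-of-the-envelope check, the time to fill a replacement disk is roughly the amount of data divided by the sustained write rate. The Python sketch below assumes round throughput figures (150 MB/s on an idle pool, 50 MB/s under load) purely for illustration. ZFS only resilvers allocated data, so a partly full pool finishes sooner than these full-disk figures.

```python
def resilver_hours(data_tb: float, write_mb_per_s: float) -> float:
    """Rough lower bound: hours to write data_tb terabytes at a sustained rate."""
    seconds = data_tb * 1_000_000 / write_mb_per_s   # 1 TB = 1,000,000 MB
    return seconds / 3600

# Assumed sustained write rates: idle pool vs. pool busy serving applications.
for tb in (4, 12, 20):
    idle = resilver_hours(tb, 150)
    busy = resilver_hours(tb, 50)
    print(f"{tb:>2} TB: about {idle:.1f} h idle, {busy:.1f} h ({busy / 24:.1f} days) under load")
```

The estimate scales linearly with disk size, which is how very large disks on a busy pool end up in "days" territory.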

You mentioned S3, which, AFAIK, is generally a RESTful API, meaning network-arbitrated access to data. Networks are generally orders of magnitude slower than what you can get from ZFS on the PCI bus. [I am not an expert, but I think that at least some versions of S3 offer wide-area redundancy -- so the trade-off is different.]

You are wandering into complex application design. And we have not seen any specifications or metrics. Beware of making things more complicated than they need to be.
 
... It'll be a media-heavy website (audio, video and images) and will need to be able to stream lots of music and videos. ...
Why not outsource all the service to a cloud provider? They have built-in CDNs, and can stream right from their storage layers. And they have very good network connections.
 
Why not outsource all the service to a cloud provider? They have built-in CDNs, and can stream right from their storage layers. And they have very good network connections.
Cost primarily. When I was first working out the idea for this website, I was working off the assumption that I would use AWS for everything, but after looking at the numbers, the price ended up insane. Plus, I'd be stuck entirely in the AWS ecosystem and unable to leave at a future date.

I initially liked AWS because it had a media transcoding service, so I wouldn't need to bother with high CPU VMs / dedicated servers, and it also supported live streaming (like Twitch). I could create a new spreadsheet and go through the numbers more carefully again.
 
Pornography was and is a big driver for tech, even if no one likes to talk about it 😄
Yeah, funny type of business. I had things on my screens the entire day that would have gotten me instantly fired on every other job I had 🤣

But yeah, they're usually at the forefront trying out new technologies, big storage, streaming, etc. So that was interesting to see.
 
Yeah, funny type of business. I had things on my screens the entire day that would have gotten me instantly fired on every other job I had 🤣

But yeah, they're usually at the forefront trying out new technologies, big storage, streaming, etc. So that was interesting to see.
did they have a special category 'ASCII pr0n' for the sysadmins ?
 