Linus Torvalds Doesn't Recommend Using ZFS On Linux

ralphbsz · Jan 14, 2020

mark_j said:
And so you have OSs rebooting every second of every day inside these VMs? Because that's what it would take to get years back from a second saved rebooting.

The large cloud companies each have dozens or hundreds of million servers. A year has about 31 million seconds, so saving a second per boot amounts to saving roughly a whole server per year.

Paying customers paying per second for a server is unheard of (at least by me).

I pay by the millisecond. At least bills claim to be accurate to the millisecond of CPU usage.

Don't get me wrong, parallelization of an init has some merit, some times, BUT it has a big down side. It requires synchronization. If the market Linux chases needs this, then more power to them. My argument is, and always will be, that for the likes of FreeBSD they should avoid it at all costs.

Do you know what the market for FreeBSD is? What the people in charge of FreeBSD want it to be?
And: It is impossible to claim that "Linux" chases a particular market. There are lots of different things in the Linux ecosystem, which are after very different markets. RHEL (the commercial supported offering) is very different in market positioning from Fedora, and those two even come from the same company.

I understand your tradeoff: parallel init is (just like parallel make and multi-core software) inherently hard, harder than sequential. If a particular group of users doesn't need the benefit and can't handle the complexity, disabling it for them might make sense. It might also not make sense: Parallel init works perfectly, if the dependency graph is specified correctly; allowing users to be sloppy about it is something you can usually get away with in a sequential init, but it will occasionally bite you. From that viewpoint, using a parallel init is actually an exercise in system hygiene.

Eric A. Borisch · Jan 15, 2020

Phishfry said:
Yea but who reboots them? A server by definition should not need rebooting except for critical updates. 5 nines means only 5 seconds downtime a year.

99.999% (five nines) is five minutes of downtime a year.

Phishfry · Jan 15, 2020

I had to delete that silly comment. I am trying to stay out of the fray. I thought uptime was more important.

mark_j · Jan 15, 2020

ralphbsz said:
The large cloud companies each have dozens or hundreds of million servers. A year has about 31 million seconds, so saving a second per boot amounts to saving roughly a whole server per year.

Even if I accept 100s of millions, this saving for the sake of a massive change to an init service is ridiculously small it's inconsequential.
I don't know how often you reboot a server, but I try to avoid this at all costs. I cannot see AWS, for example, constantly rebooting servers. Why would they bother if they can reboot the VMs. How long the VMs take to reboot is moot because they merely swap you off to another. They do have spare ones.

I'm sorry, but I'm not seeing cost savings anywhere. I think it's a very long bow to draw. The cost/benefit of parallelization is not there.

ralphbsz said:
Do you know what the market for FreeBSD is? What the people in charge of FreeBSD want it to be?
And: It is impossible to claim that "Linux" chases a particular market. There are lots of different things in the Linux ecosystem, which are after very different markets. RHEL (the commercial supported offering) is very different in market positioning from Fedora, and those two even come from the same company.

The goal is stated in the handbook, the foundation ratifies this with a goal of supporting infrastructure that is server oriented, hence the Tier 1 selections and take up to Armv8. But why the question? Perhaps their goal would be clearer should they change their name to NetflixBSD?

It is clear the server market is what FreeBSD is chasing. They are not desktop and they're not embedded. They have minimal to poor support of laptops, so what else is left? By deductive reasoning and the stated goals of the foundation, it's pretty obvious where their attentions lie.

On that basis, parallelization and tampering with the init system is a bogus project with lots of risks and very little return.

CraigHB · Jan 15, 2020

I'd be happy if they left the init system alone, no skin off my nose, but I'm just a desktop user. Have to let the guys who do the heavy lifting with FreeBSD make those kind of decisions. If a faster init system is something they're concerned with then I guess it could happen. Haven't read much about that here and there are professional admins on this forum.

I can understand the need for a faster init system with embedded where purpose built appliances need to come up pretty quick. FreeBSD is not really going there with ARM sitting at tier two and I don't believe that's going to change in the near future. That makes it moot at this point.

teo · Jan 15, 2020

Linus Torvalds has reasons to say clearly what ZFS is, the reasons are because he doesn't see the file system as such a relevant technology and has no real maintenance. Considering Oracle's litigious, nature given the claim of the oracle interface ( Java ) I don't think it's a real license and performance gain either.

Vadim_Mkk · Jan 15, 2020

xtremae said:
That said, if your only concern is your laptop I don't think parallel service startup is all that important.

Linus Torvalds Net Worth

What is Linus Torvalds's net worth?

www.celebritynetworth.com

Linus Torvalds: The King of Geeks (And Dad of 3)

The license plate on Linus Torvalds' Mercedes SLK convertible says it all. The frame running around the outside of the plate says "Mr. Linux. King of All Geeks," but the plate itself reads "Dad of 3." Linus Torvalds has reached middle age, and so has Linux. Nowadays, it's easy to take both of...

www.wired.com

Linus Torvalds Net Worth - Net Worth Post

Linus Benedict Torvalds was born on 28th December 1969, in Helsinki, Finland. Currently, he resides in USA. Torvalds became famous as an engineer of

networthpost.org

How speak in Russia - He doesn't eat up last horseradish without salt

rigoletto@ · Jan 15, 2020

With 150M he is just arriving at the group of super rich.

brd@ · Jan 15, 2020

CraigHB said:
I can understand the need for a faster init system with embedded where purpose built appliances need to come up pretty quick. FreeBSD is not really going there with ARM sitting at tier two and I don't believe that's going to change in the near future. That makes it moot at this point.

Actually, we are working on getting ARM to Tier 1 for 13.

Hakaba · Jan 15, 2020

My interest : Is ZFS still considered as an important feature in FreeBSD ?
(The answer is yes with the current informations)

The ideologic war in Opensource licence world is not my cup of tea.

But it is for some people, so if systemd is a very good evolution, why there is no «BSD licenced systemd like» initiative ?

msplsh · Jan 15, 2020

Hakaba said:
But it is for some people, so if systemd is a very good evolution, why there is no «BSD licenced systemd like» initiative ?

It's called OpenRC

Here's some more:

Comparison of init systems - Gentoo wiki

wiki.gentoo.org

recluce · Jan 16, 2020

ralphbsz said:
Sorry to be blunt, but your experience on a laptop is irrelevant. A very large fraction of all computers in the world are either embedded or servers. Actual human interface machines are a small minority. Is faster booting relevant? Yes for embedded systems, where you want to come up fast (for example because you are acquiring data, or a human is waiting to press buttons on his dishwasher), and you don't have the luxury of running redundant servers (so while one server boots, the other is still serving). It is also highly relevant for servers, because (a) it greatly improves the effective uptime of servers, (b) it may allow one to get rid of the need for redundant backup server of battery-powered UPSes.

Fast boot can be achieved with much better tools than systemd, which is not that fast in booting anyway. OpenRC without parallelization is close, with (optional) parallelization it is faster. Probably because it doesn't have all the overhead and bloat that systemd has. A customized, purpose-built boot script likely does even better on an embedded device (known hardware specs).

On a server worth the name, boot time is totally irrelevant. No real server is running without a UPS and a generator, if it is really important, it must have both a fail-over or be clustered and have a DR site fail-over on top of that. Who cares about single server boot times in such a scenario? Nobody. Boot time on my servers is typically limited by the boot post of controllers and other I/O cards as well as the typically slow boot post of a server BIOS. The difference caused by init really is not important.

Effective uptime: A slow server under BSD might take a minute or two to come up. If that server is rebooted once every three months (which I feel is a reasonable average), this means a maximum of eight minutes a year - or a difference of 0.0016% in uptime. That is nothing.

I guess you lose a lot more uptime due to systemd shenanigans.

Peter Eriksson · Jan 16, 2020

recluce said:
Effective uptime: A slow server under BSD might take a minute or two to come up. If that server is rebooted once every three months (which I feel is a reasonable average), this means a maximum of eight minutes a year - or a difference of 0.0016% in uptime. That is nothing.

I guess you lose a lot more uptime due to systemd shenanigans.

Try 1 hour boot times... When you have big FreeBSD servers with (ten)thousands (and snapshots on top of that) of ZFS filesystems boot time (and let's not talk about shutdown time for that matter) takes a long time.

We have modified our startup scripts so that we do some things in the background (we make sure root/important filesystems are mounted first, then the rest is done in the background - and then some other things also have to be done in the background (that needs the rest of the filesystems to be mounted). Having to wait (up to - things have become better with the "zfs parallell mount" support in later FreeBSD versions) an hour for a server to fully boot before getting a login prompt is not so nice... But it's still a hack that really could use better "built in" parallell/dependency support.

The parallell-with-dependency handling startup (and auto-restarting of services) stuff is one of the biggest things I miss the most from our Solaris servers.

Vadim_Mkk · Jan 16, 2020

I heard that boot

Peter Eriksson said:
Try 1 hour boot times... When you have big FreeBSD servers with (ten)thousands (and snapshots on top of that) of ZFS filesystems boot time (and let's not talk about shutdown time for that matter) takes a long time.

How I heard systemd doesn't radicaly burst the boot time Linux server with thousands ZFS dataset and snapshot.

Crivens · Jan 16, 2020

Peter Eriksson Dude, what is that thing for? That sounds massive, but I'd wager that boot times would be better with faster IO and that stuff like systemd will not really help.

msplsh · Jan 16, 2020

The post literally describes how an init-hacked parallel boot and delayed mounting solves some their problem, so yes, system level support WOULD help.

recluce · Jan 16, 2020

Peter Eriksson said:
Try 1 hour boot times... When you have big FreeBSD servers with (ten)thousands (and snapshots on top of that) of ZFS filesystems boot time (and let's not talk about shutdown time for that matter) takes a long time.

With 1 hour boot times, I would lean towards the theory that your system design could be improved....

kpedersen · Jan 16, 2020

recluce said:
With 1 hour boot times, I would lean towards the theory that your system design could be improved....

Buy another machine and split the workload?
30 minutes boot time!

Phishfry · Jan 16, 2020

My 24 drive array only takes 120 seconds. An hour does sound extreme.
Question is what exactly could you do with the 1 hour if you had parallel startup processes?
On my fileserver there is nothing I could do until the arrays come up.
So I fail to see the advantage of a storage server coming up without the storage..

mark_j · Jan 18, 2020

Peter Eriksson said:
Try 1 hour boot times... When you have big FreeBSD servers with (ten)thousands (and snapshots on top of that) of ZFS filesystems boot time (and let's not talk about shutdown time for that matter) takes a long time.

We have modified our startup scripts so that we do some things in the background (we make sure root/important filesystems are mounted first, then the rest is done in the background - and then some other things also have to be done in the background (that needs the rest of the filesystems to be mounted). Having to wait (up to - things have become better with the "zfs parallell mount" support in later FreeBSD versions) an hour for a server to fully boot before getting a login prompt is not so nice... But it's still a hack that really could use better "built in" parallell/dependency support.

The parallell-with-dependency handling startup (and auto-restarting of services) stuff is one of the biggest things I miss the most from our Solaris servers.

Two points:
1. That's your design problem.
2. It's not an init problem.

Look at https://svnweb.freebsd.org/base?view=revision&revision=344569

Peter Eriksson · Jan 18, 2020

kpedersen said:
Buy another machine and split the workload?
30 minutes boot time!

It's already split over 10 servers...

, and with the parallell mount support in zfs now the boot times are down to around 10 minutes (unless it was an unclean shutdown and the ZFS filesystems happen to have "unflushed" changes (forced reboot while destroying snapshots and/or lokts of files/directories that needs to be handled - something ZFS does synchronously at boot/mount time - so it might look like a server is "hung" for a very long time (have seen a case where it took like 10 hours).

The thing with ZFS is that it isn't the number of drives that is the problem - it's the number of filesystems (we have one filesystem per user, and so some 20k filesystems per server - and then hourly and daily snapshots)... But it's getting better

Phishfry said:
My 24 drive array only takes 120 seconds. An hour does sound extreme.
Question is what exactly could you do with the 1 hour if you had parallel startup processes?
On my fileserver there is nothing I could do until the arrays come up.
So I fail to see the advantage of a storage server coming up without the storage..
View attachment 7417

A couple of things:

1. It's really nice to be able to login to the server (get a "login:" prompt, via console or SSH) so you can debug _why_ it's taking so long to start... (This was the first reason why we do the "zfs mount -a ; zfs share -a" command in the background - it was really frustrating to have to wait for everything to be mounted and started before being able to login (as root).

2. We start the Samba (SMB) server before all the filesystems are mounted so that the users that have their home directories mounted can access them as soon as they are there - no need to have to wait for all the users homes to be mounted before giving access.

3. Same as with NFS shares - export them as soon as they are available.... But the NFS and the Samba startup could be done in parallell (now Samba starts up pretty quickly, it's just the enumeration of our Window AD with some 120k users that takes some time there). And NFS sharing is much quicker nowadays too.

But still it would be nice if one could specify dependency graphs and have some things start up in parallell (where it can).

Peter Eriksson · Jan 18, 2020

Crivens said:
Peter Eriksson Dude, what is that thing for? That sounds massive, but I'd wager that boot times would be better with faster IO and that stuff like systemd will not really help.

University file servers for users, groups and research & teaching stuff. Around 1000TB of data right now.

But yeah, Systemd (or some other "parallell" alternative) doesn't really help _that_ much in day-to-day work since the server normally aren't rebooted. It's just when the shit has hit the fan that it would be nice (we've "solved" it ourself on our servers by starting some services in the background via some hacks in the /etc/rc.d startup scripts).
There are actually two things I "miss" from the Solaris days:

1. The parallell startup of services at boot time

2. That the system can "reheal" itself (automatically detects if a service dies and tries to restart it - and since it knows which services depends on the failed one - can automatically restart/refresh those too)

But to be clear: I don't like the bloatedness of systemd, would much prefer something more "clean"...

mark_j · Jan 18, 2020

Peter Eriksson said:
2. That the system can "reheal" itself (automatically detects if a service dies and tries to restart it - and since it knows which services depends on the failed one - can automatically restart/refresh those too)

But to be clear: I don't like the bloatedness of systemd, would much prefer something more "clean"...

This can be accomplished with perp (/usr/ports/sysutils/perp/) or daemontools (/usr/ports/sysutils/daemontools/ ), if you need.

Crivens · Jan 19, 2020

Peter Eriksson said:
University file servers for users, groups and research & teaching stuff. Around 1000TB of data right now.

...

But to be clear: I don't like the bloatedness of systemd, would much prefer something more "clean"...

Spoiled students...

But you may have a look at 'minit' from fefe, it's said to do all that but also that looking at the code will make you suicidal.

aht0 · Jan 20, 2020

sidetone said:
There are arguments that Linux will mess things up, and how Linus' opinions are dubious, counter to that Linus rejects ZFS. He's kind of doing FreeBSD a favor.

Yeah, as long as the status of ZFS in Linux remains questionable (kernel devs hostile to it, out-of-tree etc), there is no danger of FreeBSD becoming 'second class citizen' in OpenZFS.

Linus Torvalds Doesn't Recommend Using ZFS On Linux

ralphbsz

Eric A. Borisch

Phishfry

mark_j

CraigHB

teo

Vadim_Mkk

Linus Torvalds Net Worth

Linus Torvalds: The King of Geeks (And Dad of 3)

Linus Torvalds Net Worth - Net Worth Post

rigoletto@

brd@

Administrator

Hakaba

msplsh

Comparison of init systems - Gentoo wiki

recluce

Peter Eriksson

Vadim_Mkk

Crivens

Administrator

msplsh

recluce

kpedersen

Phishfry

mark_j

Peter Eriksson

Peter Eriksson

mark_j

Crivens

Administrator

aht0