Stability -- what it really means


Hi all,

I've seen quite a few threads now where the "stability" of an OS, a system configuration, an application, etc. is discussed, and I think most of the time what people really mean is reliability and availability.

In my view, stability of a software product refers to the amount and frequency of "breaking changes". An update or security hotfix that has architectural implications -- breaking e.g. a library interface, a configuration file format, a network protocol, an OS ABI/API, anything that makes work on other components necessary -- reduces stability. As a consequence, to reach stability, you have to put quite some thought into designing your interfaces and into integrating new stuff without breaking them. FreeBSD does a very good job at this, much better than e.g. Linux (and IMHO even a bit better than Windows).

What many people refer to is how long their system and/or application runs without problems or crashes, how much load it handles, etc. This is another dimension of quality and depends mostly on the implementation: fewer bugs, better algorithms, etc. mean more reliability and more availability. It's quite important, but I wouldn't call it "stability", to avoid confusion. What do you think?

BR, Felix

I had a much longer answer ready to go, but I have work to do, so I'll keep it "short." :p The "stability" of a system is very closely tied to the notion of equilibrium. Systems, whether artificial or natural, are by definition in a constantly active state, and the ability of individual components to adjust counts for a lot.

A building is stable so long as its structure can adjust to stresses from its contents or the local environment. An ecosystem is stable so long as the majority of the species in it can continue to thrive in the face of selective pressures. Factory machinery is stable as long as it can reliably function 24 hours a day, 5 days a week while needing only routine maintenance. A hospital ward is stable as long as the equipment and staff can efficiently and effectively treat many different patients with different ailments. In all cases, things change as time passes, with the occasional catastrophe presenting greater obstacles, and everything in the system changes a bit in order to maintain equilibrium.

Now, how static or dynamic a system's components, or the external factors affecting it, happen to be plays a big role in whether the system remains stable. Frequent and drastic change has a big impact on stability, as more variables are introduced and (in artificial systems maintained by people) problems are harder to identify and fix when they occur. Highly dynamic systems are inherently less stable, because their workings are less reliable and predictable. But everything needs to change over time. Just how stable a system is at any given point depends on how rapidly and drastically some factors change versus how ably its components can adapt.

So in short, while there's a distinction between how stable a system is and how static it is, they're intertwined concepts. When talking about operating systems, frequent, drastic changes in the software ecosystem can make individual applications/services less stable, because many different interdependent components are being modified before the effects of any one modification can be known. This would ultimately make the entire system less stable and, more importantly, could obscure that instability. A problem introduced into a component six months ago might not become apparent until many other changes to that component have been made, and then one needs to dig through months of history to find the real root of the problem. Avoiding that is the benefit of planning out minute, gradual changes to specific things between specific milestones in a software project.

Maybe I didn't go into enough detail myself ;)

Your comparison with other domains makes sense, of course; all in all, it's a logical argument for why people use "stability" in the sense I criticized. But first, I want to address what seems to me a misunderstanding: a software system being "stable" in the sense I described doesn't mean it's "static" at all. It could even be completely rewritten from scratch and still be stable, because it doesn't change in its outside perception. It might have added features, great improvements, numerous bugfixes, whatever -- but it will not break anything along the way: no clients using it, no configuration files, no UI frontends, etc.

I have a background as a senior dev/architect, and with that experience (and without assuming all developers or architects would agree), my argument is that differentiating between stability and reliability is much more precise and therefore more helpful. I think it just makes for a better classification. Using the definition of stability I outlined above, you could state that Linux is a reliable system (most of the time), but isn't (and probably never wanted to be) a stable one.