The Case for Rust (in the base system)

I will give an example where C fell short for me. I recently worked on a program where a binary search would have been beneficial. A binary search is a simple algorithm that I could probably implement correctly on the first try; it is just a few lines. Still, it is a mechanism that should have unit tests, and the program doesn't have them (yet). The consequence of screwing it up wouldn't just be a non-working search; it would be memory corruption, because C doesn't have array bounds checking. In a different language I could have a type-safe generic implementation and just include it. But in this C program I left a linear search in place for now, pending unit tests.
Was bsearch(3) not good enough? You can probably wrap a macro around it to make it easier to use, less error-prone, and more “generic”. I understand your larger point, but this is also why there is no point in redoing basic algorithms again and again. I used to have my own library of such code, which was quite handy back when I was doing contract work. C has many known limitations, but it comes with a lot less baggage. Still, there are plenty of choices for user code.
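Something like this minimal sketch, say (cmp_int and the BSEARCH macro are names made up here for illustration, not a standard API):

#include <stdio.h>
#include <stdlib.h>

/* Comparator for ints, written to avoid the overflow of "x - y". */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hide the pointer/size plumbing behind a macro; arr must be a real array. */
#define BSEARCH(key, arr, n, cmp) \
    bsearch((key), (arr), (n), sizeof((arr)[0]), (cmp))

int main(void)
{
    int sorted[] = { 2, 3, 5, 7, 11, 13 };
    int key = 7;
    int *hit = (int *)BSEARCH(&key, sorted,
                              sizeof sorted / sizeof sorted[0], cmp_int);
    printf("%s\n", hit ? "found" : "not found");
    return 0;
}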
 
There is a spectrum ranging from garbage collection to manual memory management. Reference counting is somewhere in the middle of that spectrum. I don't really understand the Rust borrow checker, but I think it's some kind of reference counting where you can have any number of read-only references, but only one writable one.

At one end of the spectrum, it's easy to write memory-safe code quickly using a garbage-collected language. The GC imposes a non-trivial overhead, but this is usually OK because hardware is so crazy overpowered nowadays. You only run into trouble at the extremes like limited hardware for small devices, or applications where milliseconds matter and you have to squeeze every little drop of performance out of your hw.

If you're in the latter category, you should take a look at manual memory management and maybe pay the price in terms of development and testing time.

Reference counting is somewhere in the middle. I like this description of it. Basically it's a lightweight GC the cost of which is shared across your entire program.

If our industry was sane, we'd trade off these factors whenever considering a new project or platform. Sadly it is not, and what we have is a silver bullet culture. Every so often there's a shiny new thing that is going to solve every problem for ever and ever amen. Rust is just the latest silver bullet. I'm sure it has its uses, and may even be a wonderful new language and runtime. I'm also sure it won't solve every problem. I hope we grow up at some point.
 
Whenever anyone wants to find faults with C, the examples given seem to always be, "I made a mistake and the language didn't catch it or fix it for me!"
I don't want to come across as attacking anyone about this. I'm not. I understand the issues.
I'd be interested in the other side of the coin. What is one giving up or having to fight by using Rust versus C?
 
The GC imposes a non-trivial overhead, but this is usually OK because hardware is so crazy overpowered nowadays.
And the overhead of GC can be minimized (to near zero) in the common case, namely a variable that is not shared (what Rust handles through single ownership), but only constructed, used, and then destructed, in such a way that the compiler can see the life cycle of the variable. In modern C++, that's the use case of unique_ptr.
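For instance, a minimal sketch (Buffer is just an illustrative type):

#include <cstddef>
#include <cstdio>
#include <memory>

struct Buffer {
    explicit Buffer(std::size_t n) : data(new char[n]) {}
    ~Buffer() { delete[] data; std::puts("Buffer freed"); }
    char *data;
};

int main()
{
    {
        auto buf = std::make_unique<Buffer>(4096); // sole owner
        buf->data[0] = 'x';                        // construct, use...
    } // ...and destruct: freed deterministically here, no GC involved
    std::puts("after scope");
    return 0;
}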

Reference counting is somewhere in the middle. I like this description of it. Basically it's a lightweight GC the cost of which is shared across your entire program.
And sadly, pure reference counting does not work, it leads to memory leaks. Classic example: You create two data structures A and B, and hold references to them (in local variables). Now you cross-link them, so A has a reference (pointer) to B, and B a reference to A. Finally, you release your two local references to the two variables (for example when the variables go out of scope). A and B will never be destroyed, because they have a ref count of 1. Even in modern C++ using shared_ptr, this memory leak exists.
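In code, a minimal sketch of that leak:

#include <cstdio>
#include <memory>

struct Node {
    std::shared_ptr<Node> other;
    ~Node() { std::puts("Node destroyed"); } // never printed below
};

int main()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b; // A -> B
    b->other = a; // B -> A
    return 0;     // the locals are released, but each Node still holds
                  // the other: refcounts stay at 1, no destructor runs
}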

So in my (not at all humble) opinion, there still is no perfect solution to memory management. It's always tradeoffs. The traditional C way (rely on the programmer to be a super-genius who never makes mistakes) works in certain areas, but does not scale to the population at large.

Because of C++'s origin as "C with classes", it has never been able to completely close those gaps; but at least in modern C++ we have idioms rich enough that we can prevent memory management problems (stale pointers, leaks) if we apply enough bondage and discipline. It's not the language itself that solves the problem, it is the conventions built on top of it: coding rules, style guides, reviews, unit tests, and autopsies after crashes (valgrind is in that category).

At the other extreme, languages such as Java and Python solve the memory management problem, but that solution comes at a cost. The cost of CPU cycles spent on GC is minor; the real cost is forcing the programmer into a certain way of doing things. And then there are various new languages, such as Go, Rust, Kotlin and Carbon, none of which have stood the test of writing large systems programs in them.

If our industry was sane, we'd trade off these factors whenever considering a new project or platform. Sadly it is not, and what we have is a silver bullet culture.
Actually, I contend that our industry is very, very sane. Whenever a big (commercial, not hobbyist) project starts, it evaluates what tools to use. And it picks the optimal one, looking at tradeoffs: What is our biggest bottleneck? Programmer skill? Wall-clock time of development? CPU cycles? Reliability once deployed? A language that's designed for readable, long-term maintainable and enhanceable code? Different projects find different answers, because their tradeoffs are so radically different. Sometimes they make a mistake and settle on the wrong answer; c'est la vie. Sometimes the answer they find is painful, for example "we'd really like to code in Y because it will save CPU power in the long run, but we need to use X because we don't have enough manpower to rewrite existing libraries that are in X".

My personal answer has been (for the last ~5 years) to do MOST of my coding in Python and SQL, using C++ only where necessary for performance and to interoperate with existing code. Your mileage WILL vary.

Every so often there's a shiny new thing that is going to solve every problem for ever and ever amen. Rust is just the latest silver bullet.
And now we're switching from good engineering practice to psychology, sociology, and mass hysteria. While in my experience good (commercial) engineering practice looks at all available tools (which explicitly includes Rust as an option), the internet-based culture chases the "shiny object", and often prematurely declares it to be the silver bullet. We had that happen with Java about 25 years ago (and I spent 5 years of my life promoting and coding in Java): it is a very fine language, and a good engineering solution to many problems. Alas, it was over-hyped as the "solution to everything", and it couldn't satisfy those exaggerated expectations. Then the frustration set in. And the language and infrastructure both ossified and got more and more (too many?) features, and today it is used less than it should be.
 
At one end of the spectrum, it's easy to write memory-safe code quickly using a garbage-collected language. The GC imposes a non-trivial overhead, but this is usually OK because hardware is so crazy overpowered nowadays. You only run into trouble at the extremes like limited hardware for small devices, or applications where milliseconds matter and you have to squeeze every little drop of performance out of your hw.

If you're in the latter category, you should take a look at manual memory management and maybe pay the price in terms of development and testing time.

Reference counting is somewhere in the middle. I like this description of it. Basically it's a lightweight GC the cost of which is shared across your entire program.

If our industry was sane, we'd trade off these factors whenever considering a new project or platform. Sadly it is not, and what we have is a silver bullet culture. Every so often there's a shiny new thing that is going to solve every problem for ever and ever amen. Rust is just the latest silver bullet. I'm sure it has its uses, and may even be a wonderful new language and runtime. I'm also sure it won't solve every problem. I hope we grow up at some point.

(let's assume I'm talking about mark-and-sweep GC below)

Garbage collection usually has pauses, but that doesn't mean that the overall CPU time is higher. malloc/free has many problems and tradeoffs, namely fragmentation and lack of compaction, which a mark-and-sweep GC avoids. malloc/free can build up significant fragmentation over time in long-running processes.

A GC'ed system can also be much faster at producing the garbage in the first place. Memory allocation in SBCL is an atomic increment of a pointer, versus at least a full function call for malloc, plus a search for a suitable free block. This can make a huge difference if, for example, you have a query-based system and can do GC between queries: query latency then drops significantly below that of a malloc/free system.
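A minimal sketch of that fast path (BumpArena and bump_alloc are made-up names; a real collector would trigger a GC instead of returning a null pointer):

#include <cstddef>
#include <cstdlib>

struct BumpArena {
    char        *base; // start of the pool
    std::size_t  used; // the bump pointer, as an offset
    std::size_t  cap;  // pool size
};

// Allocation is an alignment round-up, a bounds check and an add.
// No free lists, no searching.
static void *bump_alloc(BumpArena *a, std::size_t n)
{
    n = (n + 15) & ~static_cast<std::size_t>(15); // keep 16-byte alignment
    if (a->used + n > a->cap)
        return nullptr; // a GC'ed runtime would collect here instead
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

int main()
{
    BumpArena arena{ static_cast<char *>(std::malloc(1 << 20)), 0, 1 << 20 };
    void *p = bump_alloc(&arena, 100); // the near-two-instruction fast path
    (void)p;
    arena.used = 0; // "GC between queries": reclaim the whole arena at once
    std::free(arena.base);
    return 0;
}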

Compaction matters, too. After mark-and-sweep you have all memory in tight blocks with no holes. That is usually good for CPU cache and TLB hit rates.

There also is pauseless GC, but that is hard to integrate with the other properties I mentioned. There is no silver bullet.

In the end high performance code needs to avoid stirring the heap as much as possible, and that leads to many of the problems we are seeing, via preallocated+reused buffers, stack usage and so forth.
 
My 2c on GC: it should be beneficial.
Reasoning: you can have multithreaded GC, moving the work of even free() onto a different thread. And today we get ever more cores, but the speed of a single core does not improve much any more, I fear.
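A minimal sketch of the idea (DeferredFree is a made-up name): the hot path only enqueues the pointer, and a background thread pays for the actual free().

#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <queue>
#include <thread>

class DeferredFree {
public:
    DeferredFree() : worker_([this] { run(); }) {}

    ~DeferredFree() {
        {
            std::lock_guard<std::mutex> g(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join(); // drains whatever is still queued
    }

    void release(void *p) { // cheap for the caller: just an enqueue
        {
            std::lock_guard<std::mutex> g(m_);
            q_.push(p);
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                void *p = q_.front();
                q_.pop();
                lk.unlock();
                std::free(p); // the expensive part, off the hot path
                lk.lock();
            }
            if (done_)
                return;
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<void *> q_;
    bool done_ = false;
    std::thread worker_; // declared last, so it starts after the members above
};

int main()
{
    DeferredFree df;
    df.release(std::malloc(256)); // the worker thread will free this
    return 0;
}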
 
And sadly, pure reference counting does not work, it leads to memory leaks. Classic example: You create two data structures A and B, and hold references to them (in local variables). Now you cross-link them, so A has a reference (pointer) to B, and B a reference to A. Finally, you release your two local references to the two variables (for example when the variables go out of scope). A and B will never be destroyed, because they have a ref count of 1. Even in modern C++ using shared_ptr, this memory leak exists....
Yes. And that's why C++ has weak_ptr to solve such problems.
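Continuing the A/B example from above, a minimal sketch:

#include <cstdio>
#include <memory>

struct Node {
    std::shared_ptr<Node> next; // owning forward edge
    std::weak_ptr<Node>   prev; // non-owning back edge: breaks the cycle
    ~Node() { std::puts("Node destroyed"); } // now printed twice
};

int main()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b; // A owns B
    b->prev = a; // B merely observes A
    if (auto p = b->prev.lock()) // safe access; empty if A is already gone
        std::puts("A is still alive");
    return 0; // both Nodes are destroyed on scope exit, no leak
}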
 
The reference counting pointers in new-ish C++ are also pretty expensive, including weak pointers. Free-floating plain C pointers are much faster.
 
The reference counting pointers in new-ish C++ are also pretty expensive, including weak pointers. Free-floating plain C pointers are much faster.
Indeed. Plus this kind of "observer pattern" is overkill for ~99% of cases. I use a system where, if something like a weak_ptr gets invalidated, rather than resetting to NULL it instead causes a panic and the program aborts. This way, in debug builds I can ensure the code is more likely to be correct, and then strip away all of the overhead in release builds so that the performance is similar to unique_ptr + raw observers.
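Roughly, the idea can be sketched like this (checked_observer is a made-up name for illustration, not my actual code): debug builds abort on a dangling observer; release builds reduce it to a raw pointer.

#include <cassert>
#include <memory>

template <typename T>
class checked_observer {
public:
    explicit checked_observer(const std::shared_ptr<T> &owner)
        : ptr_(owner.get())
#ifndef NDEBUG
        , alive_(owner)
#endif
    {
    }

    T *get() const {
#ifndef NDEBUG
        // Dangling access is a hard failure here, not a silent NULL.
        assert(!alive_.expired() && "observer outlived its target");
#endif
        return ptr_;
    }

private:
    T *ptr_;
#ifndef NDEBUG
    std::weak_ptr<T> alive_; // only carried in debug builds
#endif
};

int main()
{
    auto owner = std::make_shared<int>(42);
    checked_observer<int> obs(owner);
    int v = *obs.get(); // fine, owner is alive
    (void)v;
    owner.reset();
    // *obs.get() here would abort a debug build instead of reading freed
    // memory; in a release build the check (and its cost) is gone.
    return 0;
}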

It is a good compromise. My C++ programs tend to feel tighter as a result. The only issue with it is that it is a niche approach, so I would not really consider it for writing libraries and middleware.
 
Probably literally the entire reason anyone is even considering Rust: not because plain pointers are not fast, but because people choose that speed over being safe.

There is also the issue of abusing the stack in an unsafe manner (to avoid stirring the heap). My dirty little secret is that I can easily write C code that is faster than SBCL's Lisp code because of heavier stack usage. It really matters performance-wise, but if you run into problems with unsafe stack usage, it is a royal pain to debug.
 
Let me tell you a little tale.
There once was a discussion on the fpc mailing list about replacing the compiler's memory allocator for speed reasons. As a test, I did a dumb replacement that had a big pool and only returned slices from it, never freeing anything. That is the fastest way to allocate, only two operations. There was no real speed-up. Then I measured a bit deeper and found that in a self-compile, about 2% of the runtime was spent in the original memory management. And cutting that out had put more work on the system page allocator, eating the gain up again.

TL;DR, or as was said before, "the moral of the story": benchmark and measure. Get hard facts before declaring a feeling a problem. And remember Amdahl's law.
Often that neat trick of yours is not doing anything; better algorithms are far better.
And when I read about the build times for Rust, and the memory requirements, I can only think "someone does not have his house in order. And I am expected to trust that the result is efficient?"
 
I think it isn't a matter of 'evolution of programming'; it's a matter of maintaining professional and smart developer ground. I don't want to offend anyone in any way; I just want to say that using languages that facilitate writing programs so that developers don't have to think about what they are doing doesn't seem like a good idea, especially for an operating system. I could understand it for generic software, where the costs in time (speed) and memory are less important, whereas in an operating system the important things are time (speed), memory, and care in writing the code.
Having said that, I don't know Rust or the other languages born in the last few decades, but I believe, also from my past experience, that languages with fewer abstractions are more suitable for talking to the machine, and the machine understands them much better and in a shorter time; of course, you need to know that language well and master it. The C language has probably had its day, but I believe it is still the most efficient language for an operating system in terms of time (speed) and memory. Times change, so do people, but machines still speak their language.
 
In my opinion, based on what I have already done with Rust, small stuff just to try this and that, I concluded that Rust is a framework, a development framework to be more precise.
And just like any development framework there are pros and cons, and during my testing, even on the small things I did, I ran into library dependency issues most of the time.
I spent more time resolving those issues than writing the code itself.
I remember one situation where I was trying to use the system locale to determine the currency format, to display the symbol and number format of a price.
Some libs do not work on FreeBSD, etc., but I was able to achieve what I wanted.
So there will be a day when libs get deprecated and replaced, and your code will no longer compile. I hate the feeling that someday I will have to look at code I wrote years ago to fix something that was working, just because I need to add something more to it.
 
There is a spectrum ranging from garbage collection to manual memory management. Reference counting is somewhere in the middle of that spectrum. I don't really understand the Rust borrow checker, but I think it's some kind of reference counting where you can have any number of read-only references, but only one writable one.

At one end of the spectrum, it's easy to write memory-safe code quickly using a garbage-collected language. The GC imposes a non-trivial overhead, but this is usually OK because hardware is so crazy overpowered nowadays. You only run into trouble at the extremes like limited hardware for small devices, or applications where milliseconds matter and you have to squeeze every little drop of performance out of your hw.

If you're in the latter category, you should take a look at manual memory management and maybe pay the price in terms of development and testing time.

Reference counting is somewhere in the middle. I like this description of it. Basically it's a lightweight GC the cost of which is shared across your entire program.

If our industry was sane, we'd trade off these factors whenever considering a new project or platform. Sadly it is not, and what we have is a silver bullet culture. Every so often there's a shiny new thing that is going to solve every problem for ever and ever amen. Rust is just the latest silver bullet. I'm sure it has its uses, and may even be a wonderful new language and runtime. I'm also sure it won't solve every problem. I hope we grow up at some point.
Rust doesn't use reference counting by default, unless you use Arc<T> or Rc<T>. The borrow checker just ensures that there is either one mutable borrow or any number of immutable borrows (the two are mutually exclusive), and that a borrow's lifetime is not longer than its lender's. A lifetime is the scope of a value; in Rust, the lifetime is part of the type, and it is represented by generics.
 
A pox on all your rusts!

According to The Register, all the rust developers are burnt out and leaving the project.

My vote goes to C, anyway. Back in 1997 'they' were telling me that C was 'legacy' and that everything thereafter was going to be written in Java, you wouldn't need to understand pointers because the garbage collector would do it all for you. Just like they're saying about rust now.
 
Rust doesn't use reference counting by default, unless you use Arc<T> or Rc<T>.
Anything with a node/graph structure will require this (e.g. an XML DOM). Likewise anything with the observer pattern (caches, pools, most types of aggregation). So ultimately it will be present in pretty much any non-trivial codebase. Same as C++ really.

In many ways I actually enjoy watching people attempt to make games in Rust and realize that much of the borrow checker's benefit is kind of minimal, and that they begrudgingly always have to fall back to good ol' ref counting. The tried and tested ECS design is basically out, but there might still be some solution in the DOD design, even if it is much more fiddly to use.
 
Anything with a node/graph structure will require this (e.g. an XML DOM). Likewise anything with the observer pattern (caches, pools, most types of aggregation). So ultimately it will be present in pretty much any non-trivial codebase. Same as C++ really.

In many ways I actually enjoy watching people attempt to make games in Rust and realize that much of the borrow checker's benefit is kind of minimal, and that they begrudgingly always have to fall back to good ol' ref counting. The tried and tested ECS design is basically out, but there might still be some solution in the DOD design, even if it is much more fiddly to use.
So there's tracing GC, like https://kyju.org/blog/rust-safe-garbage-collection/#putting-it-all-together
It is triggered manually, so you can control the pause.
 
There are also garbage collectors which make a COW mapping of your heap + stack and then use a separate thread to mark garbage, which is then freed in the original mapping. You have minimal runtime costs in the main thread(s); only the COW faults will slow things down. This is when GC makes sense: let some other CPU core do it and save the runtime in the worker threads. We can get more cores much more easily than faster ones.
 
I feel a garbage collector could help to break cyclic references as a fallback, but I am still not sold on it.

Unreal Engine's C++ is well known to use a garbage collector, and I am not convinced it was the best decision.

(With C or C++, that is. I think a language that has it built in can do it portably and effectively. Go is possibly a recent example of this.)
 
I feel a garbage collector could help to break cyclic references as a fallback, but I am still not sold on it.

Unreal Engine's C++ is well known to use a garbage collector, and I am not convinced it was the best decision.

(With C or C++, that is. I think a language that has it built in can do it portably and effectively. Go is possibly a recent example of this.)

The problem with garbage-collecting C and C++ is that objects are not allowed to move. So there is no compaction for better cache and TLB hit rates, and you still have to fill fragmentation holes. The worst of all worlds, really.
 
I feel a garbage collector could help to break cyclic references as a fallback, but I am still not sold on it.

Unreal Engine's C++ is well known to use a garbage collector, and I am not convinced it was the best decision.

(With C or C++, that is. I think a language that has it built in can do it portably and effectively. Go is possibly a recent example of this.)
This is something similar to what you said: https://github.com/chc4/samsara
 