The Case for Rust (in the base system)

If they change, things break in every place you forgot to adjust. Rust or not. It's actually a shortcoming of C.
You can complain about C as much as you want. But you still need to work around it.
One Rust guy is leaving Linux because the gatekeepers there refuse to cooperate with them, not because the Rust developers refuse to do additional work or maintenance.
The problem is that for this to work, you need someone who is crazy passionate about Rust *and* C. This doesn't seem to exist.

The objects in question are not private in a monolithic kernel, that's part of the current design. They have to be shared, and that requires an interface.
It doesn't require an interface. Which is why it was rejected.

That's the POSIX interface, not what has to be used to implement the kernel.
That's true, and with that point I was responding to your more general comment about "C developers", who don't have the luxury of -std=gnuXX extensions.

Linux and FreeBSD make up a lot with reviews and testing, but code quality is definitely subpar if I compare it to typical embedded development today. Which is like the only remaining domain of C.
You are saying that FreeBSD and Linux have subpar code quality compared to the typical embedded offerings (with the majority coming from India/China with maximum cost savings)? Haha, I dare you to post that on one of the respective mailing lists!

On a side note, a majority of the embedded projects I see are in C++ now, for such "trendy" features like type safety and const.
The majority of embedded devices don't even have a C++ compiler available from the vendor. I feel your view might be a little limited here.
 
Zig is the future, Rust is just a deviation in our path.
Zig can interact with C, C++ and Rust. Even if we don't use Zig code, it would be very advantageous to use it as a compiler framework because of how easy it is to integrate C code with C++, Zig itself and Rust code.
 
You can complain about C as much as you want. But you still need to work around it.
Ah, so the response to "can we make a proper interface out of something that is not formally defined, but implemented in C" is "work around it." Excellent work preventing anything else, including Rust, from working with the kernel. Truly, a clever bit to make sure C never goes away. /s :rolleyes:
 
You can complain about C as much as you want. But you still need to work around it.

I don't complain about C, it does its job. But the babysitting you complain about is a consequence of C, not Rust.

The problem is that for this to work, you need someone who is crazy passionate about Rust *and* C. This doesn't seem to exist.

Or maybe much less passion and a realistic, open-minded view of both languages. The discussion on the FreeBSD mailing list wasn't that bad, actually.

It doesn't require an interface. Which is why it was rejected.

It is an interface. It was rejected because "you won't make us all learn Rust".

You are saying that FreeBSD and Linux have subpar code quality compared to the typical embedded offerings (with the majority coming from India/China with maximum cost savings)? Haha, I dare you to post that on one of the respective mailing lists!

Not talking about crapware from non-professionals around the globe; they don't care about anything except sales. The industry standard in quality is where they care about reliability, e.g. controllers for production machines, cars, medical devices etc.

The majority of embedded devices don't even have a C++ compiler available from the vendor. I feel your view might be a little limited here.

That depends on how "embedded" they are. This was actually an observation on the local job offers around here.
 
Zig is the future, Rust is just a deviation in our path.
Zig can interact with C, C++ and Rust. Even if we don't use Zig code, it would be very advantageous to use it as a compiler framework because of how easy it is to integrate C code with C++, Zig itself and Rust code.
What about Nim?
 
I don’t get the fuss over maintaining stable interfaces. Isn’t that what man sections 2, 3, and 9 are? So it’s already a largely-realized ideal.

No C programmer would ever say “I changed the signature in the function declaration - but I’m not touching the declaration (aka header file)!!!”

Similarly, if you do change an interface, it’s your responsibility to make sure you don’t break other stuff - and if you’re not sure how to verify it or fix it, you consult with the people who would be affected.

Maintaining bindings isn’t fun, I get it - which is why I think Zig and Go have an advantage when it comes to interfacing with C. But a binding is essentially a header file implemented in a different language. So let’s not pretend like it’s some insurmountable issue.
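
To make that concrete: a minimal sketch of what such a "header file in another language" looks like on the Rust side. The C prototype here (compute_sum) is made up for illustration:
Code:
// C header declares:  int compute_sum(const int *values, size_t len);
use std::os::raw::c_int;

// The same declaration restated in Rust's syntax - in effect,
// a hand-written one-line "header file" for the C function.
extern "C" {
    fn compute_sum(values: *const c_int, len: usize) -> c_int;
}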

If I had to guess, I’d say that the pursuit of Rust bindings will reveal ownership issues in the C code, and that addressing them would lead to some interface churn. But it’s just a guess. We won’t know until somebody takes a stab at (re-)implementing something for real, on their own time, without foresight of whether the result will be acceptable to the rest of the developer community.

As of now, there’s lots of arguing from a position of conjecture, and so far nobody that I can see sharing practical experience. But I’m sure in time someone will get tired of the mailing list and write some code and then we’ll really have something to debate.
 
I don’t get the fuss over maintaining stable interfaces. Isn’t that what man sections 2, 3, and 9 are? So it’s already a largely-realized ideal.
Unfortunately, these are not sufficient.
Yes, they would be sufficient if source-level API and KPI stability were all we needed.

But these basically describe things at the source-code level only. What we need during the transition phase (while reimplementations are ongoing) is ABI and KBI compatibility, at the binary-code level.

Imagine the same source code built natively on amd64 and on aarch64. The binary produced for one arch won't run on the other without a CPU-level emulator.

Imagine how arguments are handled on function calls, for relatively simple cases. In the classic C convention, arguments are pushed onto the stack from right to left; in Pascal, it's the reverse. For arguments wider than a byte, endianness also affects the in-memory layout.
And for archs without a stack, you would need to allocate a memory region for the arguments, store them there in the required layout, and pass the region's address (or use plenty of registers instead?).

Of course, these points (except how arguments are passed) don't directly and fully apply to the C vs. Rust case, as they are arch specific, but they should let you imagine why considering the source-code level only is insufficient.

For a sane transition, while the old and new languages coexist, code in both must link together sanely. This is why I repeatedly state that Rust shall use the C calling convention (so-called cdecl), at least in the meantime.
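
For illustration, a minimal sketch of what that constraint looks like on the Rust side (the struct and function names here are made up):
Code:
// #[repr(C)] pins the struct to C's layout rules; #[no_mangle] plus
// extern "C" give the function an unmangled symbol and the C ABI,
// so existing C code can call it directly.
#[repr(C)]
pub struct Sample {
    pub count: u64,
    pub flags: u32,
}

#[no_mangle]
pub extern "C" fn sample_count(s: *const Sample) -> u64 {
    unsafe { (*s).count } // raw pointer from C, so unsafe to dereference
}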
 
I don’t get the fuss over maintaining stable interfaces. Isn’t that what man sections 2, 3, and 9 are? So it’s already a largely-realized ideal.
The issue is that Rust bindings need access to more than the current stable interfaces. The kernel developers don't want to introduce them, and hence they are rejected. No fuss on their side really.
 
I don't program, but I'm interested in C++ and potentially C#. I feel anything can be done with C++, safely and unsafely.

I know nothing about Rust, but it sounds like it's trying to replace C++ on the grounds of being safer. I don't like that. It sounds like safety features in Rust would block potential performance improvements, or require slower workarounds. If I'm going low-level, I want performance, and the ability to make whatever coding decisions I want to get it done. It's the OS's and hardware/driver's job to prevent anything too-wacky :p

On the other hand if Rust is lower-level or offers more performance than C++ can, I'd be interested to hear all about that!
 
I don't program, but I'm interested in C++ and potentially C#. I feel anything can be done with C++, safely and unsafely.

I know nothing about Rust, but it sounds like it's trying to replace C++ on the grounds of being safer. I don't like that. It sounds like safety features in Rust would block potential performance improvements, or require slower workarounds. If I'm going low-level, I want performance, and the ability to make whatever coding decisions I want to get it done. It's the OS's and hardware/driver's job to prevent anything too-wacky :p

On the other hand if Rust is lower-level or offers more performance than C++ can, I'd be interested to hear all about that!

For straight procedural programs (without OO), C, C++ and Rust are about the same in terms of performance. C# is about 2 times slower. C++ is king when some parts need to be precomputed or specialized at compile time, which may be crucial for performance in some (rare) cases. If you're in for the learning experience, I'd also recommend C++, as it covers a broad range of concepts from low level to highly abstracted.

Rust delivers comparable performance in general, the additional checks are done at compile time. But the compiler will hold your hand unless you explicitly tell it not to.
 
For straight procedural programs (without OO), C, C++ and Rust are about the same in terms of performance. C# is about 2 times slower. C++ is king when some parts need to be precomputed or specialized at compile time, which may be crucial for performance in some (rare) cases. If you're in for the learning experience, I'd also recommend C++, as it covers a broad range of concepts from low level to highly abstracted.

Rust delivers comparable performance in general, the additional checks are done at compile time. But the compiler will hold your hand unless you explicitly tell it not to.

What about array bounds checking and C's ability to abuse the stack in any unsafe way you like? Uninitialized variables, too, and constructor-free arrays of things.

I'm pretty sure that as long as you accept those risks you can make a program run faster in C than Rust.
 
Is this thread not hypothetical? Meaning, you can speak about memory safety, but there is not one FreeBSD kernel module written in Rust.
 
What about array bounds checking and C's ability to abuse the stack in any unsafe way you like? Uninitialized variables, too, and constructor-free arrays of things.

I'm pretty sure that as long as you accept those risks you can make a program run faster in C than Rust.
You could make an "unsafe" vec class in Rust if you need the performance. And for C++, just use operator[] rather than ::at().

Once you are into "native land", you can pretty much output the same machine code in any language.

Do we need to accept those risks? Almost certainly, albeit much more rarely than the C++ standards guys would like to admit (which is why that language is needlessly unsafe).
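
For what it's worth, Rust already ships that escape hatch: slice::get_unchecked skips the bounds check. A minimal sketch:
Code:
fn sum(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        // v[i] would be bounds-checked (though the compiler can often
        // prove i < v.len() and elide it); this opts out explicitly:
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}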

I don't program, but I'm interested in C++ and potentially C#. I feel anything can be done with C++, safely and unsafely.
If you are interested in C++ and C#, and if you have time, check out C++/CLI. It's Microsoft's C++ compiler with .NET extensions. Actually a fantastic technology (annoyingly closed-source). You can emit 100% .NET VM code via cl /clr:safe. Plus you can benefit directly from native C libraries and .NET, using * for native and ^ for managed pointers respectively, without requiring language bindings. It is a great example of language evolution rather than revolution.
 
The issue is that Rust bindings need access to more than the current stable interfaces. The kernel developers don't want to introduce them, and hence they are rejected. No fuss on their side really.

I understand that. My point is that they've already recognized the value of having stable interfaces. It seems it may be worthwhile to stabilize other parts of the interface as well.
 
I understand that. My point is that they've already recognized the value of having stable interfaces. It seems it may be worthwhile to stabilize other parts of the interface as well.
So it would be better to wait for that stabilization and see whether it's acceptable or not once it has actually happened.
The best would be for it to be done as a de jure standard, like ISO/IEC or IEEE.
 
What about array bounds checking and C's ability to abuse the stack in any unsafe way you like? Uninitialized variables, too, and constructor-free arrays of things.

Yes, you'd think that these things affect performance, but it seems not to materialize in real world tests. In all the tests I've seen, the results varied within a low single digit percentage, and there was no clear winner when solving different test problems. IIRC, these tests were done with more or less idiomatic code, with some basic level of robustness and portability.
Which probably means C had small translation units and thus less inlining, C++ sometimes bloats, and Rust performed some unnecessary runtime checks.

BTW, could you give an example of "abuse the stack" to gain performance?

I'm pretty sure that as long as you accept those risks you can make a program run faster in C than Rust.

I wouldn't bet on that. A large part of optimization happens in the compiler and cannot be expressed in any of the languages. If a language can give more meta-information about the program's intentions and scope, that helps the compiler optimize further. And C clearly has a disadvantage in that department.

Is this thread not hypothetical? Meaning, you can speak about memory safety, but there is not one FreeBSD kernel module written in Rust.

The debate is still about whether it's even worth it to explore Rust in base and kernel. So it's hypothetical by definition.
 
With "abuse the stack" I mean allocating a lot of things on the stack instead of the heap. That is a major performance advantage, but unsafe if any values escape from the current function. In languages like Python you only have heap allocation. In languages like Go you have stack allocation only if the compiler can prove that there is no escape, which is a very difficult decision and often your won't stack-allocate in Go even if it is safe (because it can't be proven). In Lisp you promise the compiler explicitly that a variable (or parts thereof) won't leak. If you made an untrue promise you get a hard to debug memory error.

I actually don't know what Rust's stack allocation policy is, exactly, but obviously it cannot call itself a memory safe language if you can just do whatever you want wrt stack vs heap. If somebody could shed some light that would be great.
 
array bounds checking: a compiler can see that foo was declared as int32_t[10] and someone tried to access foo[11] or foo[-1]. It can't check a computed "index" used to access foo[index]. If all the code calculating index does its own bounds checking, you'd never see array bounds errors on foo[index], because other code has done the work. Perhaps that's why real-world testing shows little difference.
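
The same split is easy to see in Rust, for comparison (a minimal sketch; compute() is a stand-in for arbitrary index arithmetic):
Code:
fn compute() -> usize { 3 } // stand-in for arbitrary index math

fn main() {
    let foo = [0i32; 10];
    // let a = foo[11];        // constant index: rejected at compile time
    let b = foo[compute()];    // computed index: checked at runtime, may panic
    let c = foo.get(compute()); // checked, but returns Option instead of panicking
    println!("{b} {c:?}");
}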

Stable interfaces are always a good thing when you are trying to utilize things across that interface.
 
Today, a major driver of performance (on the heap and on the stack) is the size of the data. Turns out CPUs spend a significant fraction of their cycles waiting for memory reads. The stack tends to be very local, and stays in L1 and L2 cache, so accesses are fast. With data on the heap, it all depends on working set and locality, what fraction of it can be kept in caches. Caches are by their nature small.

So how much data can really be cached then depends crucially on how big the data is. And here languages and programming idioms that make the data have a larger footprint can help or hurt. For example, in Python any variable (if I say n=3 and s="foo") is very large, and occupies several dozen bytes, for management overhead like dictionaries and ref counts. Replacing that with a 4-byte integer and a 4-byte native string makes the cache much more effective, and can make the code an order of magnitude faster. That's why in production Python code you sometimes have to take the inner loops and recode them in C: it's less about the CPU cycles of interpreting, more about memory footprint.

Similar effects happen in C++. If you have two integer variables that always travel together, the temptation in C++ is to turn them into an object. If the object can have different semantics, inheritance and virtual functions make sense. The two integers now occupy 16 or 24 bytes, and use up that much stack or cache anytime one is used. But at the other extreme, if you know ahead of time that one of the variables will go only from 0 to 1000, and the other no bigger than a quarter million but sometimes negative, you can instead pack them together into a single 32-bit integer, access them with functions that mask and shift (and will be inlined), and handle the "inheritance" by adding a flag bit into the variable. You just reduced the usage to 4 bytes, and your code is likely to be lightyears faster. In particular if you have large arrays of this "pair of smallish integers": at 4 bytes each, many millions of them will fit in L3 cache; at 16 or 24 bytes it gets tight. The cost of this (pretty extreme) idiom is that all accesses to these integers have to go through complicated-looking functions.
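
As a concrete sketch of that packing idiom (field widths follow the example above; everything here is illustrative):
Code:
// Bit 0: flag, bits 1-19: signed value (roughly +/- a quarter million),
// bits 20-29: counter (0..1000). Two bits spare.
#[derive(Clone, Copy)]
struct Packed(u32);

impl Packed {
    fn new(counter: u32, value: i32, flag: bool) -> Self {
        debug_assert!(counter <= 1000 && (-262144..262144).contains(&value));
        Packed(counter << 20 | ((value as u32) & 0x7FFFF) << 1 | flag as u32)
    }
    fn counter(self) -> u32 { self.0 >> 20 }
    fn value(self) -> i32 { ((self.0 << 12) as i32) >> 13 } // sign-extend 19 bits
    fn flag(self) -> bool { self.0 & 1 != 0 }
}

fn main() {
    let p = Packed::new(42, -7, true);
    assert_eq!((p.counter(), p.value(), p.flag()), (42, -7, true));
}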

In summary, if you want good performance, pick a language and programming style to make your data compact. After this long preamble, now the programming language discussion: In theory, Rust has to use tag fields on many variables, making them bigger in memory. That in and of itself might make Rust an effectively slower language. I don't know how many of those the compiler can elide if it knows all accesses to be safe.
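
On that last point, some of the tags are already elided today; a quick check one can run:
Code:
use std::mem::size_of;

fn main() {
    // The Some/None tag costs a whole extra word for u64
    // (16 bytes total on a typical 64-bit target)...
    println!("Option<u64>:  {} bytes", size_of::<Option<u64>>());
    // ...but a reference can never be null, so the compiler reuses the
    // null bit pattern as the None tag and the Option costs nothing extra:
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    println!("Option<&u64>: {} bytes", size_of::<Option<&u64>>());
}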
 
When I had a language where you could turn array bounds checking on and off, I measured a 2.5% difference for the whole application. That is very significant in high-performance computing.

I left it on anyway (and it probably still is in my old company) because the errors were just too annoying to debug. Management was OK with the performance hit (not that I asked too closely).
 
With "abuse the stack" I mean allocating a lot of things on the stack instead of the heap. That is a major performance advantage, but unsafe if any values escape from the current function.

Hah, got me - I expected something far more wild and reckless than standard stack allocation...

I actually don't know what Rust's stack allocation policy is, exactly, but obviously it cannot call itself a memory safe language if you can just do whatever you want wrt stack vs heap. If somebody could shed some light that would be great.

Rust allocates on the stack, unless you explicitly put something on the heap using Box. Managing references to stack-allocated data in a safe way, to use it further down the stack, is one of the key features of Rust. Compared to C, the only meaningful restriction I can think of is variable-length arrays (on the stack), which are not available in Rust (and still controversial in C).

Regarding array bounds checking, I suppose function inlining and the branch predictor on a modern CPU can eliminate a good portion of the potential overhead. Depends on the application, though.
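
A minimal sketch of both halves of that:
Code:
// References to stack data may be passed down the call chain freely:
fn first(buf: &[u8]) -> u8 { buf[0] }

fn main() {
    let on_stack = [1u8; 4096];          // plain array: lives on the stack
    let on_heap = Box::new([2u8; 4096]); // Box: the explicit heap escape
    println!("{} {}", first(&on_stack), first(&*on_heap));
}

// ...but the borrow checker stops them from escaping upward:
// fn broken<'a>() -> &'a u8 {
//     let x = 5;
//     &x // error: cannot return reference to local variable `x`
// }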
 
And that is a great example of a general issue: in most cases, the limited resource is not CPU cycles, but brain time. Giving up 2.5% CPU performance to avoid lots of risk and significant (and unknown) debugging time will usually be a great choice, except for HPC and the back end of the cloud.

An example: When I worked at one of the giant computer companies (where my code often ran on tens of thousands of CPUs), my code was written in Python. I know that recoding it in C++ would have made it 100x faster, but that would have cost half a year of engineer time, and forever significantly more maintenance cost. Moving it to GPUs would have been even faster, but fundamentally unmaintainable. Half a year of engineer time is a lot, compared to weeks of processor time, because computers are cheap. And I knew that the important parts (the distributed database accesses for example) were automatically being farmed out to highly optimized code that runs on lots of CPUs, so my slow Python was not the bottleneck.

And in many cases, array bounds checking (and many similar checks, like refcounts) can be elided by the compiler. For example, if you write this code:
Code:
int a[20];
for (int i=3; i<17; ++i)
    a[i] = ...;
the compiler doesn't have to check that i is within range, because it's obvious. And even if it had to check (for example because the bounds 3 and 17 were passed in as arguments), common subexpression elimination might have moved the check before the loop, and it's a very fast register comparison. So trying to optimize by removing checks is nearly always false economy.
 