The Case for Rust (in the base system)

In my opinion, the point is a different one. It is better to have an army of mediocre programmers than one expert programmer, so as to produce thousands of mediocre applications that serve no purpose but produce more income. So a programmer doesn't need to be able to manage memory, or keep track of the number of entries in an array, or other similar "nonsense".
Nowadays it's more like ChatGPT programming.
Technology will make future generations stupid... I mean... more stupid.
 
Well, I dunno. C has unchecked array bounds (and no way to turn checking on). You can freely return a pointer to a value that lives on a function's stack frame.

You can also run # rm -rf /, which is almost always a grave human error.
The point is that you won't.

C memory errors can bite hard in a complex project. That's absolutely true, and we have a ton of tools to remedy it, because C is a technology and technology is a compromise. Leaving out safety checks was an intentional design compromise in C, just as it was intentional to leave 100% freedom to the root account on Unix, even if the admin is about to destroy his company's data and his career.
 
Have you considered that, by making those errors impossible, a language gives people exactly the freedom they are looking for: freedom from those errors?

Just because I could eat cookies all the time does not mean I should. To protect myself, I don't even keep them in the house, so I have to eat the fruit on the counter...
 
What is that analogy supposed to mean, anyway? If you can't be around stuff that's bad for you because you're tempted, you have psyche and/or addiction problems.
 
On the question of the boundary between safe code (for example, manipulating complex data structures and variable-length strings) and unsafe code (for example, writing directly to memory at certain addresses, in a device driver), one argument is overlooked. I like to use an old saying, usually attributed to Tony Hoare, to explain it: "There are two kinds of programs. One is so short that it obviously has no bugs. The other is so long that it has no obvious bugs."

If one keeps the unsafe part (for example, the inside of an unsafe block in a Rust program, or inline assembly in C or C++) very short, then it can be written to be very clear, and it is correct "by inspection": the reader can reason about the code, walk through the possible interactions, and formally or informally check its correctness. While it may look unsafe, it isn't.

In contrast, large bodies of code are somewhere between hard and impossible to inspect and to prove correct. For those, we need automated tools. One of the easiest automated tools is a programming language that enforces memory safety. If used correctly, the combination of a small amount of unsafe code with a large amount of complex code can be the best of both worlds.
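As a minimal sketch of that division of labor (the register address and function names here are hypothetical, purely for illustration), the unsafe part of a device driver can shrink to a single volatile read, wrapped in a safe function that the rest of the program calls:

```rust
/// Hypothetical address of a memory-mapped device status register.
const STATUS_REG: *const u32 = 0xDEAD_B000 as *const u32;

/// Safe wrapper around the one unsafe operation. The unsafe block is a
/// single line and can be checked "by inspection" against the (assumed)
/// hardware spec.
fn read_status() -> u32 {
    // SAFETY: on our hypothetical hardware, STATUS_REG is valid, aligned,
    // and mapped for the whole lifetime of the program. On an ordinary
    // desktop this address is not mapped, so the sketch only makes sense
    // in its driver context.
    unsafe { core::ptr::read_volatile(STATUS_REG) }
}

fn main() {
    // Everything outside read_status() is ordinary, checked, safe code.
    println!("status = {:#010x}", read_status());
}
```

The reviewer's job then splits cleanly: audit the few unsafe lines by hand, and let the compiler police everything else.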
 
In that case, why not create a C header file that enforces memory safety, and maybe another one that regulates data races? If just about all languages are implemented using C anyway, I'd think that is something to try.

The benefits I'm seeing would be consistent syntax and flexible starter templates that are easy to understand and use.
Any downsides? Please tell me, I'll be watching this thread. 😄
 

Usually, large C projects have internal facilities that minimize errors, but the programmer's prudence is still expected.

Don't keep the memory-unsafe tools around. People use them widely, and then cause bugs with them.

Why are you running FreeBSD then?
 
Because scissors don't cut paper that's lying flat on a marble board. Even if you're allowed to pick it up when the job requires it, you still need additional tools if you want to make a degree-precise cut with scissors.

I don't believe we're talking in metaphors here.

Rust is a technology, and technology is a compromise: if it's better than C in some respects for a target goal, it is worse in others. That's a given, because magic does not exist.

The answer to "why not make better tools" is that there is no consensus on what's better. Right tool for the right job.
 
In that case, why not create a C header file that enforces memory safety, and maybe another one that regulates data races? If just about all languages are implemented using C anyway, I'd think that is something to try.

The benefits I'm seeing would be consistent syntax and flexible starter templates that are easy to understand and use.
Any downsides? Please tell me, I'll be watching this thread. 😄
Because what you're asking for is somewhere between undesirable and impossible.

Total memory safety? We can protect against using memory before allocating it and after freeing it. We can protect against the simple memory leak of allocating something and forgetting to free it when the pointer goes out of scope; languages such as Java demonstrate how. We cannot protect against all memory leaks, because one man's memory leak is another man's large and useful data structure (which sadly didn't happen to fit into memory today). That kind of leak continues to exist even in GC'ed languages: just allocate a whole bunch of things, add them to a list, and then forget that you have them and never use them again. Since the language can't predict the future, it doesn't know whether you have forgotten about them and will never use them, versus stashing them for something marvelous to be done later.

Race conditions? Easy to get rid of: make all programs single-threaded. The performance impact is awful, and getting worse (as our computing platforms become more and more parallel and NUMA). The easy way to prevent all data races (which involves de facto adding a lock to every data structure, down to the atomics) is in practice equivalent to running single-threaded, except with a huge extra locking overhead. I've seen code that had this happen to it, when clueless programmers went overboard with "lock everything for safety". Even better, if you "prevent memory races" carelessly, you end up with deadlocks, which are in some ways worse than races or memory violations: harder to debug.
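As a sketch of how "lock everything" goes wrong (the names here are hypothetical), two threads that take two per-structure locks in opposite order will sooner or later stop forever, with no crash and no error message pointing at the culprit; a compiler that rules out data races has nothing to say about it:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // "Lock everything for safety": one mutex per data structure.
    let a = Arc::new(Mutex::new(0u32));
    let b = Arc::new(Mutex::new(0u32));

    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = thread::spawn(move || {
        let _ga = a1.lock().unwrap(); // t1 takes A first ...
        thread::sleep(Duration::from_millis(50));
        let _gb = b1.lock().unwrap(); // ... then waits forever for B
    });

    let t2 = thread::spawn(move || {
        let _gb = b.lock().unwrap(); // t2 takes B first ...
        thread::sleep(Duration::from_millis(50));
        let _ga = a.lock().unwrap(); // ... then waits forever for A
    });

    // No data race is possible here, and yet the program hangs:
    // a deadlock, which the type system does not (and cannot) rule out.
    t1.join().unwrap();
    t2.join().unwrap();
}
```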

And the header file you're asking for sort of exists: it's in the C++ standard library, and it's called shared pointers and unique pointers. Try programming with them sometime, and then tell me whether you still like C++ as a language. Any header file that has multiple textbooks written about its "move semantics" is not easy to use.

And even if a miracle happened and both problems COULD be solved (which I think is about as likely as P=NP being proven), you would still have the problem that at hardware and network interfaces you can't be memory safe or race free. If you use a network protocol, data spontaneously appears in memory, without your processor even knowing why or when. Then you need to access that miraculous data, and upon inspection you will often (but not always) find a well-formed protocol packet describing an RPC. But that is not memory safe (the RPC appeared in memory when you weren't looking), and it is sort of the ultimate data race (good stuff appeared without any locking).
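Here is what that boundary looks like in code; a minimal sketch in Rust, with the pointer and length hypothetical, as if a NIC's DMA engine had just handed them to us. Turning the raw bytes into something the safe part of the program can use takes an unsafe step whose preconditions no compiler can check:

```rust
/// Pretend the NIC's DMA engine reported that a packet landed at `ptr`,
/// `len` bytes long (both values are hypothetical in this sketch).
unsafe fn dma_packet<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    // SAFETY: the caller must guarantee that ptr..ptr+len is mapped,
    // initialized, and not being rewritten by the device right now --
    // exactly the guarantees the language cannot verify at this boundary.
    unsafe { std::slice::from_raw_parts(ptr, len) }
}
```

Everything after that call (parsing, validating, rejecting malformed packets) can live in safe code; the trust boundary itself cannot.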

So no, it can't be done.
 
...memory leaks, because one man's memory leak is another man's large and useful data structure (which sadly didn't happen to fit into memory today). That kind of leak continues to exist even in GC'ed languages: just allocate a whole bunch of things, add them to a list, and then forget that you have them and never use them again. Since the language can't predict the future, it doesn't know whether you have forgotten about them and will never use them, versus stashing them for something marvelous to be done later.
Aren't programs normally pretty deterministic? Sure, you can code up something that recursively allocates data structures, but even that is limited by the hardware it runs on. Even Bitcoin, with its exponential explosion of requiring the complete transaction history for every participant that joins, even that has limits.

A language cannot predict the future, but a program surely can? (Speculative-execution mitigations come to mind.)

Rust's secret sauce that supposedly takes care of data races - it seems to take care of them 'behind the scenes'.

Also:
one man's memory leak is another man's large and useful data structure (which sadly didn't happen to fit into memory today).
This part did not make sense to me, sorry. RAM is roomier than before, so yesteryear's data structures should be capable of comfortably fitting into memory today??? 😲
 
We're all doomed.
Correct. If we allow clueless people without supervision (or, in some cases, actually malicious people) to write software that we rely on, we are indeed doomed. But to be honest, climate change and epidemics are in the same category.

Aren't programs normally pretty deterministic?
No. We can't even predict whether a program will eventually finish, or remain in an endless loop forever. That problem is provably unsolvable (look on the web for "halting problem"). OK, I'm exaggerating for comic effect: for most real-world programs, a skilled programmer can make a 99% accurate guess of whether it will finish or not in a pretty quick inspection. But if we rely on "skilled programmers" and "99% accurate guesses" and "inspection" for memory faults, we're doomed, thereby reducing it to a problem previously ignored.
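For a concrete taste of why this is hard in general, here is a tiny, innocent-looking loop (the classic Collatz iteration); whether it terminates for every starting value is a famous open problem, so no compiler or automated tool can decide it for us:

```rust
// Collatz iteration: halve even numbers, send odd n to 3n + 1.
// Whether this loop terminates for every natural number is the open
// Collatz conjecture, so its halting behavior in general is beyond
// any automated check.
fn collatz_steps(mut n: u64) -> u64 {
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

fn main() {
    println!("27 reaches 1 in {} steps", collatz_steps(27)); // 111 steps
}
```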

This part did not make sense to me, sorry. RAM is roomier than before, so yesteryear's data structures should be capable of comfortably fitting into memory today??? 😲
Sorry, what I wrote was ambiguous. I didn't mean "today we have more RAM, therefore all programs that worked correctly in the past will continue working today". What I meant is this: try to create an automated algorithm that distinguishes the following two scenarios:
  • My program processes data as it shows up (from disk, from network, from computation, whatever). It has a memory leak: every time data shows up, it allocates space for it, connects the data to a list, and then forgets to free it.
  • My program processes data as it shows up (same thing). It does some calculations, but it also saves the raw data, because it intends to do a "grand finale" of a detailed analysis that goes across all data. It allocates space for it, connects the data to a list, and keeps it there until the grand finale. Which never happens, because the computer runs out of space.
What's the difference between those two? Without reading and understanding the source code (including the comments), it's super hard to tell. And we don't have tools that can, in general, figure out what a program is going to do in the future. So the difference is all about "intent", a word that is usually used in criminal justice: there is a huge difference between intentionally running over an old lady on a crosswalk (that's called murder if it happens in an Agatha Christie novel) and a regrettable accident where a pedestrian is hit and killed due to circumstances beyond our control. Again, the difference is intent. In matters of life and death, we use "oracles" (called lawyers, judges and juries) to determine what the intent was. For software, we don't have a particularly good oracle.
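To make that concrete, here is a minimal sketch (all names hypothetical) in which the two scenarios compile to exactly the same code; only a comment records the intent:

```rust
struct Collector {
    saved: Vec<Vec<u8>>, // grows without bound either way
}

impl Collector {
    // Scenario 1: the author forgot to drop old records -> a memory leak.
    // Scenario 2: the records are kept for a "grand finale" analysis
    //             that may never actually run -> a feature.
    // The executable code is identical in both scenarios; the difference
    // lives only in this comment, i.e. in the author's intent.
    fn ingest(&mut self, record: Vec<u8>) {
        // ... incremental processing of `record` would happen here ...
        self.saved.push(record); // retained forever, on purpose or not
    }
}
```

No garbage collector, borrow checker, or static analyzer can read that comment for meaning, which is the whole point.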

Totally different remark: I've been struggling to optimize the performance of my Python-based backup system. Right now the hourly run takes over 5 minutes, and I have profiled it well enough to know that the problem is string handling in Python. The limit with an optimal programming language should be about 30 seconds, so I'm going to rewrite it (part of it is already in C). Since the situation with Java runtime systems is "icky", I'm going to learn Rust, just for fun. Wish me luck ...
 
No. We can't even predict whether a program will eventually finish, or remain in an endless loop forever. That problem is provably unsolvable (look on the web for "halting problem").
I am familiar with the halting problem. And the thing is, it's something a compiler would be unable to spot, but a human can. One would have to be quite intentional in coding up an algorithm that even qualifies as an instance of the halting problem. A simple example would be accidentally specifying a circular dependency in ports... 😂 Not something that can be caught by a compiler.

As for distinguishing between the two scenarios in your post: that's exactly what Discord and Reddit do: slurp up data, put it into a list/buffer for analysis, and then "forget" (or never bother) to really free it. Depending on the implementation (and the choice of language), it's not that hard to see the extra routine that saves the raw data. Even a compiler will see it.

For software, we don't have a particularly good oracle.
Well, for starters, there's only so much RAM available to a running process, and a CPU can only churn through data so fast. One can usually take a base case and create a mathematical model of the curve along which a program executes. An extreme example would be Bitcoin: if you want to mine, you have to download a complete copy of ALL Bitcoin transactions ever completed and process it, and these days that's over 100 GB of raw data. If you limit your calculations to a single machine and find that you can process maybe 1 GB/hour, that's 100 hours of nonstop analysis just to complete the task. My point is, that task will actually complete.

Speculative execution is basically like that: the CPU plans a few steps ahead, executing the likely branch of the code before it knows for certain that it will be needed.
 
In contrast, large bodies of code are somewhere between hard and impossible to inspect and to prove correct. For those, we need automated tools. One of the easiest automated tools is a programming language that enforces memory safety. If used correctly, the combination of a small amount of unsafe code with a large amount of complex code can be the best of both worlds.
Exactly this, except that with Rust, you're forced to run the tools every single time you compile. If you use C/C++ you can run the tools in a separate step (Valgrind, etc.)
 
Exactly this, except that with Rust, you're forced to run the tools every single time you compile. If you use C/C++ you can run the tools in a separate step (Valgrind, etc.)

But those are very different things. Valgrind will only detect defects that you actually trigger during your test run. Rust aims to detect theoretical errors, at compile time, on every possible execution path.
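A minimal sketch of that difference: the following program contains a use-after-free in spirit, and rustc rejects it outright, before any test run, because the reference outlives the value it borrows:

```rust
fn main() {
    let r;
    {
        let s = String::from("hello");
        r = &s; // borrow `s`
    } // `s` is dropped here while still borrowed
    println!("{r}");
    // rustc: error[E0597]: `s` does not live long enough
}
```

The equivalent C (using a dangling pointer) compiles cleanly, and Valgrind only complains if a test actually exercises that code path.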
 
The answer to "why not make better tools" is that there is no consensus on what's better. Right tool for the right job.
I thought this thread was about adding Rust, not about eliminating C, if there are compelling benefits to Rust for some use cases in the base system and the cost is not too great.
 
If Rust gets added and works out well, then the C people will get pressure to write new code in Rust, and who wants that? Best to crush it before it starts.
 
If Rust gets added and works out well, then the C people will get pressure to write new code in Rust, and who wants that? Best to crush it before it starts.
If glass panes get added to windows, then the curtain people will have to make different curtains, and who wants that? Best to smash them before it starts.

Yes, makes no sense, does it?
 