The Case for Rust (in the base system)

cracauer@ · Aug 22, 2024

Somebody sat down and looked at things:

"Memory Safety in C++ vs Rust vs Zig"

Memory Safety in C++ vs Rust vs Zig

A look at C++, Zig and Rust in terms of memory safety

medium.com

Andrey Lanin · Aug 22, 2024

cracauer@ said:
Somebody sat down and looked at things:

"Memory Safety in C++ vs Rust vs Zig"

Memory Safety in C++ vs Rust vs Zig

A look at C++, Zig and Rust in terms of memory safety

medium.com

Another one C++? No, thanks

Good intention - https://www.circle-lang.org/site/intro/
But wouldn't it be better to offer this as a proposal for the next ISO/IEC 14882 standard, rather than create a non-standard compliant C++ compiler?

kpedersen · Aug 22, 2024

Andrey Lanin said:
Good intention - https://www.circle-lang.org/site/intro/
But wouldn't it be better to offer this as a proposal for the next ISO/IEC 14882 standard, rather than create a non-standard compliant C++ compiler?

I agree. And in some ways it shouldn't even need to be an addition to the language standard but simply a replacement to the "standard" library (which is crap, almost like Bjarne's cfront's SC was also crap).

Things such as:

Code:

inline T& vector::operator[](const VecLock& _index) { _index.lock(m_criticalpin); ... } /* index is an RAII lock*/
inline VecLock::VecLock(const size_t& _index) { ... } /* Lock has a conversion contructor so it is treated like a size_t */

Goes a long way to creating 100% memory safety. Simply lock memory upon access (and during lifespan of access). Sure, it might not be as flexible (pointers *can't* dangle) but that is still less of a refactor than a rewrite in a different language where pointers also can't dangle). Best thing is that it can be stripped out at compile time for the fastest unsafe builds.

The problem is that for the last decade, the C++ community has been like a little fat child wanting more and more features rather than actually considering safety. The last time was Technical Report 1 (TR1) pre C++0x and pre-c++11. That is a long time since actually actioning safety.

At work we use such a replacement (i.e <sys/vector>, <sys/list>) and it has been remarkably successful. So much so that we are in the process of writing a paper and try to get it out there. The problem is that being *the* industry standard, C++ is filled with noise from all random twits.

Andrey Lanin · Aug 22, 2024

I think that the problem is that it cannot be “replaced”, because “compatibility” with C must remain (or is it time to forget about it?).

I'm afraid we're getting further and further away from the topic (the topic is about Rust after all)

kpedersen · Aug 22, 2024

Andrey Lanin said:
I think that the problem is that it cannot be “replaced”, because “compatibility” with C must remain (or is it time to forget about it?).

Its interesting because no matter how much we break compatibility with C in C++, it will *still* have better compatibility with C than a language like Go/Rust/Swift which is not close to a superset. They require binding generators (SWIG/bindgen) which can only do ~80% coverage of C.

Besides, I don't think compatibility with C *or* C++ needs to be broken. std:: just needs to be frozen and a new (safe in debug+runtime) standard library developed.

astyle · Aug 23, 2024

Sometimes you have to wonder what's a 'better approach' ... they all have advantages and drawbacks.

For example, in C/C++ there is a proliferation of non-standard libs and templates, so the result is a toolkit (like x11-toolkits/qt5). The drawback is that you can get lost trying to keep track of scope where a particular call even applies, so debugging can be a major pain.

Or you can try to do memory safety as a standard feature in the language, like Java - but then the compiled code is not that efficient.

And even if you try to resolve Java's shortcomings with Rust, the crates rear their ugly heads in every single Rust-based project. ?

unitrunker · Aug 23, 2024

Andrey Lanin said:
But wouldn't it be better to offer this as a proposal for the next ISO/IEC 14882 standard, rather than create a non-standard compliant C++ compiler?

Circle C++ does look to be a good way to provide a proposal in the form of a complete vision as opposed to lots of little suggestions. I mean, you can have both, right?

unitrunker · Aug 23, 2024

Back on topic - does Rust have a bootstrap compiler? Something written in C/C++ that could be used to compile a full fledged Rust toolchain? This smaller compiler could fit in base and possibly be complete enough to compile Rust coded kernel drivers (which - in my opinion - should be crate-less).

Edit: seems the answer is no.

T-Aoki · Aug 23, 2024

unitrunker said:
Back on topic - does Rust have a bootstrap compiler? Something written in C/C++ that could be used to compile a full fledged Rust toolchain? This smaller compiler could fit in base and possibly be complete enough to compile Rust coded kernel drivers (which - in my opinion - should be crate-less).

Edit: seems the answer is no.

Of course, at the first place. But maybe no longer maintained, at least the one which does NOT include codes written in Rust at all.

If I recall correctly, even C had bootstrap compiler at the first place (in AT&T Bell rabo) written in asm (or even directly typed in [or punched] in binary codes). Without it, how can C compiler codes written in C be built?
Just the same SHALL be applicable to Rust. Not sure it was written in C or not, as any programming language which already has compilers or interpreters and possibe to write compilers can do the job sanely.

And once the first compiler which can cross compile to other CPU/OS is finished, the supported CPU/OS can have full compiler.

Frankly, the previous version of any compilable language can build at least the bootstrap compiler for the next version. Without it, how the compiler developer build the new version? And Rust has a port for building bootstrap for next version, lang/rust-bootstrap.

T-Aoki · Aug 23, 2024

kpedersen said:
Its interesting because no matter how much we break compatibility with C in C++, it will *still* have better compatibility with C than a language like Go/Rust/Swift which is not close to a superset. They require binding generators (SWIG/bindgen) which can only do ~80% coverage of C.

Besides, I don't think compatibility with C *or* C++ needs to be broken. std:: just needs to be frozen and a new (safe in debug+runtime) standard library developed.

In my humble opinion, must-kept compatibility is the ability to sanely link with codes written in C.

The best would be let LLVM to have Rust frontend, using backends already existing in FreeBSD base. Just pull in Rust "language" only without external (not being managed by FreeBSD project) crates (ecosystem), generates same object format just as C frontend (clang, cc).

NapoleonWils0n · Aug 23, 2024

talking of rust

in the 1980s my Dad had a Lancia Delta car when he lived by the sea in Brighton
and all the salt in the sea air caused the car to rust and you could poke your finger through the body work

msplsh · Aug 23, 2024

Bootstrap compilers are typically discarded.

cracauer@ · Aug 23, 2024

Same with CMU Common Lisp, which you could only compile with CMU Common Lisp. Once upon a time it was bootstrapped from Spice Lisp, but that was long gone during most of CMUCL's time.

CMUCL's derivative brought back bootstrapping from select other CL implementations.

bakul · Aug 23, 2024

Is CMUCL pretty much dead now?

cracauer@ · Aug 23, 2024

bakul said:
Is CMUCL pretty much dead now?

I think so. SBCL (it's fork) is much better in every respect except maybe some floating point operations.

unitrunker · Aug 23, 2024

This popped up today:

Linus Torvalds talks AI, Rust adoption, and why the Linux kernel is 'the only thing that matters'

In a wide-ranging conversation with Verizon open-source officer Dirk Hohndel, 'plodding engineer' Linus Torvalds discussed where Linux is today and where it may go tomorrow.

www.zdnet.com

Torvalds commented, "Another reason has been the Rust infrastructure itself has not been super stable."

Alain De Vos · Aug 24, 2024

Doesn't rust has a feature creep just like C++.

ralphbsz · Aug 24, 2024

A lot of the posts here talk about the need to have a very simple and easy way to use existing C code in whatever new language we want. Supposedly because for the foreseeable future we'll want to call the existing code.

I will make one counter-argument, and tell one anecdote to strengthen it.

Counter-argument: There are two kinds of C interfaces you would want to call. The first kind is things like the POSIX standard library, with functionality like open file, connect socket, and such. Those things a new language will want to wrap into interfaces that are idiomatic in the new language anyway, so normal code won't call them. And even if it did, those things are very very safe and hard to abuse (they're designed that way).

The second kind of C code you will want to call is stuff that is written by "yourself" (in big systems that usually means in-house teams). This is exactly the kind of code you want to get away from, because it is unsafe, and too often not well done. You don't want to call this code, you want to replace it with well-written and safe code.

In reality, there is a third kind of code: well-tested libraries that you acquire from outside, for example open source, or commercially (yes, commercial library code still exists, but is not often seen by amateurs). I think this category is rare, and many things in it (in particular open source code that comes from badly run development projects, see the recent xz debacle) is in the second category.

Now the anecdote. Three decades ago, I worked in a company whose main code base was 1/2 million lines of very messy C code, which had been partially upgraded with C++ features (we have a few classes), and was very badly written and unstable. The company was not a software company, many "software engineers" were part-timers from other departments (be it image processing, electrical engineering, or manufacturing technicians), and it didn't have a culture of good software processes. Since I was already a C++ expert, I was explicitly hired for the great project that was going to save the code base: A new framework, explicitly designed in C++ for safety and good coding practices, intended to be friendly for less skilled software engineers (since the reality of having to use part-time engineers didn't go away), and able to call some of the existing code after vetting it. That was a TOTAL disaster. Why? Several reasons. First, it is very difficult to write leak-free and memory-safe C++, and with the state of the language in 1997, it was virtually impossible. Second, C++ is such a hard language to master (and you need mastery to write safe code that is correct even under error conditions, while any beginner can hack up something that functions on the good path) that working a large project with not all great programmers didn't work. And most importantly: Every time you called any of the old code, things blew of left and right ... just like the old code was blowing up all the time in its normal production usage.

At this point, an important decision was made: We gave up on C++. We switched to Java, the best choice available at the time, but explicitly embedded a Python interpreter in the Java code. The single most important rule was: We will never ever call old code. Instead, we rewrite it in Java, while simultaneously documenting, sanitizing, and organizing it. If a part-time programmer who is not a skilled software engineer needs to use the system, they can write Python scripts. About 1% of the code base was determined to be performance critical, and that was re-coded from scratch in C and called via JNI. The way we did this is that a scout team of 5 people (I was one of them) implemented a tiny part of the 500K line system in Java from scratch, allowing a lot of scaffolding (no DB interfaces, no UI/UX) in one year, as a proof of concept, and to get the basic coding standards, software processes and generic libraries done. Then a rapidly increasing team of up to 150 engineers was added to the team, and within 3 years, they had re-implemented the whole thing into a functional state.

This started about 1997 or so. I last talked to former colleagues a few years ago (sadly, at the funeral of a colleague): the system is now up to 17M lines, covers a much larger product line, can do networked (cloud-like or scale-out) operation, is used by multiple corporations, and has been touched by thousands of engineers. The basic design decisions are holding up fine. Why? Because early on we decided to intentionally give up compatibility and calling old, badly written code, and making "clean design and good craftsmanship" the guiding principle.

From this viewpoint: I prefer a language that can NOT call existing C code.

NapoleonWils0n said:
talking of rust

in the 1980s my Dad had a Lancia Delta car when he lived by the sea in Brighton
and all the salt in the sea air caused the car to rust and you could poke your finger through the body work

Ah, Lancia, a close relative of Fiat and Alfa Romeo. I drove an Alfa Spider when I was a graduate student. It is the only car that begins rusting when its picture is printed in the sales flyer.

freethread · Aug 24, 2024

In my very own opinion there are deep differencese between a generic software and an operating system. They are "programs" written in some "language" but the first use the second and not viceversa. The first can be more easily reengineered, not the second, unless you want to make a new product separated by the original one.

NapoleonWils0n said:
talking of rust

in the 1980s my Dad had a Lancia Delta car when he lived by the sea in Brighton
and all the salt in the sea air caused the car to rust and you could poke your finger through the body work

ralphbsz said:
Ah, Lancia, a close relative of Fiat and Alfa Romeo. I drove an Alfa Spider when I was a graduate student. It is the only car that begins rusting when its picture is printed in the sales flyer.

Ahahahaha I never owned a FIAT (or Lancia, Alfa, Autobianchi...Ferrari)

T-Aoki · Aug 24, 2024

ralphbsz said:
A lot of the posts here talk about the need to have a very simple and easy way to use existing C code in whatever new language we want. Supposedly because for the foreseeable future we'll want to call the existing code.

I will make one counter-argument, and tell one anecdote to strengthen it.

Counter-argument: There are two kinds of C interfaces you would want to call. The first kind is things like the POSIX standard library, with functionality like open file, connect socket, and such. Those things a new language will want to wrap into interfaces that are idiomatic in the new language anyway, so normal code won't call them. And even if it did, those things are very very safe and hard to abuse (they're designed that way).

The second kind of C code you will want to call is stuff that is written by "yourself" (in big systems that usually means in-house teams). This is exactly the kind of code you want to get away from, because it is unsafe, and too often not well done. You don't want to call this code, you want to replace it with well-written and safe code.

In reality, there is a third kind of code: well-tested libraries that you acquire from outside, for example open source, or commercially (yes, commercial library code still exists, but is not often seen by amateurs). I think this category is rare, and many things in it (in particular open source code that comes from badly run development projects, see the recent xz debacle) is in the second category.

Now the anecdote. Three decades ago, I worked in a company whose main code base was 1/2 million lines of very messy C code, which had been partially upgraded with C++ features (we have a few classes), and was very badly written and unstable. The company was not a software company, many "software engineers" were part-timers from other departments (be it image processing, electrical engineering, or manufacturing technicians), and it didn't have a culture of good software processes. Since I was already a C++ expert, I was explicitly hired for the great project that was going to save the code base: A new framework, explicitly designed in C++ for safety and good coding practices, intended to be friendly for less skilled software engineers (since the reality of having to use part-time engineers didn't go away), and able to call some of the existing code after vetting it. That was a TOTAL disaster. Why? Several reasons. First, it is very difficult to write leak-free and memory-safe C++, and with the state of the language in 1997, it was virtually impossible. Second, C++ is such a hard language to master (and you need mastery to write safe code that is correct even under error conditions, while any beginner can hack up something that functions on the good path) that working a large project with not all great programmers didn't work. And most importantly: Every time you called any of the old code, things blew of left and right ... just like the old code was blowing up all the time in its normal production usage.

At this point, an important decision was made: We gave up on C++. We switched to Java, the best choice available at the time, but explicitly embedded a Python interpreter in the Java code. The single most important rule was: We will never ever call old code. Instead, we rewrite it in Java, while simultaneously documenting, sanitizing, and organizing it. If a part-time programmer who is not a skilled software engineer needs to use the system, they can write Python scripts. About 1% of the code base was determined to be performance critical, and that was re-coded from scratch in C and called via JNI. The way we did this is that a scout team of 5 people (I was one of them) implemented a tiny part of the 500K line system in Java from scratch, allowing a lot of scaffolding (no DB interfaces, no UI/UX) in one year, as a proof of concept, and to get the basic coding standards, software processes and generic libraries done. Then a rapidly increasing team of up to 150 engineers was added to the team, and within 3 years, they had re-implemented the whole thing into a functional state.

This started about 1997 or so. I last talked to former colleagues a few years ago (sadly, at the funeral of a colleague): the system is now up to 17M lines, covers a much larger product line, can do networked (cloud-like or scale-out) operation, is used by multiple corporations, and has been touched by thousands of engineers. The basic design decisions are holding up fine. Why? Because early on we decided to intentionally give up compatibility and calling old, badly written code, and making "clean design and good craftsmanship" the guiding principle.

From this viewpoint: I prefer a language that can NOT call existing C code.

Ah, Lancia, a close relative of Fiat and Alfa Romeo. I drove an Alfa Spider when I was a graduate student. It is the only car that begins rusting when its picture is printed in the sales flyer.

You missed some points.
First of all, FreeBSD would be categorized into your case 3.

Secondly, if compiled codes by new compiler language like Rust to be imported cannot link with C codes, it measn ALL CODES INCLUDING WHOLE ECOSYSTEM (not disclosed but using FreeBSD as its base or platform) SHALL BE REWRITTEN ALL AT ONCE. Otherwise, it means unworkable systems in the wild. It can be real disaster, if any of important infrastructure in the world are affected. It SHALL not happen. This is because I'm repeatedly stating Rust in FreeBSD base must be forced to use cdylib at minimum. FreeBSD base is an operating system. Platform. Not a middlewares nor applications.

Of course, codes for independent utilities, which does NOT provide any libraries and do NOT call anything except fundamental system libraries like libc just for system calls would be no problem to reimplement.

Crivens · Aug 24, 2024

I think we all are missing the point. What we need is some new hardware which enforces the memory access according to rights for any object itself. Otherwise, there will always be ways to trample on some memory you are not allowed to trample on. We can try to make languages memory safe, make one context memory safe - the system itself will not be memory safe. And we will not change that. We should invest the energy into that. Change my mind.

T-Aoki · Aug 24, 2024

Crivens said:
I think we all are missing the point. What we need is some new hardware which enforces the memory access according to rights for any object itself. Otherwise, there will always be ways to trample on some memory you are not allowed to trample on. We can try to make languages memory safe, make one context memory safe - the system itself will not be memory safe. And we will not change that. We should invest the energy into that. Change my mind.

Something like Cheri hardwares?

msplsh · Aug 24, 2024

Crivens said:
enforces the memory access

uhh

https://en.wikipedia.org/wiki/Memory_protection ?

Crivens · Aug 24, 2024

msplsh said:
uhh

https://en.wikipedia.org/wiki/Memory_protection ?

On a much finer grid. Means, you have a key which is part of the pointer and memory assigned to that key. You don't have the original pointer (and thus not the key) you get a trap upon accessing any memory with it. A buffer would thus be bound to the pointer from it's creation and doing things like stomping on return addresses or messing with heap management would fail. Still no fix for rowhammer, but a lot better than trying to guarantee this from a compiler. Because inline assembler will come up and the tricks you can do with the relocation information in ELF (that s*** is touring complete. What the f...?)

ralphbsz · Aug 24, 2024

Crivens said:
I think we all are missing the point. What we need is some new hardware which enforces the memory access according to rights for any object itself. Otherwise, there will always be ways to trample on some memory you are not allowed to trample on. We can try to make languages memory safe, make one context memory safe - the system itself will not be memory safe. And we will not change that. We should invest the energy into that. Change my mind.

It is a nice dream. The ancient Borroughs 5xxx machines actually had the first such thing, where each memory cell knew whether it contains an integer or a floating point number, and trying to use it the wrong way caused the computer to stop. This was in the 1960s, where the idea of complex data structures in memory, multiple threads, and all the modern complexities didn't exist yet.

One of the reasons it remains a dream is that sometimes you need to violate memory access protections. For example, in a von Neumann architecture, you need to load a program from disk into memory. At this point, it is an unstructured array of bytes, since that is what is stored on disks (meaning in file or block storage systems). And then you need to tell someone: this set of bytes is allowed to be executed. That action right there is like the moment Eve bit into the apple: the original sin. Sure, you could teach disks to also only store structured data, where each byte is tagged with "this can be read by process X, written by process Y, and executed by process Z". But who writes this structured data to disk? Say for example a compiler. Thereby reducing the problem to Ken Thompson's famous C compiler that recognizes when the password checking code is being compiled, and puts a backdoor for Ken in. Who watches the watchers?

So an absolute and perfect solution for memory protection in hardware is probably not even feasible. Would a partial solution (with some loopholes built in) work? Would it be an efficient use of gates, cycles, and electrical power? Of brain time?

The Case for Rust (in the base system)

cracauer@

Memory Safety in C++ vs Rust vs Zig

Andrey Lanin

Memory Safety in C++ vs Rust vs Zig

kpedersen

Andrey Lanin

kpedersen

astyle

unitrunker

unitrunker

T-Aoki

T-Aoki

NapoleonWils0n

msplsh

cracauer@

bakul

cracauer@

unitrunker

Linus Torvalds talks AI, Rust adoption, and why the Linux kernel is 'the only thing that matters'

Alain De Vos

ralphbsz

freethread

T-Aoki

Crivens

Administrator

T-Aoki

msplsh

Crivens

Administrator

ralphbsz