Solved Port FreeBSD sysctl library to Mac

Samuel Venable · Oct 27, 2020

This very well could be considered "Too Far Off Topic" but not entirely sure because this question is only answerable by someone who is well familiar with FreeBSD, not just macOS and Darwin. If it is too far off topic feel free to close this, and I'll try not to cry lol.

I want to port system libraries and OS-level features of FreeBSD to macOS and Darwin, more specifically the process handling code that uses sysctl(). The implementation of that function is much more limited on Mac, so basically I want to add the features FreeBSD has that Mac is missing.

The only thing I can think of in particular, that I actually need for my projects, is getting the environment block from a given process id. This can be done on both FreeBSD and macOS, but on Mac it's more limited. On FreeBSD you can get the environment variables of a given pid in their current state at any point in time, whereas on Mac KERN_PROCARGS2 and KERN_PROCARGS only give you the environment block as it was when the process was initially started from the command line that launched it. So if an environment variable was added or an existing one changed since process creation, you would have no way to get that information on Mac.
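For readers who haven't used KERN_PROCARGS2, here is a rough sketch of reading the launch-time environment on macOS. It assumes the commonly described buffer layout (an int argc, the executable path, NUL padding, the argument strings, then the environment strings); the helper names parse_procargs2_env and env_at_launch are made up for illustration, and the sysctl call is only compiled on Apple platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: walks a buffer laid out like KERN_PROCARGS2 output
// (assumed layout: [int argc][exec path\0][\0 padding][argv...][env...])
// and returns the "KEY=value" strings that follow the arguments.
std::vector<std::string> parse_procargs2_env(const std::vector<char>& buf) {
    std::vector<std::string> env;
    if (buf.size() < sizeof(int)) return env;
    int argc = 0;
    std::memcpy(&argc, buf.data(), sizeof(argc));
    if (argc < 0) return env;
    const std::size_t end = buf.size();
    std::size_t pos = sizeof(int);
    while (pos < end && buf[pos] != '\0') ++pos;  // skip the executable path
    while (pos < end && buf[pos] == '\0') ++pos;  // skip the NUL padding
    for (int i = 0; i < argc && pos < end; ++i) { // skip the argc arguments
        while (pos < end && buf[pos] != '\0') ++pos;
        if (pos < end) ++pos;
    }
    while (pos < end && buf[pos] != '\0') {       // env ends at an empty string
        const std::size_t start = pos;
        while (pos < end && buf[pos] != '\0') ++pos;
        env.emplace_back(buf.data() + start, pos - start);
        if (pos < end) ++pos;
    }
    assert(pos <= end);
    return env;
}

#if defined(__APPLE__)
// Hypothetical wrapper: fetches the buffer for `pid` and parses it.
// This only sees the environment as it was at process start.
std::vector<std::string> env_at_launch(int pid) {
    int mib[3] = {CTL_KERN, KERN_ARGMAX, 0};
    int argmax = 0;
    size_t size = sizeof(argmax);
    if (sysctl(mib, 2, &argmax, &size, nullptr, 0) != 0 || argmax <= 0) return {};
    std::vector<char> buf(static_cast<std::size_t>(argmax));
    mib[1] = KERN_PROCARGS2;
    mib[2] = pid;
    size = buf.size();
    if (sysctl(mib, 3, buf.data(), &size, nullptr, 0) != 0) return {};
    buf.resize(size);
    return parse_procargs2_env(buf);
}
#endif
```

The call can fail for processes you don't own, so an empty result should not be read as "no environment".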

Does anyone know if stuff like this is possible to port from FreeBSD to Mac? All I want to know is if it is possible without reverse engineering macOS. Since Darwin and the macOS kernel are open source, it would seem it is possible? Or am I missing something?

msplsh · Oct 27, 2020

Honestly sounds like a security vulnerability. Why would you flip environment variables while running? Internet seems to think this is a no-no. I wouldn't be surprised if the Mac outright prohibited this.

ralphbsz · Oct 27, 2020

What's the use case for inspecting the environment variables of a running process, other than implementing tools like ps and top? And what's the use case for changing environment variables while the process is running? At user level, most software runs under the assumption that environment variables don't change (unless the program changes them itself, using setenv and putenv), so it is safe to inspect environment variables at any time.

msplsh · Oct 27, 2020

It was why you'd use putenv() that I was wondering. Anyway, I think you want to fetch the thread_userstack (somehow) for the process to get the information you need out of macOS.

Samuel Venable · Oct 29, 2020

I wanted to do this mainly as a simple means of interprocess communication that didn't involve networking or reading/writing external files. Reading process output can also be done, but it's not very practical for my particular use case. The library I am writing can technically read stdout to do something similar, but I would also like to communicate with processes that print a bunch of unrelated stuff to the terminal that I have no control over, in this case the several game engines I've ported the library to work with. It is doable without environment variables, but not very efficiently for what I need.

There could be a much better means to do this entirely for what I aim for, but I have yet to find that. Reading/writing files is pretty slow. That and I don't know much about networking which is more of an excuse than anything.

I'll check out thread_userstack. It is a Darwin API, so that's a huge plus; an officially exposed API is much better than a private one, as I'd prefer to allow App Store releases using my library.

Thank you msplsh!

ralphbsz · Oct 29, 2020

OK, so you have two (or more) processes on the same machine (not across a network), and you want them to communicate, easily and slowly (you're not doing HPC-style supercomputing), right? And you're thinking of fundamentally using putenv() and getenv() on each other?

That's actually a pretty neat idea. The nice thing about using environment variables is that you automatically get key=value syntax. And for short things (dozens of bytes) it will be easy to debug and use. Only problem is that putenv() into the address space of another process is inherently nasty.

So here are some alternate suggestions. What programming language are you using? Some languages have IPC tool kits, although I haven't used them. One thing that works really well and is super easy for small problems: Use shared memory. The ancient SysV shared memory API is very simple to use. If you're not sharing a lot of data, just partition your memory: for example allocate a 4K page, use the first 256 bytes from process foo to bar, the second 256 bytes from bar back to foo, and a simple protocol with a few bytes that are updated to know when new data arrives.
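To make the partitioning idea concrete, here is a minimal sketch in C++. The names Slot, Page, init_page, post, poll and attach_page are all invented for this example, and the "few bytes that are updated" are assumed to be a 32-bit sequence counter at the start of each 256-byte slot. It assumes one writer and one reader per slot and does nothing to stop the writer overwriting a message the reader is still copying.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>

struct Slot {
    std::atomic<std::uint32_t> seq;  // bumped by the writer after each message
    char data[252];                  // NUL-terminated payload
};
static_assert(sizeof(Slot) == 256, "each direction gets 256 bytes");

struct Page {
    Slot foo_to_bar;  // first 256 bytes: foo writes, bar reads
    Slot bar_to_foo;  // second 256 bytes: bar writes, foo reads
};

// Called once by whichever process creates the segment.
Page* init_page(void* base) { return new (base) Page{}; }

// Writer side: copy the payload, then publish it by bumping the counter.
void post(Slot& slot, const std::string& msg) {
    assert(msg.size() < sizeof(slot.data));
    std::memcpy(slot.data, msg.c_str(), msg.size() + 1);
    slot.seq.fetch_add(1, std::memory_order_release);
}

// Reader side: returns true and fills `out` if the counter moved.
bool poll(const Slot& slot, std::uint32_t& last_seen, std::string& out) {
    const std::uint32_t now = slot.seq.load(std::memory_order_acquire);
    if (now == last_seen) return false;
    out.assign(slot.data);
    last_seen = now;
    return true;
}

// Attaches a 4K SysV segment; both processes must agree on `key`
// (for example one derived with ftok()). Returns nullptr on failure.
void* attach_page(key_t key) {
    const int id = shmget(key, 4096, IPC_CREAT | 0600);
    if (id < 0) return nullptr;
    void* base = shmat(id, nullptr, 0);
    return base == reinterpret_cast<void*>(-1) ? nullptr : base;
}
```

Process foo would call post(page->foo_to_bar, ...) and poll page->bar_to_foo in its main loop, and bar would do the reverse.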

Now, if you like shared memory, you can achieve exactly the same thing by creating a file and mmap()'ing it from the processes. Matter-of-fact, I bet if you do that, it will run at memory speed, because there is really no need for the file system to write data to disk. And as long as you're not calling sync(), I bet that simply using read() and write() to a file (which is open from both processes) will be nearly as fast; you are just using a few CPU cycles to copy data to/from kernel space, which for small amounts of data is not relevant.
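A minimal sketch of the file-plus-mmap() variant, assuming both processes agree on a path. The function name map_shared_file is hypothetical; the system calls are the standard POSIX ones.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: creates the file if needed, sizes it, and maps it
// shared and read/write. Every process that maps the same path sees the
// same bytes. Returns nullptr on failure.
void* map_shared_file(const char* path, std::size_t size) {
    assert(path != nullptr && size > 0);
    const int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return nullptr;
    }
    void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return base == MAP_FAILED ? nullptr : base;
}
```

The returned pointer can be treated like any other shared buffer; call munmap() on it when done.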

Here is an alternative proposal: Just use networking. It's not that hard. Sure, if you start from scratch with socket() and accept() calls, and write your own protocol, it's tedious, and beginners usually get it wrong. But you can use a pre-cooked RPC mechanism, and describe your communication as messages. I've recently used gRPC between Python programs (both within a single machine and over the network), and after 2 or 3 hours of reading documentation, writing the code was very fast and efficient.

Samuel Venable · Oct 29, 2020

Seeing as thread_userstack has no documentation that I can find, like a lot of things in the Apple world, networking is probably the best choice; it actually is documented, and a little reading won't kill me. I think you are on to something there. I wasn't actually writing environment variables from other processes, just reading them. But that doesn't make a huge difference now, as I don't think I'll be using environment variables at all for this purpose. To answer your question, I'm using C++, and Objective-C only when I don't have much else of a choice.

msplsh · Oct 29, 2020

thread_userstack is internal to the kernel API. I found it in the Mac OS X Internals book. You'd have to write a kext, since what you want to do is highly esoteric.

If what you're doing is IPC, then use unix sockets. They aren't networking, are very fast, and are portable between the two platforms.
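For reference, a small sketch of the Unix domain socket setup in C++. The helper names fill_addr, listen_unix and connect_unix are invented for this example; the calls themselves are plain POSIX and should behave the same on FreeBSD and macOS.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Fills in an AF_UNIX address; fails if the path does not fit in sun_path.
static bool fill_addr(sockaddr_un& addr, const char* path) {
    assert(path != nullptr);
    std::memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    if (std::strlen(path) >= sizeof(addr.sun_path)) return false;
    std::strcpy(addr.sun_path, path);
    return true;
}

// Server side: binds a stream socket to `path` and starts listening.
// Returns the listening descriptor, or -1 on failure.
int listen_unix(const char* path) {
    sockaddr_un addr;
    if (!fill_addr(addr, path)) return -1;
    const int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    unlink(path);  // remove a stale socket file left by an earlier run
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 4) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Client side: connects to the socket at `path`.
// Returns the connected descriptor, or -1 on failure.
int connect_unix(const char* path) {
    sockaddr_un addr;
    if (!fill_addr(addr, path)) return -1;
    const int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server then calls accept() on the listening descriptor and both sides use read() and write() as with any stream.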

ralphbsz · Oct 29, 2020

About 28 years ago, I did exactly what you're doing, using shared memory (first using SysV shared memory on a single machine, then using hardware shared memory on a VME backplane with multiple CPU cards, and finally using SCI, a coherent memory bus) in C++.

The problem with using shared memory (and environment variable setting is a form of shared memory) is the protocol and notification: How does the receiver know that the sender has deposited a message in the memory buffer? How does the sender know that the receiver is ready to handle a message, or that the receiver has processed the previous message? How do you coordinate where in memory to write new messages? All these things can be solved, but the solutions tend to be hacky, doing it "correctly" is surprisingly difficult, and the hacky solutions tend to work for a little bit and then fail (usually in a nasty fashion, for example livelock, not with easily debuggable symptoms).

This is why these days I prefer using an explicit network protocol, most easily done as RPC: The sender sends a message to the receiver over a socket-like streaming protocol. The message is encoded or packed using a data description language, to make sure the content is unambiguous and platform independent. The messages are explicit orders, and the replies are explicit acknowledgements. Here are a few of the RPCs and replies that I use in my homebrew system:
Turn pump 2 on. Yes, I have turned pump 2 on, and the previous state of it was off.
Turn pump 2 off. No, I can not turn pump 2 off, because of the following error: serial port hardware flow control is down.
Please record the pressure measurement at water softener is 59 psi at 11:08 Thursday. Yes, I have recorded it, the previous measurement was 61 psi 90 seconds earlier.
Please record the pressure measurement at sater woftener is 59 psi at 11:09 Thursday. No, I have never heard of a sensor with that name.
Alarm at 11:18 on Thursday the water level in the tank is too low, only 8900 gallons. Yes, I have received the alarm from 11:18 and filed it and will take appropriate action.
Please tell me the state of all pumps. Yes, pump 1 is off, was last on at 8:30 on Thursday, and pump 2 is on, has been on since 11:08 Thursday, has a hardware error with serial port hardware flow control.
Please tell me the meaning of life. No I don't know that variable.

I think the advantage of using this "conversational" style of interprocess communication is that it is much easier for the human who reads the code or debugs a problem to reason about what's going on. And obviously the actual implementation (in proto files that get fed into the gRPC mechanism) is considerably more technical than the English sentences I wrote above.
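As a rough illustration of that request/acknowledgement shape in C++ (not the actual proto definitions, which aren't shown in this thread), the "turn pump on/off" exchange could look like the sketch below. PumpRequest, PumpReply and PumpController are made-up names, and the transport (socket, gRPC, whatever) is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// An explicit order: "turn pump N on/off".
struct PumpRequest {
    int pump;
    bool turn_on;
};

// An explicit acknowledgement: success or failure, the previous state,
// and a human-readable reason when the order could not be carried out.
struct PumpReply {
    bool ok;
    bool previous_on;
    std::string error;
};

class PumpController {
public:
    // Pumps are numbered 1..pump_count and start switched off.
    explicit PumpController(int pump_count) {
        assert(pump_count > 0);
        for (int i = 1; i <= pump_count; ++i) pumps_[i] = false;
    }

    // Every request gets exactly one reply, whether it succeeded or not.
    PumpReply handle(const PumpRequest& req) {
        auto it = pumps_.find(req.pump);
        if (it == pumps_.end())
            return {false, false, "no pump with number " + std::to_string(req.pump)};
        const bool previous = it->second;
        it->second = req.turn_on;
        return {true, previous, ""};
    }

private:
    std::map<int, bool> pumps_;
};
```

The point is only that each order has a matching reply type, so the sender never has to guess whether or how the receiver handled it.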

Samuel Venable · Oct 29, 2020

Thanks guys for the extensive information. I find this really interesting.

I'm not sure how accurate this is, but according to this Stack Overflow answer, Windows Sockets are based on code from the original BSD: https://stackoverflow.com/a/28031039

So even Windows, which isn't a Unix-like, has some Unix-like code deep within. Really weird, as I didn't think that would even work as far as compatibility between the operating systems is concerned. It's all connected to some small degree, apparently.

I've posted it before, but I'll post it again, here is the library I am talking about in this thread:

time-killer-games/libprocinfo (github.com): cross-platform library and C++ API for process-related functionality

It will need serious revisions and needs to have its dependency on Xlib removed, but I did contact the FreeBSD developer mailing list about the possibility of getting this into a future release of FreeBSD, and I did point out all the flaws that I would need to work out before doing that. They never responded; I guess it got buried under things they prioritized as more important.

One of those flaws is the use of only double and char * for the exported functions. That is a limitation of the game engine I ported it to, but it doesn't have to stay that way.