C Can I use the most significant bit of a pointer as a flag?

Hello everyone,

I am wondering if I can use the most significant bit of a pointer as a flag. That means storing a bit of information in the most significant bit, and clear this bit with a mask before using the memory address.

I did not find any official documentation about the layout of the virtual address space of a FreeBSD process, but this document (page 4) says:
The kernel’s virtual addresses are permanently mapped into the high part of every process address space.

I think the high half of the virtual address space is inaccessible to the process (in other words, malloc() can't return an address in that area). Therefore the most significant bit of any memory address accessible to the process should always be clear (equal to zero). I would like to confirm that by finding an official FreeBSD documentation showing the layout of the virtual address space of a process.

The only FreeBSD ports my program has to be compatible with are: powerpc64le, amd64 and aarch64. I understand that for 64-bit ARM and 64-bit x86, the architecture dictates that the high half of the address space is reserved for the kernel, but I did not find such guarantee in the Power ISA 3.0 specification.

Thank you for your help.
 
I wouldn't do it, regardless of the architecture. Pointer is an address, addresses are architecture dependent, so mucking about with them is usually a bad idea. Most that folks do is mask/modify lower bits if there is a specific alignment need outside of processor alignment.
 
If the pointers all refer to objects of a uniform size with all objects stored contiguously and the base address is known - you could store an index instead of a pointer. That would leave room for a few flags.

Worst case is you use a descriptor which holds the pointer and any flags. Nothing wrong with that.
 
Worst case is you use a descriptor which holds the pointer and any flags. Nothing wrong with that.

It would double the memory consumption of my main integer type (128 bits instead of 64 bits). If the most significant bit of memory addresses is always equal to zero, I need to know about it.

Storing an index instead of a memory address does not work in my case.
 
I think it will work. My (now somewhat vague) memory from running Linux on PowerPCs was that the upper half is reserved for the kernel; I presume that's the case on FreeBSD too.

I think it is also a really bad idea.

If you are crazy enough to do it, I would run with a modified version of the run-time library, which puts an assert into sbrk and malloc, to make sure they never return an address with the high bit set. That's sort of a canary in the mine: If that assert ever fires, you know that the underlying OS has changed behavior, and your technique has been defeated. It's much easier to figure that out with a clean assert, rather than with bizarre bugs that are introduced when your 1-bit flag is set unexpectedly.

Why is it a bad idea? Two reasons. First, one of these days the OS will change the memory layout in a fashion that's compatible for people who use pointers transparently, and not for you. Every single time a fixed-width set of bits was used, it has been exceeded. For example, the early IBM mainframes (the 360/370 class machines) used 24-bit addresses, because in 1960 they could not imagine having more than 16 MiB of memory. That broke around 1980. Similarly, the (misattributed) Bill Gates quote about 640K being enough for anyone was also correct when the IBM PC first came out, and became wrong later. In the 1990s, we worked really hard to move all file system interfaces to use 64-bit file sizes and byte offsets; today there are systems in production where those 64 bits are no longer sufficient (but that's files, not RAM).

A particularly good example is the old OS360 architecture. It used 32-bit words for pointers, but the uppermost 8 bits were de-facto unused. So what happened is that programmers started storing stuff in there (that was commonly known as the "control byte"). And then, when the upgrade to the 31-bit addresses happened, all hell broke lose, and a lot of effort had to be put into cleaning up old source code. Lesson learned: Leave to the architecture what is the architecture's, and treat pointers as opaque.

Second reason this is a bad idea: You code will be complex. Everytime you want to use a pointer, you first have to go through and clear your flag bit and make a temporary copy, then use the pointer, then put the flag bit back. Any interruption in this is dangerous. If the modified pointers are stored in memory shared with multiple threads, you need to think through concurrency issues (the add/remove of the flag bit has to be atomic), which will probably require locking, or highly brittle code that uses atomics. This is the kind of high-wire act that super programming teams can get away with (with lots of work and being super careful), but that mere mortals should stay away from, too error-prone.
 
Setting aside the issue of storing a flag inside a pointer, here are two situations where it is useful to perform a small amount of math on each pointer.

1. A technique described in one of Knuth's "Art of Computer Programming" volumes saved memory on a doubly linked list structure by storing the XOR of the forward and back pointers. This made sense on a 16 bit address bus (something like an Apple ][ or PDP-11). Might be useful today in an Arduino.

2. A lifetime ago I wrote a C library to manage an AVL tree. Nothing special there except multiple processes needed access to this tree. The library used shared memory so that one process could update the tree while other processes had read-only access. Each process mapped the shared memory into its own process space at some arbitrary base address. Storing process absolute pointers wasn't going to work. The gimmick I used was self-relative pointers instead of process absolute pointers. Each process computed the correct address for its own address space by a simple addition.

Honestly - those are the only two examples I can think of for using computed on-the-fly pointers.
 
  • Like
Reactions: mer
One comment I found on Stack that talked about masking a pointer; and it mentioned it is a very DANGEROUS move.

If you know your exact memory layout you can probably do it, but it's risky. The most common 64 bits systems for Windows/Mac/Linux are amd64. On them the machine only has 48 bit virtual addresses (for the foreseeable future), so you have 16 bits to play around in plus the lower aligned bits, theoretically.
Except. Half of the address space is negative; addresses go between [-2^47,2^47). So you can't be sure if the bits set in the pointer actually mean that your magic bits are set or you just have a negative address.
Except. Today, most, if not all operating systems put the kernel in the negative address space and put the userland in the positive address space. It makes certain things easier and faster to manage. So you could abuse that knowledge to assume that playing with those bits should be safe.
Except. I've never seen a guarantee from any operating system that this situation will remain forever (doesn't mean that one doesn't exist, I just haven't seen one). You might update your kernel one day and suddenly the operating system decided that userland is negative and kernel is positive‚ or userland gets more address space to play around in.
As long as you mask out the extra bits before you dereference your pointers, you will be safe today, but maybe not tomorrow. And when you build your code around an assumption like this, you deserve all the pain you get when your undefined behavior you get away with becomes undefined behavior you don't get away with. Painting yourself into a corner like this is not fun.
https://stackoverflow.com/questions/33955713/storing-data-in-the-most-significant-bits-of-a-pointer

One other part to also keep in mind, is ASLR could very well affect the memory addresses. If you are that worried about a few bytes, you are much safer using a char array or boolean (chars have a size of 1) vs a pointer typically about 8 (pointer size can vary between compilers and platforms).

It would double the memory consumption of my main integer type (128 bits instead of 64 bits)
I am curious where you get a size of 128 bits for a int. Doing a test on my machine; I got this results using C++20 standard:

C++:
#include <iostream>

int main ( int argc, char * argv[] )
{
  std::cout<< "size of char: " << sizeof (char) << std::endl;
  std::cout<< "size of short: " << sizeof (short) << std::endl;
  std::cout<< "size of int: " << sizeof (int) << std::endl;
  std::cout<< "size of long: " << sizeof (long) << std::endl;
  std::cout<< "size of long long: " << sizeof (long long) << std::endl;

  std::cout<< "size of float: " << sizeof (float) << std::endl;
  std::cout<< "size of double: " << sizeof (double) << std::endl;

  std::cout<< "size of pointer: " << sizeof (int *) << std::endl;
}
Code:
size of char: 1
size of short: 2
size of int: 4
size of long: 4
size of long long: 8
size of float: 4
size of double: 8
size of pointer: 4

Edit:
Even checking the size of int64_t, also shows a size of 8 (from the <cstdint> std library).
 
  • Like
Reactions: mer
@ct58711 - many processors prefer to place 64 bit values on 64 bit aligned memory addresses. Some processors require the value to be at an aligned address.


Data Abort from a lower Exception level.

Used for MMU faults generated by data accesses, alignment faults other than those caused by Stack Pointer misalignment, and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug-related exceptions.

Many compilers will pad a struct that contains a 64 bit field to be a multiple of 64 bits.
 
  • Like
Reactions: mer
Tagged pointers are not supported by lldb, at least last time I looked, which may be a consideration for using them.
 
re: "most significant bit of a pointer as a flag." ?

If you are writing code for other people to maintain, try to read the code from their point of view.
 
Broadly agree with all the responders.
  • You possibly could.
  • I wouldn't.
  • There will definitely be a more elegant way of doing what you want.
 
Pointers use the stack, whereas that "something" is on the heap. Speed therefore would be one answer.
Stack and heap are both in the process image on main memory. Are you saying the stack is more likely to be in the processor's cache?
 
Yes and no, given all things are memory but, in the case of the stack and let's say ARM architecture, there's 3 instructions to get the stack pointer but 4+ for getting it from the heap (the extra is for storing the address found). This is simply because the stack addresses are known, the heap ones are not.
This is where the stack and frame pointers come into play.

So, if the OP is storing the data in the pointer, on the stack, then he has 3+ instructions for that process. If he then uses the pointer to get to the heap to store the data, as suggested, well you can see it's a lot more instructions.

Or am I missing something?
 
It would double the memory consumption of my main integer type (128 bits instead of 64 bits).
So, what is it? An integer or a pointer?

Anyway, I get what you're saying. If you replace a pointer (which is 64 bits and 64-bit aligned on most architectures) with a struct (or object) that contains a pointer and a boolean flag, and you allocate lots of these structs (for example arrays or vectors of them), then the memory consumption will indeed go to 128 bits each. Not because storing a boolean flag requires 64 bits, but because these structs need to be 64-bit aligned.

"Doctor, it hurts when I do this. Well, then stop doing it."

There are so many other ways of doing this, which don't require doubling the memory consumption. First, as unitrunker already said: The pointer points at something. That something probably has room to store an extra bit. But perhaps it also does not. In that case, you can store 64 normal pointers (at 64 bits each, nicely aligned), and separately somewhere you put an unsigned 64-bit int which contains the 64 flags for these 64 pointers. If you are worried about cache smashing, interleave them: Allocate the arrays in units of 65 64-bit words, which always contain 32 pointers, one flag word, and another 32 pointers. Like that the flag words are never more than 256 bytes away from the pointers (memory distance), and you use few cache lines.

If you are worried about memory usage, here is another possible design: Allocate one giant memory arena in which you store things. Make that memory arena 32 GiB in size. Think of that arena as being room for 2^32 64-bit things. Then don't store pointers to the things, instead store the index into the array. By construction, that index fits into an unsigned 32-bit number. Boom, you just halved the memory consumption of your giant array of pointers! But you may say that you're running on a machine with more memory, and you need to address for example 512 GiB instead of 32 GiB in your giant array. No problem, store their addresses as 40-bit numbers. Now mapping 40-bit numbers into memory is not trivial, but it really only takes a handful of instructions to figure out where in a giant array a 40-bit number can be found. I've done this before to store giant arrays of 24-bit quantities; works fine.

By the way, there are even more free bits in pointers: In addition to the high bit (which, as you point out, is never used in user-space addresses), the lowest 3 bits are also always zero, because nearly everything is 64-bit aligned. In theory, you could use those to store a total of 4 flags. And at this point, the plan goes from "bad idea" to "insane idea".
 
  • Like
Reactions: mer
Thank you all for your input.

I see that you are passionate about good programming practices, and that you are trying to find alternative design choices to avoid flags embedded in pointers.

I strongly believe my design is good, efficient, and easily maintainable by other people. To convince you of that, I would have to explain the ins and outs of my design, which is very difficult to do in forum posts. And that was not my point.

The main point of this thread is to improve the knowledge of readers on the layout of the virtual address space of FreeBSD userland processes, for several important target architectures (amd64, aarch64, powerpc64le). Given that knowledge, the readers can then make their own choices.

In addition to the high bit (which, as you point out, is never used in user-space addresses)

This is the information that matters here. I would like to find evidence showing that the most significant bit is never used in user-space addresses, for FreeBSD processes on the three target architectures. This evidence can be a piece of documentation (handbook ...) or the relevant source code file in the FreeBSD source tree. It is possible to run some test program on a Raptor Talos II and a Raspberry Pi, but a test is not as reliable as a handbook or a kernel source code file (and I don't have that kind of hardware at the moment). I am trying to read the FreeBSD source code, maybe I will find the information I seek.

Besides, I think it is interesting in itself to have some background information about the virtual address space layout, even if we choose to avoid tagged pointers.

@ ralphbsz - Yes, you are right, the lowest significant bits of aligned memory addresses are always clear. I won't talk about that here, because I am trying to keep the discussion focused on only one subject (which is the most significant bit).
 
The first Apple Macintosh, which saw the light in 1984, featured a Motorola 68000 processor and 128 kB of RAM. Internally the 68k was already a 32bit processor while only 16 bit were connected to the memory bus. So far so good.

Now, Apple had also the bright idea to use tagged pointers for the Memory Management system. After a little search, I found a PDF file having all three Inside Macintosh volumes of 1985, which were published shortly after the Macs hit the market - 1284 pages. And on page 593 we may read the chapter about the Structure of Master Pointers: https://www.weihenstephan.org/~michaste/pagetable/mac/Inside_Macintosh.pdf#page=593

I worked a lot with that. Actually it worked very well until the Macintosh II, featuring a Motorola 68020, was introduced 3 years later. Then Apple ditched the tagged Master Pointers, and replaced it by fully 32bit plain ones. The former information of the tags where put somewhere into the referenced memory blocks - I don’t remember the very details. I only remember, that It took some hours to update my code, which I had produced until then, to the new Master Pointer model. This was not too difficult, but when looking back, it could well have been like that since the beginning. Anyway, the tagged pointers turned out to be a dead end road.

Why I am telling this? Even if you find out, that tagged pointers are no problem with FreeBSD 13, what about 14, 15, 16, 17, …?

Chances are that it will break at one point in time, and your odds that it may break miserably are worse than that of Apple. They wrote nice Volumes IV, V and VI of Inside Macintosh and asked their Developers to conform to the new rules - for them the problem was solved. You are not in this position, are you?
 
atari had the same problem as above when running atari st (68000) software on atari tt (68030)
mc68000 had only 24 address lines and MSB in addresses was used for 'stuff' or just contained junk
this was causing problems on 68030 with 32 bit lines
some of the problems could be fixed with MMU programming
 
(And now that this thread has digressed into anecdotes about bad things we did in our youth ...)
One of the worst abuses of high bits happened in cp/m, which was an 8-bit operating system for Z80 computers. And the reason was that text strings were universally considered to contain only 7-bit ASCII characters, which leaves one bit free at the top.

General example: When you store text strings in memory, you need some convention for storing the length of the string.
  • Traditionally, this has been done by storing a small integer (8- or 16-bits) first, which gives the length of the string, followed by this many bytes. This convention was used by IBM and Digital. It has the advantage to be fast (the length of a string is known without searching the string), allowing zero-length strings, and allowing any byte in the string (to put it differently, strings can be arbitrary binary arrays). Only disadvantage is that it wastes a little memory, one or two bytes per string.
  • When Dennis and Ken designed C and Unix, they went for the most criminally insane convention, which is to append a zero byte at the end of each string. It is not memory efficient (wastes one byte per string), is not fast (you have to scan the string to find its end), and it doesn't allow for arbitrary binary data in the string. Only good point is that you can have zero-length strings.
  • Some cp/m software used a convention that is a tiny bit better than the C/Unix convention: Allow only 7-bit characters in strings. That's not a big restriction, if you are thinking about printable (human-readable) strings, which really only contain ASCII characters 0x20...0x7E plus control characters 0x01...0x1F plus 0x7F. It is not fast either, you have to scan the string to find its length or end. It does not allow zero-length strings, but for printable strings that hardly ever happens, and if you really need it, just store 0x80 (which can, after stripping the high bit, safely be printed to most devices like terminals or printers). And it is very memory efficient, using no extra bytes.
The more specific example: cp/m's file system used to store directory entries on disk, I think in a 32-byte long directory control block. It contains everything you needed to know about a file: It's "owner" (between 0 and 16), the file name (always 8.3 in uppercase ASCII, not allowing certain special characters which had meaning to the shell), and where it was on disk. When this was designed for cp/m 2.2, that was sufficient. The problem arose in later versions, which added a clock to the computer, and had a timestamp on files. They also started having file attributes, such as read-only, archived, and system only (not visible to normal users). Where to store those attributes and the time stamp? The solution was perverse, but efficient. The attributes were stored mostly by setting high bits on the 11 file name characters. The time stamps were stored by saying that every 4th directory entry wasn't a real file, but contained the time stamp (~8 bytes each) for a group of three directory entries nearby. The fake directory entry was tagged by having an invalid owner (a number greater than 16). By the time all these extra functions had been added for attributes and time stamps, decoding the directory format requires hundreds of lines of assembly code (most code on cp/m was in assembly, for efficiency reasons). What a god-awful mess, but fun for college students who can spend their whole night coding and drinking.
 
No you cannot.

New compilers would flag it as "possible malware". They also disagree (from standard K&R C) on pointer math. You have no way to express such a thing. (despite the fact big tech puts malware in their software using flags to disable malware messages, btw)

you have 0 idea when the pointer might be read and used since your using "a really f'ed up compiler". so yes you could throw a security SEGFAULT by that reasoning

More literally, a pointer is data only if stored so yes, but in ASM (more specifically binary runtime) a stored pointer is loaded elsewhere (perhaps a register) before an instruction uses it MAYBE (depends on compiler) (then, literally, read by CPU and fetched by cpu as part of an single instruction - which instruction depends on which compiler). This is yet another nuance you have no control over - and you can count on cancel culture to have "badly mangled things" in the new "foo war compilers".
 
V8 (the javascript engine) does something similar, but uses the least significant bit(s). This works because every object used in v8 is word-aligned. With this guarantee, depending on the machine's word size, the least significant bit(s) are redundant (and you don't have to rely on anything you know about organization of the virtual address space of your OS).

You can implement that somewhat portably using uintptr_t. (Only "somewhat" because you're restricted to byte-addressable platforms with a word size larger than a byte)
 
Back
Top