C Is this function "pure"?

zirias@

Developer
Consider a function like this:
const char *getSomeValue(const struct some_container *cont, const char *key, size_t *len);
Semantics: Get some named (key) value from some container, optionally also return its length in len. Real-life example why this optional output parameter makes sense would be some utf-8 encoded string with len counting characters instead of octets.

Many compilers support a "pure" attribute. Is it safe to mark this function as pure? Looking for example in GCC's documentation, I find this:
The pure attribute prohibits a function from modifying the state of the program that is observable by means other than inspecting the function’s return value.
Well, not really the case here 🤔 ... but this paragraph describing the purpose of the attribute:
Calls to functions that have no observable effects on the state of the program other than to return a value may lend themselves to optimizations such as common subexpression elimination. Declaring such functions with the pure attribute allows GCC to avoid emitting some calls in repeated invocations of the function with the same argument values.
looks like a match to me. The function clearly is idempotent. Do I just have some edge-case here that wasn't considered writing this GCC documentation?
 
does one key value never change ?
Not sure what you mean, you can query different values of course. If you mean the key of one value stored never changes, then yes.

It's actually just a classic "getter" from some dictionary-like data structure (internally a hash-table, but that shouldn't matter here). So, it would be a typical example for a "pure" function if it wasn't for this additional output parameter. Of course, the len "returned" (by writing to the pointer passed) is also always the same for the same key, so I think it should be a "pure" function, although it doesn't exactly match the description in GCC's documentation 🤔
 
I think you have to take into account that a "pure attribute" has some bearing on a "pure function" (put in an exaggerated way). Is there a precise GCC definition of what its description of a “pure function” is?—as opposed the more or less general definition of a pure function, like here?
That is to say, I think that the GCC descriptions (related to the pure attribute) you mention relate predominantly to an operational aspect of the compiler’s behaviour. Contrast that to the implementation of the sin() function as mentioned in here.

(my emphasis)
Rich (BB code):
So, it would be a typical example for a "pure" function if it wasn't for this additional output parameter.
Why would you think that? I do not see that as a contradiction to this (from Pure function):
In computer programming, a pure function is a function that has the following properties:[1][2]
1. The function return values are identical for identical arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams).
2. The function application has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or input/output streams).

You may be referring to/thinking of a pure function as something like a function in a pure(ly) functional language; where a function has as its sole resulting output the value of the function—which, by the way, it (must) also exhibit(s) no side effects.
 
Erichans, if you look at it purely (haha!) on the conceptual level, of course I have a pure function here. It has two output parameters, it's idempotent, it has no side effects. Yes, correct.

But putting the attribute on it is a different story, because the compiler might base optimization decisions on it. And in C, a function only has one (scalar) return value (if you need more, you "fake" them using output parameters).

I'd like to know whether attaching the "pure" attribute to this function would be safe here (IOW, is it "pure" according to the compiler's definition?) -- The GCC documentation can be found here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html – but other compilers (like e.g. clang) support this attribute as well.
 
it wont work
even without len
char key[10]="bull" => value cow
strcpy(key,"horse")
gcc will return cow
because key is the same (points to same address)
if it expect determinism it should be based on param values because it has no idea of the length of the contents
but then it may ignore pure if args are pointers
 
I guess you're writing something about pointer aliases (which, btw, don't exist in my code), but I really don't get it? :confused:

Ah wait, I guess I get it, you mean passing the same pointer as "key" after modifying the content? Yes, THIS will certainly break with "pure" cause the second call is just optimized away....

2nd edit: have to think whether something like this could be a valid usecase. Feels like a heisenbug crawling up ... 😔
 
The GCC text is specifically referencing functional programming. Your function can't be pure despite being idempotent because the function is not acting on the pointer data you are passing, it's acting on the mutable data contained where the pointers are pointing. Every run of the program will have different pointer data (be at a different address).

There's literally a wikipedia page on this.
 
The GCC text is specifically referencing functional programming. Your function can't be pure despite being idempotent because the function is not acting on the pointer data you are passing, it's acting on the mutable data contained where the pointers are pointing. Every run of the program will have different pointer data (be at a different address).
Oh wait: having different pointers to the same data will never be a problem, it will just look like a different invocation, so the compiler misses a chance for optimizing. But having the same pointer to different data, as covacat outlined, would sure be a problem. This would be a strange use of that function, but it's possible, so I'd better not attach a "pure" attribute to it.

But my question remains, let's just slightly modify the example and make the key an integer (and the container is a private struct, the caller can't modify it):
const char *getSomeValue(const struct some_container *cont, int key, size_t *len);
Is it "pure" that way in the sense of the compiler? Or does the additional output parameter spoil it?

edit: btw, the textbook definition really isn't much help here, GCC's documentation also contains for example this:
pure allows the function to read any non-volatile memory, even if it changes in between successive invocations of the function.
So, modifying the container using some setter function in between calls wouldn't hurt.

edit2: letting that sink, I think even modifying a string key in between calls shouldn't break anything. But my doubt about the output parameter remains...
 
You can't make any function with pointers in the parameters pure. The pointers will have different address values in them.
run 1: x* contains 0x00000009 and points to data containing "abc"
run 2: x* contains 0x00000040 and points to data containing "abc". Yeah, sure "different pointers to the same data isn't a problem" but the point is that on run 2, 0x00000009 will probably not contain "abc" and therefore it can't be pure. 0x00000009 would be the "same pointer to different data."
 
if the memory pointed by container stays the same / append only it has to work;
if elements in container change their relative position during the lifetime of the process then it wont work correctly
 
msplsh please read my entire response first (and maybe also the GCC docs). With pointers, it's a difficult issue. On the conceptual level, this function is "pure" for sure. GCC (and compilers compatible with its attributes) knows two different "levels" of "pureness" for a reason. The really strict variant is called "const". "pure" allows reading (non-volatile) memory that might change in between calls.

Again, my doubt is about the output parameter, nothing else, cause this seems to contradict the GCC docs, but maybe it's just an overlooked edge-case.
 
What other compiler supports this marker? I couldn't find it after poking around in Clang for five minutes.
 
Clang *does* support it (at least if you can trust __has_attribute() checking for availability of GNU-style attributes), but I didn't find it in the docs either...
 
Oh :rolleyes: I think I found a really simple example how an additional output parameter indeed breaks "pure" 😔
Code:
const char *getSomeValue(const struct some_container *cont, const char *key, size_t *len);

size_t len = 0;
const char *val = getSomeValue(c, "bar", &len);
// *val == "baz", len == 3
val = getSomeValue(c, "foo", &len);
// *val == "longer", len == 6
val = getSomeValue(c, "bar", &len);
// *val == "baz", len == 6 ⚡
Or in words: The compiler might choose not to execute the third call, cause
  • all parameters passed are the same as in the first call
  • no non-volatile memory was modified in between (it can't expect the second call to modify len because "pure" forbids that)
  • the result is already known
But then, len won't be updated...

I could take one step further and also add an "access" attribute that marks len as "write_only", so the compiler knows it's an output parameter. But then I'd have to rely on the compiler to understand both attributes. Clang in base, for example, only supports "pure", but not "access". :rolleyes:
 
BTW, sorry to anyone who answered, I guess I should have made the scope of this question more clear in the first place: The GNU-style "pure" attribute supported by GCC and some other C-compilers (including clang).

Talking about the general concept of a "pure" function, there are two completely different answers that are both correct (and both were given here):
  • Technical view: The function can never be pure because it doesn't work on its direct arguments (which are just pointers)
  • Conceptual view (where pointers are just an implementation detail): The function is pure because its results depend on nothing but its parameters and it doesn't have side-effects
But I guess with the scope "pure according to the GNU-style attribute", my initial feeling was correct (which is unfortunate): With the pointer used for an output parameter, it can't be pure. 😔

It looks like the GNU people aimed at supporting the "conceptual view" by explicitly allowing reading of non-volatile memory, but additional output parameters weren't considered. They also have a second attribute named "const" (kind of misleading) for the strict technical view (many functions from <math.h> would qualify, e.g. pow()).
 
Good example and clarifying explanation, thanks. The suggested relation between "pure function" (general term) and the “pure attribute” (as it pertains to GCC) is what I had in mind in my message (especially: “Is there a precise GCC definition of what its description of a “pure function” is”); but I may have stated that not clearly enough. What counts and applies is #4-#7. From general terms to increasingly more specific (and probably more applicable to your GCC example):
  1. general terms “pure function”, … as in mathematics, … as in computer science generally
  2. “pure function” … as in C++, … as in C, … as in Cxx standard, … as in GCC
  3. actual used terminology in other compilers: … as resembling (to “pure attributes”) terms in llvm/Clang
  4. actual used terminology: … as “pure attribute” in GCC
  5. definition specifications used in your GCC version: only “pure attribute”
  6. operational aspects: correctness of “pure attribute”, specifically for your use case
  7. operational aspects: impact on execution speed of “pure attribute”, specifically for your use case

Ad #4
I agree with your doubts (stated in your earlier messages) about the output parameter on the basis of the GCC text:
From the GCC docs:
pure

Calls to functions that have no observable effects on the state of the program other than to return a value may lend themselves to optimizations such as common subexpression elimination. Declaring such functions with the pure attribute allows GCC to avoid emitting some calls in repeated invocations of the function with the same argument values.
Taking this part of the "defining" text literally means: your output parameter constitutes something additional to the "return a value" (note: singular), that seems to refer to only the function result.

The following expands, to a certain degree, on its definition:
The pure attribute prohibits a function from modifying the state of the program that is observable by means other than inspecting the function’s return value. However, functions declared with the pure attribute can safely read any non-volatile objects, and modify the value of objects in a way that does not affect their return value or the observable state of the program.

What follows after that should be treated as an example, not part of a strict definition. Note that only there the term "pure function" is colloquially described (not defined). Noting:
Because a pure function cannot have any observable side effects it does not make sense for such a function to return void.
hints that your function (with an output parameter) should not be used with the pure attribute. However that may reflect only on the preceding sentence where the const and pure attribute are used at the same time.


Ad #2-#6 (broadly)
There are other references that introduce the concept of a “pure function” as it relates (more or less) to its mathematical origins. IMO, one has to be very careful in the distinction between the term “pure function” (and its very diverse interpretations in different domains) and the actual specified and precisely defined terms. The “pure attribute” in GCC is a compiler hint: GCC decides if it acts on it; it may even depend on the optimization level. The latter means also that, depending on the optimization level it may or may not correctly calculate the value(s) of the function targeted by the “pure attribute”. You, as the developer, has to guard that the function is actually eligible to have a “pure attribute”: the (GCC) compiler does not/cannot enforce general correctness. Maybe helpfull on the topic of const and pure attributes:
Note that most concentrate on C++ and its impact on code from the point of view of that language. (I presume you are mainly referring to C because, if C++, you probably would have used ref-s instead of *pointers as function parameters.)


Ad #3
Clang *does* support it (at least if you can trust __has_attribute() checking for availability of GNU-style attributes), but I didn't find it in the docs either...
The only reference I could find is in LLVM 2.4 Release Notes - Optimizer Improvements:
The new AddReadAttrs pass works out which functions are read-only or read-none (these correspond to 'pure' and 'const' in GCC) and marks them with the appropriate attribute.
That is from an old release; the pure and const attribute seem to originate from quite a long time in the past. Note the difference in activation: this seems "automatically" applied by the compiler during an optimization pass, whereas GCC opts for a (possible) activation by specified function attributes in the source text.


Ad #7 (mostly)
There have been other endeavours into the usefulness of using a “pure functions” in a C compiler (different from the current GCC pure attribute):

___
Edit: the first list had a duplicate in it, sorry for the possible confusion. I hope now to have corrected it and all its references.
 
This is pretty much why the "rules" say no pointers, because it's too much complexity to keep track of in C/C++. You can achieve it in other languages because of their guardrails.
 
This is pretty much why the "rules" say no pointers, because it's too much complexity to keep track of in C/C++. You can achieve it in other languages because of their guardrails.

[...]
edit: btw, the textbook definition really isn't much help here, GCC's documentation also contains for example this:
pure allows the function to read any non-volatile memory, even if it changes in between successive invocations of the function.

Oh :rolleyes: I think I found a really simple example how an additional output parameter indeed breaks "pure" 😔
Code:
const char *getSomeValue(const struct some_container *cont, const char *key, size_t *len);

size_t len = 0;
const char *val = getSomeValue(c, "bar", &len);
// *val == "baz", len == 3
val = getSomeValue(c, "foo", &len);
// *val == "longer", len == 6
val = getSomeValue(c, "bar", &len);
// *val == "baz", len == 6 ⚡
Or in words: The compiler might choose not to execute the third call, cause
  • all parameters passed are the same as in the first call
  • no non-volatile memory was modified in between (it can't expect the second call to modify len because "pure" forbids that)
  • the result is already known
But then, len won't be updated...

The "[...] allows the function to read any non-volatile memory" may lead to the conclusion that there is no "pure function" anymore: that is correct(*). You might think that there is some form of contradiction where there isn't: the compiler can now easily infer that, for example, the optimization of not executing the code of the function in the last call (in the quoted code) cannot be guaranteed to give correct results: it will therefore reject such a possible optimization and effectively ignore the pure attribute.

___
(*) note however that in the context of GCC we are speaking only of the pure attribute.
 
The "[...] allows the function to read any non-volatile memory" may lead to the conclusion that there is no "pure function" anymore: that is correct(*).
I'd add yet another remark now: on the technical level. On a conceptual (mathematical) level, the pointers are just implementation details and we have a function taking some container object and some string key, and returning some string value and some length value (2 inputs, 2 outputs) which is clearly pure :)

The aim of this GNU-attribute really seems to be to get in line with the conceptual view as much as possible/feasible. The complexity is in tracking the actual argument values if they "hide" behind pointers, so instead, just reading any non-volatile memory is allowed (therefore, if the compiler sees any mutation of memory between the two calls, nothing is optimized). The case of using pointers to simulate additional return values isn't even considered, so I'm now pretty sure I can't safely use this attribute on my example function.
the compiler can now easily infer that, for example, the optimization of not executing the code of the function in the last call (in the quoted code) cannot be guaranteed to give correct results:
except in this specific example, it can't – it doesn't see any modification to len. Modifying it is technically a side-effect which isn't allowed in presence of the "pure" attribute.
[Clang] That is from an old release; the pure and const attribute seem to originate from quite a long time in the past. Note the difference in activation: this seems "automatically" applied by the compiler during an optimization pass, whereas GCC opts for a (possible) activation by specified function attributes in the source text.
Well first, I'm pretty sure clang supports these attributes: __has_attribute (pure) evaluates to true. It's probably just lacking documentation.

Also, I have a different interpretation here: It makes sense for an optimizer to analyze a function that way, probably both compilers do. But what if the function isn't defined in the current compilation unit? Nowadays, LTO might help with that, but what if you don't use it, or the function is defined in a shared lib, or in a static lib that was once compiled without LTO? I assume that's the purpose of the attribute: Annotate things in the declaration the compiler can't know otherwise.

BTW thanks for this longer answer with many links, will need some more time to have a look at all of them. Indeed, I hate C++ with a passion (only use it when I need to, e.g. for creating a Qt UI). But when it comes to these attributes, I wouldn't expect too many differences between C and C++.
 
The whole point of forbidding this on a technical level is that there's no way in the language to indicate what you've done on a conceptual level. This is a new attribute and GCC has clearly learned it's useful on some level, but the technical limitations of the language prevent it from being deployed in the way you think you "should" be able to deploy it because it "theoretically" meets the requirements.

If you wanted to futz about why it should be possible, then wonk about on the mailing list where they're working on it or submit a patch, sheesh 🤷‍♂️ No pointers means no pointers.
 
Please msplsh read the docs first. Pointers are allowed with the "pure" attribute (as opposed to the "const" attribute)
 
Ok, whatever. It basically says the same thing. You're modifying the contents of the pointers you pass in which is pretty much not allowed. Again, theoretical ("it's only a return value") versus technical (ie, from the documentation: "must not modify the array [or in this case, the integer pointer] it points to").
 
I don't know where you're taking that text from, not from the GCC docs I know. It doesn't even talk about pointers.

But anyways, if you drop this "pointer issue", you're just confirming what I concluded in post #15. An "output parameter" breaks guarantees this attribute is supposed to grant.
 
If it is only the return value that is the problem, you can return a struct containing two values. This would satisfy the one-return-value-only requirement. Sounds like cheating, but hey - if it works...
 
Back
Top