C strcmp in the C standard?

FreeBSD’s libc implementation of strcmp() appears to find the first two characters that differ, and return the difference between them.
By contrast, valgrind’s strcmp() only returns a value from the set {-1, 0, 1}.
I don’t have a Linux system to compare with, as I took my Linux PC to bits.
What does the C Standard say about this? I don’t own a copy.
I just (perhaps stupidly) wrote a function that relied on FreeBSD’s behaviour and had to spend about an hour investigating why my program fails under valgrind!
 
"The C Programming Language" by Kernighan and Ritchie, copyright 1978 by Bell Labs:
"strcmp(s, t) which compares the character strings s and t and returns negative, zero or positive according as s is lexographically less than, equal to or greater than t. The value returned is obtained by subtracting the characters at the first position where s and t disagree"

So I'm going to take K&R as authoritative: strcmp should return the difference of the characters, not "-1, 0, +1".
Every use I've seen of strcmp has been along the lines of:
if (strcmp(s, t) < 0) then
s is less than t
else if (strcmp(s, t) > 0) then
s is greater than t
else
s equals t
endif
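In C, that pattern might look like the sketch below (the function name relation is just for illustration). Only the sign of the result is tested, and strcmp() is called once with the result stored, rather than being called twice as in the pseudocode.

```c
#include <assert.h>
#include <string.h>

/* Describe how s compares to t, relying only on the sign of strcmp().
 * The result is stored once instead of calling strcmp() twice. */
static const char *relation(const char *s, const char *t)
{
    int r = strcmp(s, t);

    if (r < 0)
        return "less than";      /* s sorts before t */
    else if (r > 0)
        return "greater than";   /* s sorts after t */
    else
        return "equal to";       /* identical strings */
}
```

On an ASCII system, relation("apple", "banana") gives "less than" whether strcmp() returns -1 (valgrind) or the character difference (FreeBSD).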

I've been inclined to take K&R as authoritative on the C library until someone points at an RFC or standard saying otherwise.
 
I've never read the standard, but I thought the return value always worked like this:
if Return value < 0 then it indicates str1 is less than str2.

if Return value > 0 then it indicates str2 is less than str1.

if Return value = 0 then it indicates str1 is equal to str2.
 
So POSIX (thanks _martin) says:

The strcmp() function shall compare the string pointed to by s1 to the string pointed to by s2.
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.

RETURN VALUE

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.
In other words it agrees with valgrind’s implementation and you can only depend on the sign.
But K&R (thanks mer) says you get a difference of characters... and they should surely know.
Even more confused now.
I don’t suppose it matters in the big scheme of things. I’ll have to just write code that only relies on the sign...
 
Incidentally I think the FreeBSD way of doing it should use fewer CPU instructions: a subtraction versus a subtraction followed by (potentially) two comparisons and (potentially) two jumps.
 
Yes, but after doing all the comparisons, the valgrind version additionally has to clamp the result to one of {-1, 0, 1}, which presumably will require an if statement. The FreeBSD version just returns the difference.
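For what it's worth, clamping doesn't strictly need an if statement. A minimal sketch, assuming the raw result is already an int: each comparison in (d > 0) - (d < 0) yields 0 or 1, so the subtraction maps any int to -1, 0 or 1, and many compilers turn this into compare-and-set instructions rather than jumps. Whether that is actually cheaper than FreeBSD's single subtraction hasn't been measured here. The names sign_of and strcmp_clamped are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce any int to -1, 0 or 1 without an explicit
 * branch. Each comparison evaluates to 0 or 1, so the result is 1 for a
 * positive d, -1 for a negative d and 0 when d is zero. */
static int sign_of(int d)
{
    return (d > 0) - (d < 0);
}

/* A strcmp() wrapper that always returns -1, 0 or 1, i.e. the same
 * values valgrind's replacement happens to return. */
static int strcmp_clamped(const char *s1, const char *s2)
{
    return sign_of(strcmp(s1, s2));
}
```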
 
https://en.cppreference.com/w/c/string/byte/strcmp is fairly authoritative. The ISO C99 standard says:

Returns
3 The strcmp function returns an integer greater than, equal to, or less than
zero, accordingly as the string pointed to by s1 is greater than, equal to,
or less than the string pointed to by s2.

I'm reluctant to change Valgrind as the current implementation is conformant.

If you really need it, you could modify Valgrind yourself, in shared/vg_replace_strmem.c:

C:
#define STRCMP(soname, fnname) \
   int VG_REPLACE_FUNCTION_EZU(20160,soname,fnname) \
          ( const char* s1, const char* s2 ); \
   int VG_REPLACE_FUNCTION_EZU(20160,soname,fnname) \
          ( const char* s1, const char* s2 ) \
   { \
      register UChar c1; \
      register UChar c2; \
      while (True) { \
         c1 = *(const UChar *)s1; \
         c2 = *(const UChar *)s2; \
         if (c1 != c2) break; \
         if (c1 == 0) break; \
         s1++; s2++; \
      } \
      if ((UChar)c1 < (UChar)c2) return -1; \
      if ((UChar)c1 > (UChar)c2) return 1; \
      return 0; \
   }

You would need to change the return -1 and return 1 statements so that they return the difference (int)c1 - (int)c2 instead.
 
From the C11 standard, "§7.24.4.2 The strcmp function", paragraph 3:
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

So yes, any implementation returning values <0, 0 and >0 complies with the C standard, and the caller should never assume any specific values.
So the POSIX statement is the same as K&R, which means valgrind is actually not following the POSIX spec.
That could be because valgrind itself wants the stricter values. In the grand scheme of things, I don't think I've seen code depending on the actual value of the return, just the sign. That doesn't mean there is no such code, just that I haven't seen any.
 
So the Posix statement is the same as K&R, so valgrind is actually not following Posix spec.
The POSIX spec quoted above "The sign of a non-zero return value shall be determined by the sign of the difference between the values […]" also implies that the actual value doesn't matter, only the sign. So it's basically the same as the C standard, just worded differently. (*)

Only K&R says the actual difference between the first differing characters should be returned. IMHO it's unclear for the edge case where the strings differ in length (or it just implies that the NUL terminator is considered part of the string content). But it's definitely deprecated and superseded by the ISO C standard documents. And if you think about it, it's ambiguous anyway, as C doesn't imply any specific character encoding.

So, in a nutshell, never ever write code relying on specific values returned by strcmp().
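A common place where only the sign matters is a qsort() comparator: qsort() only distinguishes negative, zero and positive, so passing strcmp()'s result straight through is portable, while something like strcmp(a, b) == -1 is exactly the kind of test that works under valgrind and fails on FreeBSD. A minimal sketch; cmp_str and sorts_before are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the array elements; the elements are
 * char pointers, so each argument really points to a const char *. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* qsort() only looks at the sign, so any conforming strcmp()
     * (difference-returning or -1/0/1) works here. */
    return strcmp(*sa, *sb);
}

/* Portable "s sorts before t": compare the result with 0, never with -1. */
static int sorts_before(const char *s, const char *t)
{
    return strcmp(s, t) < 0;
}
```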

(*) well, not entirely. Wording it like POSIX only works if a character encoding is assumed where all alphabetic characters are ordered. IIRC, POSIX mandates ASCII or compatible extensions, so this wording is fine for POSIX, but wouldn't be fine for the C standard, which only requires the encoding of digits to be ordered (and consecutive).
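To make the encoding point concrete: the ordering guarantee ISO C does give is that '0' through '9' are contiguous and increasing, so converting a digit character with c - '0' is portable. There is no such guarantee for letters (in EBCDIC, 'j' - 'i' is 8, not 1). A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <ctype.h>

/* Portable in any conforming C implementation, because the standard
 * requires '0'..'9' to be encoded consecutively in increasing order.
 * Returns -1 for anything that isn't a decimal digit. */
static int digit_value(char c)
{
    /* Cast to unsigned char: isdigit() needs a value representable
     * as unsigned char (or EOF). */
    return isdigit((unsigned char)c) ? c - '0' : -1;
}
```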
 
zirias@ Yes you are "more correct" on the wording of the Posix spec. It says it computes the difference to determine the sign, but it does not say that it actually returns that difference.

So, in a nutshell, never ever write code relying on specific values returned by strcmp().
That is true, and I don't think I've ever seen any code actually depending on anything besides the sign of the return.
 
mer I just tried to add some reasoning to the specs. Once you realize C imposes very few restrictions on the character encoding used, it gets pretty obvious ;)

As for the OP, it's most likely a good thing valgrind's implementation "normalizes" the return values. It helped you find a portability bug in your code ;)
 
I ended up writing my own function that does the thing I want (compares the strings and returns the difference in the characters).
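The function itself isn't shown in the thread. As a minimal sketch of what such a function might look like, assuming K&R's definition plus POSIX's rule that the bytes are compared as unsigned char (the name strdiff is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: compare two strings K&R-style and return the difference
 * between the first pair of bytes that differ (0 if the strings are equal).
 * Bytes are read as unsigned char, as POSIX specifies for strcmp(), so the
 * sign agrees with strcmp() even for bytes above 0x7F. If one string is a
 * prefix of the other, the terminating NUL (value 0) takes part in the
 * subtraction. */
static int strdiff(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while both strings agree and s1 hasn't ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return (int)*p1 - (int)*p2;
}
```

On an ASCII system, strdiff("ab", "abc") is 0 - 'c', i.e. -99, which is how K&R's wording plays out when the strings differ in length. The exact value depends on the character encoding, so it's best kept to code that really needs it, with portable callers still testing only the sign.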
 
The POSIX spec quoted above "The sign of a non-zero return value shall be determined by the sign of the difference between the values […]" also implies that the actual value doesn't matter, only the sign. So it's basically the same as the C standard, just worded differently. (*)

More than that, POSIX.1-2017 says:

The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2017 defers to the ISO C standard.

So even if there is a difference then it is ISO C that is the reference, not POSIX.
 
Paul Floyd I read this sentence as a countermeasure for the possible case that some POSIX spec would indeed contradict the C spec (which would be unintentional). It doesn't mean POSIX can't introduce restrictions and guarantees beyond ISO C (and POSIX does exactly that in some areas, e.g. the C/POSIX locale is required to use ASCII).

As another example, POSIX could specify that the actual difference between the first differing characters is returned here. With ASCII as the 8bit character encoding, this wouldn't contradict ISO C (where the value doesn't matter, only the sign). Still, it doesn't do that. It would probably serve no real purpose...

OTOH, if it wasn't for the ASCII requirement, the wording of POSIX concerning strcmp() quoted above would contradict ISO C, as ISO C doesn't require lexicographic ordering in the character encoding.

This thread starts to remind me of one of these "language lawyer" discussions sometimes seen on stackoverflow :cool:
 
As a side note, some may find this interesting: I was looking through an old book on FORTH from my shelf. There is a FORTH reference word, "-TEXT", that is basically strcmp(). The definition and implementation of the FORTH word is the same as the K&R definition of strcmp(): "Returns the difference of the first character that is different in the strings".
 