C Why libc is so unsafe?

Hello,

studying libc source code I found that many functions in <string.h> are completely unsafe, there is no check pointer type parameters against NULL value. For example, using strcmp(NULL, NULL) causes Segmentation fault. But why?
 
It's done when it makes sense, e.g. free(NULL) is perfectly fine. WRT strcmp, what do you propose as a return value if one or both of the arguments are NULL?
 
Probably zero for both NULLs, 1 for right NULL, -1 for left or vice versa (not sure it's a good proposal, but... :))

Let's take another functions. memset(void *dst, int v, size_t n)
In case of dst == NULL it will cause Segmentation fault because dst isn't checked against NULL. Why not if (!dst) return NULL;?

In general I do not like the idea libc functions may cause Segmentation fault. But every libc implementation I've checked behaves in such strange manner. Why?! I can't realize...
 
Because passing NULL to memset() would mean programmer (logic) error, and that check would only hide it in most unhelpful way. Same for strcmp() and a lot of other functions.

My take is that if you are passing NULL to memset(), strcmp(), ... something has gone wrong in your code, and it's not "unsafe"'ness of libc functions you need to worry about at that stage.
 
Actually libc is not unsafe. Rather, your program is unsafe if it doesn't check for NULL before passing pointers to libc functions that do not accept NULL, according to the documentation.

The POSIX and ISO C standards (read here for strcmp) clearly specify that strcmp() takes pointers to strings as arguments. NULL is not a pointer to a string, so the behavior is undefined – That means two things: (1) you must not pass NULL to those functions, and (2) SIGSEGV is perfectly valid behavior if you do.
 
For example, using strcmp(NULL, NULL) causes Segmentation fault. But why?

On FreeBSD, that is "safe". De-referencing NULL like this on FreeBSD will consistently crash and let us know that the code is wrong.

That said; the C standard dictates that de-referencing NULL should not necessarily result in a crash (though it is undefined behavior and a crash is valid as an outcome). What has caused it to crash is that the pages in the range 0x00->0xXX have been locked using something like mprotect (or VirtualProtect on Windows). The crash resulting from accessing this memory is defined behavior based on POSIX.

If I recall most platforms do this (especially POSIX). Some older i.e DOS didn't but you can do it manually (http://www.delorie.com/djgpp/doc/libc/libc_588.html).

What I class as "unsafe" and worse than NULL, NULL is:

Code:
  char *a; /* Undefined */
  char *b; /* Undefined */

  strcmp(a, b);

Edit:
Bizarrely some C string handling code is unsafe (i.t toupper) if you use chars rather than int because it can return something outside the range.
 
Some older i.e DOS didn't
Given that 0:0000 is IVT in DOS it would be a problem if you could not dereference (far*)0. :)

I personally don't know about FreeBSD (though I'd assume the same) but not that long ago in Linux you were able to map 0 page and have some fun.
 
...
What I class as "unsafe" and worse than NULL, NULL is:

Code:
  char *a; /* Undefined */
  char *b; /* Undefined */

  strcmp(a, b);
...

Since some years we have the --analyze option on the clang(1) command line. And believe me, you want to use it this and then.
C:
#include <stdio.h>
#include <string.h>

int main(int argc, char *const argv[])
{
   char *a; /* Undefined */
   char *b; /* Undefined */

   printf("Hello, I would crash here, won't I? %d\n", strcmp(a, b));
   return 0;
}

clang --analyze hello.c -o hello.xml
Code:
hello.c:9:55: warning: 1st function call argument is an uninitialized value
   printf("Hello, I would crash here, won't I? %d\n", strcmp(a, b));
                                                      ^~~~~~~~~~~~
1 warning generated.
cat hello.xml
XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>clang_version</key>
<string>FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)</string>
<key>files</key>
<array>
  <string>hello.c</string>
</array>
<key>diagnostics</key>
<array>
  <dict>
   <key>path</key>
   <array>
    <dict>
     <key>kind</key><string>event</string>
     <key>location</key>
     <dict>
      <key>line</key><integer>6</integer>
      <key>col</key><integer>4</integer>
      <key>file</key><integer>0</integer>
     </dict>
     <key>ranges</key>
     <array>
       <array>
        <dict>
         <key>line</key><integer>6</integer>
         <key>col</key><integer>4</integer>
         <key>file</key><integer>0</integer>
        </dict>
        <dict>
         <key>line</key><integer>6</integer>
         <key>col</key><integer>10</integer>
         <key>file</key><integer>0</integer>
        </dict>
       </array>
     </array>
     <key>depth</key><integer>0</integer>
     <key>extended_message</key>
     <string>&apos;a&apos; declared without an initial value</string>
     <key>message</key>
     <string>&apos;a&apos; declared without an initial value</string>
    </dict>
    <dict>
     <key>kind</key><string>control</string>
     <key>edges</key>
      <array>
       <dict>
        <key>start</key>
         <array>
          <dict>
           <key>line</key><integer>6</integer>
           <key>col</key><integer>4</integer>
           <key>file</key><integer>0</integer>
          </dict>
          <dict>
           <key>line</key><integer>6</integer>
           <key>col</key><integer>7</integer>
           <key>file</key><integer>0</integer>
          </dict>
         </array>
        <key>end</key>
         <array>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>4</integer>
           <key>file</key><integer>0</integer>
          </dict>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>9</integer>
           <key>file</key><integer>0</integer>
          </dict>
         </array>
       </dict>
      </array>
    </dict>
    <dict>
     <key>kind</key><string>control</string>
     <key>edges</key>
      <array>
       <dict>
        <key>start</key>
         <array>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>4</integer>
           <key>file</key><integer>0</integer>
          </dict>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>9</integer>
           <key>file</key><integer>0</integer>
          </dict>
         </array>
        <key>end</key>
         <array>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>55</integer>
           <key>file</key><integer>0</integer>
          </dict>
          <dict>
           <key>line</key><integer>9</integer>
           <key>col</key><integer>60</integer>
           <key>file</key><integer>0</integer>
          </dict>
         </array>
       </dict>
      </array>
    </dict>
    <dict>
     <key>kind</key><string>event</string>
     <key>location</key>
     <dict>
      <key>line</key><integer>9</integer>
      <key>col</key><integer>55</integer>
      <key>file</key><integer>0</integer>
     </dict>
     <key>ranges</key>
     <array>
       <array>
        <dict>
         <key>line</key><integer>9</integer>
         <key>col</key><integer>62</integer>
         <key>file</key><integer>0</integer>
        </dict>
        <dict>
         <key>line</key><integer>9</integer>
         <key>col</key><integer>62</integer>
         <key>file</key><integer>0</integer>
        </dict>
       </array>
     </array>
     <key>depth</key><integer>0</integer>
     <key>extended_message</key>
     <string>1st function call argument is an uninitialized value</string>
     <key>message</key>
     <string>1st function call argument is an uninitialized value</string>
    </dict>
   </array>
   <key>description</key><string>1st function call argument is an uninitialized value</string>
   <key>category</key><string>Logic error</string>
   <key>type</key><string>Uninitialized argument value</string>
   <key>check_name</key><string>core.CallAndMessage</string>
   <!-- This hash is experimental and going to change! -->
   <key>issue_hash_content_of_line_in_context</key><string>98bb79a833b93ac5f7bc0a3d8846b0ce</string>
  <key>issue_context_kind</key><string>function</string>
  <key>issue_context</key><string>main</string>
  <key>issue_hash_function_offset</key><string>4</string>
  <key>location</key>
  <dict>
   <key>line</key><integer>9</integer>
   <key>col</key><integer>55</integer>
   <key>file</key><integer>0</integer>
  </dict>
  </dict>
</array>
</dict>
</plist>

When I initialize *a with NULL, and then issue the same clang --analyze, I see:
Code:
hello.c:9:55: warning: Null pointer argument in call to string comparison function
   printf("Hello, I would crash here, won't I? %d\n", strcmp(a, b));
                                                      ^~~~~~~~~~~~
1 warning generated.

Finally for the curious (*a and *b kept uninitialized):
clang hello.c && ./a.out
Code:
Segmentation fault (core dumped)
 
Could you please elaborate?
Those functions accept an int. They use a lookup table. 8 bit character values above 127 are negative so you get a negative index into the lookup table array. The man page for toupper(3) has this warning: "The argument must be representable as an unsigned char or the value of EOF." If you use getc for fgetc - which return an int - you're okay. But if you use fread or fgets and then pick off individual characters without typecasting as unsigned - bad things happen.
 
The argument must be representable as an unsigned char or the value of EOF.
Exactly, so it's not toupper() being unsafe and rather just following the documentation -- if you don't provide the unsigned char value, you get what you asked for; though no bad things would happen, you'll just get the original character back.
 
Exactly, so it's not toupper() being unsafe and rather just following the documentation

Haha, yes true. You could say that about the entire C language though I suppose ;)

Put it this way. It is a fairly common mistake. So much so that they even added the note to the manpage.
 
Back
Top