C string initialization - undersized

I ran into the following on FreeBSD 8.1-RELEASE with gcc 4.2.1.

I am initializing a string. According to the C guides, the correct form is:
Code:
char abcd[5] = "abcd";

In general if I initialize a string that is undersized I get a compiler warning:
Code:
char abcd[3] = "abcd";

warning: initializer-string for array of chars is too long

But when I initialize the string without leaving room for the '\0':
Code:
char abcd[4] = "abcd";
then I get no warning and it seems to work correctly. Does anyone know the reason why? Is it a gcc bug? Or does gcc automatically allocate space for '\0' in this case?
 
The character sequence is just a sequence of characters. It's not required to be Nul-terminated, like float sequences are not required either.
They are required to be Nul-terminated if you desire to use any str* procedure on it, or any procedure that would call a str* procedure indirectly, like printf(3). Not doing so, the char sequence is just a chunk of memory, and not a Nul-terminated string.

Also, unless you're going to modify them, your strings should be const char *
 
xibo said:
The character sequence is just a sequence of characters. It's not required to be Nul-terminated, like float sequences are not required either.
They are required to be Nul-terminated if you desire to use any str* procedure on it, or any procedure that would call a str* procedure indirectly, like printf(3). Not doing so, the char sequence is just a chunk of memory, and not a Nul-terminated string.

I am not completely sure, but as far as I understand, initializing with double quotes (") always produces a '\0'-terminated string (the problem being that the string can be written to more space than was originally allocated). In my previous example all were '\0' terminated, so they could be printed with printf(3), for example.

My question would be why gcc allows the form char abcd[4] = "abcd"; with no warning and why it seems to work as if it was char abcd[5] = "abcd";?
 
izotov said:
My question would be why gcc allows the form char abcd[4] = "abcd"; with no warning and why it seems to work as if it was char abcd[5] = "abcd";?


The char abcd[5] = "abcd"; is not a string, it is an array.
When you create a local array and do not initialize it, it holds arbitrary values (whatever happened to be lying around in memory).
Code:
#define SIZE 3
    ...
    char abcd[SIZE];
    int i = 0;
    for (; i < SIZE; ++i) {
        printf("Code %d symbol: %d", i, (int)abcd[i]);
        if (abcd[i] == '\0') printf(" (it is NULL symbol).\n");
        else printf(".\n");
    }
    ...

Result:
Code:
    Code 0 symbol: 48.
    Code 1 symbol: -38.
    Code 2 symbol: -1.

When we initialize the array, it will be filled with zeros.
Code:
    ...
    char abcd[SIZE] = "";
    ...

Result:
Code:
    Code 0 symbol: 0 (it is NULL symbol).
    Code 1 symbol: 0 (it is NULL symbol).
    Code 2 symbol: 0 (it is NULL symbol).

When we initialize the array with some data, the first elements hold that data and all the rest are filled with zeros.

Code:
#define SIZE 5
    ...
    char abcd[SIZE] = "abc";
    ...

Result:
Code:
    Code 0 symbol: 97.
    Code 1 symbol: 98.
    Code 2 symbol: 99.
    Code 3 symbol: 0 (it is NULL symbol).
    Code 4 symbol: 0 (it is NULL symbol).

The char abcd[5] = "abcd"; is not a string, it is an array. An array of N elements has a first index of 0 and a last index of N-1. The following initializations store the same values:
Code:
    char abcd_1[SIZE] = "abc";
    char abcd_2[SIZE] = {'a', 'b', 'c'};
    int abcd_3[SIZE] = {97, 98, 99};

In your example char abcd[5] = "abcd"; there is a null character at the end because initialization fills the rest of the array with zeros. No, cc does not append a null character at the end. This is not a string.

The function strlen(3) computes the length of a string by searching for the zero character. Let's see how it works.

Code:
    char abc[5] = "1234";
    printf("Size: %d\n", (int)strlen(abc));

So we have abc[0]..abc[3] == '1'..'4', and in the last element, abc[4], we have zero (it is there because initialization filled the rest of the array with zeros).
Result:
Code:
    Size: 4

That's right... Now let's fill the entire array.
Code:
    char abc[5] = "12345";
    printf("Size: %d\n", (int)strlen(abc));

Result:
Code:
    Size: 6

Why 6? Why not 5? Because the array has no null character at the end, strlen(3) runs past it into memory that does not belong to the array. This is undefined behavior, so the result can be anything.

C has no string type; you can use a pointer to memory. Using static memory:
Code:
    char *abc = "Hi, I am string!";

Using dynamic memory. (P.S. See malloc(3)/free(3) and strcpy(3).)
Code:
    char *abc = (char*)malloc(20*sizeof(char));
    strcpy(abc, "Hi, I am string!");
    ...
    free(abc);

Using a static buffer.
Code:
    char buf[20];
    char *abc = buf;
    strcpy(abc, "Hi, I am string!");

In the second and third case, you do need to take care that you have enough space for a null character.

P.S. Sorry for my English, I hope it helps.
 
I still think it's strange how gcc doesn't at least warn you. To quote K&R:

This convention is also used by the C language: when a string constant like
"hello\n"
appears in a C program, it is stored as an array of characters containing the
characters in the string and terminated with a '\0' to mark the end.
[h][e][l][l][o][\n][\0]
The %s format specification in printf expects the corresponding argument to be a
string represented in this form. copy also relies on the fact that its input argument
is terminated with a '\0', and copies this character into the output.

Fair enough, there is no string data type in the C language, but as stated, all string constants are terminated with a null character, and to me, that becomes part of the array, and so a string of x chars has a size of x+1. If the OP were to leave the array size unspecified, it would automatically become [5], not [4]. Surely gcc should warn you that the NULL terminator won't make it into the array, like it does with an overly sized string?
 
andyzammy said:
... Surely gcc should warn you that the NULL terminator won't make it into the array, like it does with an overly sized string?

When you use double quotes with an array of unspecified size (char abc[]) or a pointer to memory (char *abc), the compiler automatically appends a null character.
When you have an array of a fixed size (char abc[SIZE]), initialization first fills the array with zeros, and the double quotes do not write a null character (\0): the array has a fixed size, and the compiler cannot enlarge it.
Example:
Code:
    char abc_a[] = "Hello";  // [H][e][l][l][o][\0] - 6 elements.
    char abc_b[5] = "Hello"; // [H][e][l][l][o]     - 5 elements.
    char abc_c[] = {'H', 'e',
        'l', 'l', 'o'};      // [H][e][l][l][o]     - 5 elements.
    char *abc_d = "Hello";   // [H][e][l][l][o][\0] - 6 elements: 5 characters plus the null character.

P.S. Sorry if I do not understand your question correctly.

Added later.
About the double quotes ...
Code:
    char buf[20];
    strcpy(buf, "Hello"); // 6 characters will be copied
                          // into the buffer: [H][e][l][l][o][\0].
 
doorways said:
When you use double quotes with an array of unspecified size (char abc[]) or a pointer to memory (char *abc), the compiler automatically appends a null character.
This is my point - this is why I assumed that the null character is to be counted as part of the array, and as part of that array, the string "hello" should be considered as 6 chars rather than 5 (assign the "hello" string to a char array of an undefined length and do a sizeof, see what you get. In my opinion the compiler should treat the terminator as a standard part of the constant string. Please explain why this isn't the case if it shouldn't be).

doorways said:
When you have an array of a fixed size (char abc[SIZE]), initialization first fills the array with zeros, and the double quotes do not write a null character (\0): the array has a fixed size, and the compiler cannot enlarge it.
Example:
Code:
    char abc_b[5] = "Hello"; // [H][e][l][l][o]     - 5 elements.
    char abc_c[] = {'H', 'e',
        'l', 'l', 'o'};      // [H][e][l][l][o]     - 5 elements.

P.S. Sorry if I do not understand your question correctly.

Added later.
About the double quotes ...
Code:
    char buf[20];
    strcpy(buf, "Hello"); // 6 characters will be copied
                          // into the buffer: [H][e][l][l][o][\0].
I understand that you can't increase the size of the array.
These two assignments are completely different. The first one is assigning a null terminated string (6 elements, as null is implied), into a 5 char array. The second one is assigning 5 characters to a 5 character array. There is no implication of null termination in the second example.

I understand the usage of strcpy, that stops when it finds a '\0' by design, and in your example, because you used a string constant, it was null terminated.
 
andyzammy said:
This is my point - this is why I assumed that the null character is to be counted as part of the array, and as part of that array, the string "hello" should be considered as 6 chars rather than 5 (assign the "hello" string to a char array of an undefined length and do a sizeof, see what you get. In my opinion the compiler should treat the terminator as a standard part of the constant string. Please explain why this isn't the case if it shouldn't be).

Hm, if we use an array of type char abc[] and double quotes for initialization:
Code:
    char abc[] = "Hello";
- then 6 elements are written to the abc array: abc[0]=='H', abc[1]=='e', ..., abc[4]=='o', abc[5]=='\0'. So the array has 6 elements, and the null character is part of the array. To verify:
Code:
    char abc[] = "Hello";
    size_t size = sizeof(abc)/sizeof(abc[0]);
    
    printf("Size: %d\n", (int)size);
Result: Size: 6

Yes, you're right, but that is what I was saying, too. Sorry, maybe I did not explain it well - my English is very bad :(

andyzammy said:
I understand that you can't increase the size of the array.
These two assignments are completely different. The first one is assigning a null terminated string (6 elements, as null is implied), into a 5 char array. The second one is assigning 5 characters to a 5 character array. There is no implication of null termination in the second example.

When I initialize the array, the unused elements are filled with zeros. It does not matter which form of initializer I use: abc[10] = "Hello" or abc[10] = {'H', 'e', 'l', 'l', 'o'};
Code:
    char abc_b[10] = "Hello";
    char abc_c[10] = {'H', 'e', 'l', 'l', 'o'};
    int i;
    
    
    printf("Result:\nabc_b\tabc_c\n");
    for (i = 0; i < 10; ++i) {
        printf("%5d\t%5d\n", (int)abc_b[i], (int)abc_c[i]);
    }

Result:
Code:
abc_b	abc_c
   72	   72
  101	  101
  108	  108
  108	  108
  111	  111
    0	    0
    0	    0
    0	    0
    0	    0
    0	    0


But if the quotes appended a null character in the following example: char abc[5] = "Hello" - it would be dangerous, since the null character would be written to the element abc[5], which is outside the bounds of the array.

If I understand you correctly, you claim that whenever there are double quotes the compiler always appends a null character. Is that so?
 
doorways said:
If I understand you correctly, you claim that whenever there are double quotes the compiler always appends a null character. Is that so?

Well, I'm asking a question more than making a claim - I'm just stating my interpretation of what I've learned so far and hoping someone will correct me. It's clear through these examples above that the compiler won't actually append a null terminator on to a 5 char string going into a 5 char array, but my point is that it should warn that it won't be appended (just as it warns if a 6+ char string going into a 5 char array won't entirely fit).

As for this:
Code:
char abc_b[10] = "Hello";
char abc_c[10] = {'H', 'e', 'l', 'l', 'o'};
I still think they're a completely different way of initializing even though they get the same result. The reason for the same result is due to the generic way partially initialised arrays get padded with nulls.

Bible:
Be careful to distinguish between a character constant and a string that
contains a single character: 'x' is not the same as "x". The former is an integer,
used to produce the numeric value of the letter x in the machine's character set.
The latter is an array of characters that contains one character (the letter x) and a
'\0'.

Bearing the above in mind, I would never expect this line of C:
Code:
char abc_c[5] = {'H', 'e', 'l', 'l', 'o'};
to be null terminated, as you're not asking the program to do that. You're just asking for 5 small ints to be placed in an array (of 'small ints', i.e. char[])

Whereas with this line:
Code:
char abc_b[5] = "Hello";
You are specifying null termination which, as quoted, turns an x sized char array, into an x+1 sized array. This means the array to be assigned won't fit fully into the array it's set to go into.

This is my interpretation of the way C works, so I would expect a compiler to warn or error on this (what I've taken to be) invalid assignment. Am I incorrect? If so, how should I have interpreted the good book?
 
In C (and all of its supersets) a "string" is very special because the compiler's translator handles it according to context.

When used in an array initialization it gets scaled to the assignee's size; when used in a pointer initialization (*var, var[] or outside initializations) it retains its full size with a null char at the end. Because this is settled in the translation phase, the compiler is aware of the size, so you can determine the exact memory size anywhere in code with "sizeof".

Here is a simple example:
Code:
    const char test1[4] = "test";
    const char *test2 = test1;
    
    printf("%d %d %d\n", (int)sizeof("test"), (int)sizeof(test1), (int)sizeof(test2));

Output on x86_64 is: 5 4 8

To answer your question: initializing a fixed-size char array from a larger string constant is perfectly valid, but it is not good practice in some cases; that's why you are getting a warning.
 
Here are more examples:

Code:
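/*Clean, fast and efficient*/
#include <unistd.h> /* write(2), STDOUT_FILENO */

#define HELLO_STR "hello\n"

void print_hello(void) {
    /* sizeof(HELLO_STR) - 1 drops the trailing '\0' */
    const char hello[sizeof(HELLO_STR) - 1] = HELLO_STR;

    write(STDOUT_FILENO, (const void*)hello, sizeof(hello));
}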

Code:
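/*Harder to read and prone to errors, also uses extra byte to store 0*/
#include <unistd.h> /* write(2), STDOUT_FILENO */

#define HELLO_STR "hello\n"

void print_hello(void) {
    /* 6 is the hand-counted length of "hello\n" without the '\0' */
    write(STDOUT_FILENO, (const void*)HELLO_STR, 6);
}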

Code:
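/*Clean but much slower than the previous*/
#include <string.h> /* strlen(3) */
#include <unistd.h> /* write(2), STDOUT_FILENO */

#define HELLO_STR "hello\n"

void print_hello(void) {
    /* strlen walks the string at run time to find the '\0' */
    write(STDOUT_FILENO, (const void*)HELLO_STR, strlen(HELLO_STR));
}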
 
Now I see why there is no '\0' in case of const char hello[5] = "hello";. That is clear.

On the other hand I agree that a warning would be useful as there must be a lot of guys who forget about preparing space for the closing '\0'. Warnings are to avoid errors... Anyways this might be a change request to gcc.
 
izotov said:
Now I see why there is no '\0' in case of const char hello[5] = "hello";. That is clear.

On the other hand I agree that a warning would be useful as there must be a lot of guys who forget about preparing space for the closing '\0'. Warnings are to avoid errors... Anyways this might be a change request to gcc.

No,

Code:
const char hello[5] = "hello";
is correct and should not give any warning. Look at my first example. The programmer should be able to initialize string constants without a trailing 0 for efficient use in low-level interfaces.
 
I don't know how accurate my observations are, but I will give my point of view.

andyzammy said:
Bible:
Be careful to distinguish between a character constant and a string that
contains a single character: 'x' is not the same as "x". The former is an integer,
used to produce the numeric value of the letter x in the machine's character set.
The latter is an array of characters that contains one character (the letter x) and a
'\0'.

Yes it is true, this is what is written in the book of K&R The C Programming Language. But let's look at everything in order, step by step.

1a. These are my thoughts; I'm sorry, I have not found confirmation of this in the books. I think that with an initialization of a fixed-size array such as char abc[5] = "Hi"; - first, 5 zeros are written to the array; second, exactly two characters, 'H' and 'i', are written to the array. Exactly two! With this type of initialization the double quotes do not create a null character at the end. That is, it's like: char abc[5] = {'H', 'i'};

My Proof.
Code:
    char abc[2] = "Hi";

Compile it (my work file has name test2.c):
Code:
% cc -Wall -ansi -std=c99 -o test2 test2.c
%
Result - all ok.
Now let's force a third character in!
Code:
    char abc[2] = "Hi!";

Compile it.
Code:
% cc -Wall -ansi -std=c99 -o test2 test2.c
test2.c: In function 'main':
test2.c:9: warning: initializer-string for array of chars is too long
test2.c:9: warning: unused variable 'abc'
%

So, if the double quotes in the case char abc[2] = "Hi"; produced something like [H][i][\0], the compiler would complain. So I can assume that for this kind of initialization exactly as many characters are written as are explicitly given between the double quotation marks:
Code:
    char abc_a[15] = "Hi";   // Exactly 2 characters will be placed in an array.
    char abc_b[5] = "Hello"; // Exactly 5 characters will be placed in an array.
    char abc_c[5] = "";      // Exactly 0 characters will be placed in an array.
                             // Just the array will be filled with zeros.

---
2a. With an initialization of the form <any expression> = "String in double quotes"; N+1 characters are written, where N is the number of characters in the string. Thus, char abc[] = "Hello"; is equivalent to char abc[] = {'H', 'e', 'l', 'l', 'o', '\0'};

My Proof.
Code:
    char abc[] = "Hello";   // Did we write 5 characters to the array!?
                            // Then we have a first index abc[0] and a
                            // ... last index abc[4].
    abc[5] = 'a';           // But we can use the index abc[5] -
                            // ... this means that the array has 6 elements.
                            
    printf("abc[5] == %c\n", (char)abc[5]); // It works Oo. 
    
    // Use the index abc[6].
    abc[6] = 'b';           // This is a mistake (undefined behavior), but the
                            // ... compiler will not tell us that we went
                            // ... outside the array!!!
                            
    printf("abc[6] == %c\n", (char)abc[6]); // It appears to work Oo.
This shows that ANSI C does not check array bounds, in contrast to Pascal, as mentioned in K&R.

But this does not prove that such an initialization writes 6 characters. For this we use the function strlen(3) - it looks for a null character as the end of the string. In our case, the null character is the last element of the array.
Code:
    char abc[] = "Hello";
    printf("Size - %d\n", (int)strlen(abc));

Result: Size - 5. - But do not despair: strlen(3) simply does not count the null character when calculating the string length. Use the sizeof macro to find out the number of elements in the array.
Code:
    char abc[] = "Hello";
    printf("Size - %d\n", (int)(sizeof(abc)/sizeof(abc[0])));
Result: Size - 6. - Thus, with an initialization of the form <any expression> = "String in double quotes"; the quotes automatically append a null character.

---
andyzammy said:
Bearing the above in mind, I would never expect this line of C:
Code:
char abc_c[5] = {'H', 'e', 'l', 'l', 'o'};
to be null terminated, as you're not asking the program to do that. You're just asking for 5 small ints to be placed in an array (of 'small ints', i.e. char[])

Whereas with this line:
Code:
char abc_b[5] = "Hello";

I say that this is equivalent, but it is not the same. (Here's a tautology :) ) If I were a compiler, I would interpret char abc[5] = "Hi"; as char abc[5] = {'H', 'i'}; I am just showing an example of how it should be understood. But I'm not saying that it's the same thing. (Sorry, my English is not good enough to explain it in detail.)

---
expl said:
When used in an array initialization it gets scaled to the assignee's size; when used in a pointer initialization (*var, var[] or outside initializations) it retains its full size with a null char at the end. Because this is settled in the translation phase, the compiler is aware of the size, so you can determine the exact memory size anywhere in code with "sizeof".

Here is a simple example:
Code:
    const char test1[4] = "test";
    const char *test2 = test1;
    
    printf("%d %d %d\n", (int)sizeof("test"), (int)sizeof(test1), (int)sizeof(test2));

Output on x86_64 is: 5 4 8

I agree with you about the scaling but do not agree with the use of the sizeof macro.
I will explain my position.

1e. You wrote const char test1[4] = "test"; - you have allocated 4 bytes of memory for storing an array (on your platform). No matter what you choose to write const char test1[4] = "Hi"; or const char test1[4] = ""; - It will always be 4 bytes (on your platform). Thus, this example can neither prove nor disprove that the double quotes append a null character.

2e. You wrote
Code:
     const char test1[4] = "test";
     const char *test2 = test1;

test2 is a pointer of type char to the first address of the memory where the test1 array is stored. On your platform, a char pointer always occupies 8 bytes, even if it points to a string of 100 characters.

3e. You wrote (int)sizeof("test") - it's like the first case. Equivalent to:
Code:
    const char test[] = "test";
    printf("Size - %d\n", (int)sizeof(test));
Result: Size - 5. This is equivalent to (char[]){'t', 'e', 's', 't', '\0'} - see item 2a for my thoughts about it.

My Proof.
Code:
    char a_1[4] = "Test",
         a_2[100] = "Test",
         *ap_1 = a_1,
         *ap_2 = a_2,
         b_1[] = "Test",
         b_2[] = {'T', 'e', 's', 't'},
         *bp = "Test";
    
    printf("A category:\n");
    printf("sizeof(a_1) == %d\n", (int)sizeof(a_1));
    printf("sizeof(ap_1) == %d\n", (int)sizeof(ap_1));
    printf("sizeof(a_2) == %d\n", (int)sizeof(a_2));
    printf("sizeof(ap_2) == %d\n", (int)sizeof(ap_2));
    printf("\nB category:\n");
    printf("sizeof(b_1) == %d\n", (int)sizeof(b_1));
    printf("sizeof(b_2) == %d\n", (int)sizeof(b_2));
    printf("sizeof(bp) == %d\n", (int)sizeof(bp));

Result:
Code:
A category:
sizeof(a_1) == 4
sizeof(ap_1) == 8    // The pointer of one type always has the same size.
sizeof(a_2) == 100
sizeof(ap_2) == 8    // The pointer of one type always has the same size.

B category:
sizeof(b_1) == 5
sizeof(b_2) == 4
sizeof(bp) == 8      // The pointer of one type always has the same size.

So I think that using the sizeof macro this way is not correct.

expl said:
The programmer should be able to initialize string constants without a trailing 0 for efficient use in low-level interfaces.

Yes! :)

P.S. Sorry if I'm wrong somewhere, I only recently started working with ANSI C. :r
 
@doorways
I agree with you about the scaling but do not agree with the use of the sizeof macro.

Code:
const char test1[4] = "test";
const char *test2 = test1;
    
printf("%d %d %d\n", (int)sizeof("test"), (int)sizeof(test1), (int)sizeof(test2));

I used this example to demonstrate that sizeof on predefined stack arrays and strings will return their actual size, and that when used on dynamic pointers it will return the size of the pointer. I guess you misunderstood what I was trying to show.

Also "sizeof" is not a macro, it's an operator used in the compiler's translator.

No matter what you choose to write const char test1[4] = "Hi"; or const char test1[4] = ""; - It will always be 4 bytes (on your platform).

Why would you ever want to do that? That's just bad programming, you are just wasting memory on your constant.

But if you are using char stack buffers (not constants), you should still avoid using strlen() in C. It's a waste of CPU cycles; most C interfaces support fixed sizes where null termination is irrelevant (like read/write etc.).
 
expl said:
I used this example to demonstrate that sizeof on predefined stack arrays and strings will return their actual size, and that when used on dynamic pointers it will return the size of the pointer. I guess you misunderstood what I was trying to show.

Oh, now I understand! I'm sorry.

expl said:
Also "sizeof" is not a macro, it's an operator used in the compiler's translator.

Yes, you are right - it is my bad.

expl said:
No matter what you choose to write const char test1[4] = "Hi"; or const char test1[4] = ""; - It will always be 4 bytes (on your platform)
Why would you ever want to do that? That's just bad programming, you are just wasting memory on your constant.

I do not do this, I was just explaining my position. I simply did not understand your example at first.

expl said:
But if you are using char stack buffers (not constants), you should still avoid using strlen() in C. It's a waste of CPU cycles; most C interfaces support fixed sizes where null termination is irrelevant (like read/write etc.).

Oh, thanks! Very interesting, I had not thought about it.
And how do you check the length of a string, such as a string read from a file?
Code:
  #define BUFSIZE 4048
...
  FILE *fp;
  char str[BUFSIZE];

  if((fp=fopen(argv[1], "r"))==NULL) {
    printf("Can not open file.\n");
    exit(1);
  }

  while(!feof(fp)) {
    if(fgets(str, BUFSIZE, fp)) {
     //...
       // How do I determine the length of the string at this point???
       // I know only one method, but I have only recently started using ANSI C.
       size_t size = strlen(str);

     //...
    }
  }

  fclose(fp);
 