Why should string length be plus one its capacity in C?

我的梦境 submitted on 2020-06-23 20:27:20

Question


Your string length should be one more than the maximum number of characters you want it to be able to hold. Logical enough: strings are terminated with a null character.

This is very general advice that most newbies get. However, as I have grown as a programmer, it no longer seems so correct to me.

The indexing of any type of array, be it int or char, starts from 0. The maximum index value of most arrays, therefore, is one less than its numerical value. It's the same with a string, but since it has an extra character at the end, it gets incremented by one. So, the string length is the same as the number of characters in it.


To see if I'm right, see this snippet:

char str[9];
scanf("%s", str);
printf("%zu", strlen(str));

Make this a full-fledged program, and run it. Type 123456789, a guaranteed 9-character long text, and see the results. It could hold the string and sure enough, the string length is 9.


I have even seen many expert programmers say that string size should be its capacity plus one. Is this advice largely a myth, or am I going wrong somewhere?

EDIT

Let's say I want to create an integer array Arr that can hold x number of elements. The index value of Arr's last element will be one less than x since index values start from 0 and not 1. So, its length is x-1.

How would you declare it then? I'd do it like this: int Arr[x-1];. I don't think there are any issues with this.

Now if Arr were a char array (i.e. a string), the length of Arr would be one more than that of its int counterpart, since it has an extra null character at the end. This will end up as: (x-1)+1=x.

Code to demonstrate this

So why does the declaration this time have to be char Arr[x+1] and not simply char Arr[x]?


Answer 1:


You're right about the indexing. However:

char str[9];

When you declare a string this way, the number 9 is the array length. Minus the terminating null character, there can be only 8 characters, not 9. The length of an array is the number of elements in the array, NOT the maximum index value as you think. You're confusing these terms.

Why your program works is already explained by many other answers and even comments.




Answer 2:


According to the C Standard's description of the conversion specifier s (7.21.6.2 The fscanf function):

s Matches a sequence of non-white-space characters. If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

So if you enter the character sequence 123456789, there will be an attempt to write the following characters

{ '1', '2', '3', '4', '5', '6', '7', '8', '9', '\0' }

into the array declared like

char str[9];

As you can see, the sequence contains 10 characters while the array can accommodate only 9. So memory beyond the array will be overwritten and, as a result, the program has undefined behavior.

In C, unlike C++, you can initialize a character array the following way

char str[3] = "Bye";

In this case the terminating zero will not be used as an initializer of the array. That is, the array will not contain a string but just the characters

{ 'B', 'y', 'e' }

However, you may not apply the standard C function strlen to this array, because the function counts characters until it encounters the terminating zero, and the array does not contain one.

You should distinguish the value returned by the sizeof operator and the value returned by the standard C function strlen.

For example if you have a declaration like this

char str[10] = "Hello";

then sizeof( str ) yields 10; that is, the array has 10 elements, each of size 1 (sizeof( char ) is always equal to 1).

However, if you apply the standard C function strlen, the returned value will be 5 because the function counts all characters before the terminating zero.

You can write for example

str[8] = 'A';

Nevertheless, if you apply the function strlen you will again get the value 5, because there is a terminating zero before the element str[8] that holds 'A'.




Answer 3:


The indexing of any type of array, be it int or char, starts from 0.

Yes, that's true.

All array sizes, therefore, are one less than their numerical values.

No. The first value used for indexing only affects the indexing, not the size. For example, a 1-sized array has just one index, 0. It's the maximum index value that is one less than the size, not the other way around.

In a declaration char str[9]; the value 9 is the array size, not the maximum index value.

The reason that your example seems to work, is that undefined behavior does not have to result in a crash or error message.




Answer 4:


You are right that array indexing begins at 0, but a char str[9] has a length of 9, so the highest index is 8. Your example seems to work, but it could easily cause an error. You could also type 1234567890 into your program and it would likely output 10, because the program can't know the length of the array.

When you define that char array, you create a 9-byte space for it on the stack, but when you pass it to scanf, the char[] gets converted to a char*, a pointer to the first element of the array. So scanf can't know the length of the array and writes the input into memory, beginning at the location str points to. It writes the \0 character outside the space reserved for the array! Likewise, when you pass the array to strlen, strlen can't see its size and keeps scanning memory for a \0, which it finds after 10 bytes, so it reports a length of 10.

As @Ajay Brahmakshatriya showed in his answer, this can lead to errors, because the space beyond the array can be used for another variable, e.g. another string, which can then write different data to the byte where the \0 was.




Answer 5:


See this -> Ideone

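#include <stdio.h>
#include <string.h>

int main(void) {
    char a[16];
    char b[16];
    scanf("%s", a);   /* deliberately unbounded: 16 characters overflow a */
    b[0] = 'a';
    b[1] = '\0';
    /* %zu matches size_t; %p expects a void * argument */
    printf("%s %zu %p %p", a, strlen(a), (void *)a, (void *)b);
    return 0;
}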

This is almost a replica of the code you showed. For an input of length 16 (the array size is also 16), the length printed is 17.

Now that we have established that what you said is not correct, we will look at why it printed 9 for you, while my example printed 17 instead of 16.

You created an array of size 9 (allocated 9 bytes). Then you stored 9 bytes of data in it and terminated it with '\0', which was written to the tenth byte. Since that space was (luckily) not used by anything important, the data fit.

Then when you called strlen, it gave you 9.

I made an array of 16 bytes and followed it with another array that is placed right after it. When scanf read 16 bytes and terminated them with '\0', it wrote that '\0' into b.

I overwrote it again by writing to b. The '\0' written by scanf was thus gone.

Then the strlen when counting the length overflowed into b and stopped when it saw the '\0' at b[1].

All of this is, of course, undefined behavior.




Answer 6:


...So, the string length is the same as the number of characters in it.

This statement is correct, if we do not count the terminating null as a character. However, the storage needed to hold the string is one more than the number of characters in it. (The emphasis on 'string' is because string as a data type requires the additional terminating null, which requires storage.)




Answer 7:


An attempt at proving my point:

Code

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[23];
    scanf("%22s", str);   /* width 22 leaves room for the terminating '\0' */
    printf("String length = %zu\n", strlen(str));
    printf("String element  ---  Index number");
    int index = 0;

    while (str[index] != '\0')
    {
        printf("\n%c  ---  %d", str[index], index);
        index++;
    }

    printf("\nNULL  ===  %d", index);

    return 0;
}

Sample Input

graphing

Sample Output

String length = 8
String element  ---  Index number
g  ---  0
r  ---  1
a  ---  2
p  ---  3
h  ---  4
i  ---  5
n  ---  6
g  ---  7
NULL  ===  8


Source: https://stackoverflow.com/questions/43424481/why-should-string-length-be-plus-one-its-capacity-in-c
