What if a null character is present in the middle of a string?

烈酒焚心 submitted on 2021-02-18 16:57:46

Question


I understand that the end of a string is indicated by a null character, but I cannot understand the output of the following code.

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char s[] = "Hello\0Hi";
    printf("%d %d", strlen(s), sizeof(s));
}

OUTPUT: 5 9

If strlen() detects the end of the string right after the o, then why doesn't sizeof() do the same thing? Even if it doesn't do the same thing, isn't '\0' a null character (i.e., only one character), so shouldn't the answer be 8?


Answer 1:


The sizeof operator does not give you the length of a string but the size of the type of its operand. Since the operand in your code is an array, sizeof gives you the size of the array, including both null characters.

If it were like this

const char *string = "This is a large text\0This is another string";
printf("%zu %zu\n", strlen(string), sizeof(string));

the result would be very different, because string is a pointer and not an array: sizeof(string) would give the size of the pointer, not of the text it points to.

Note: Use the "%zu" specifier for size_t, which is the type strlen() returns and also the type of the value sizeof yields.




Answer 2:


strlen() doesn't care about the actual size of the string. It looks for a null byte and stops when it sees the first null byte.

But the sizeof operator knows the total size. It doesn't care what bytes are in the string literal. You might as well have all null bytes in the string and sizeof would still give the correct size of the array (strlen() would return 0 in that case).

They are not comparable; they do different things.




Answer 3:


If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing?

strlen only works on strings (null-terminated character arrays), whereas sizeof works on any data type. sizeof gives the exact amount of memory that a given type or object occupies, whereas strlen gives the length of a string (NOT including the null terminator \0). So in the normal case, this holds for a typical character array s:

char s[] = "Hello";
strlen( s ) + 1 == sizeof( s ); // +1 for the \0

In your case it's different because you have a null character in the middle of the character array s:

char s[] = "Hello\0Hi";

Here, strlen detects the first \0 and gives the length as 5. sizeof, however, gives the total number of bytes needed to hold the character array, including both \0 characters, which is why the second output is 9.




Answer 4:


strlen() computes the length of the string. It does this by returning the number of characters before (and not including) the first '\0' character. (See the manual page below.)

sizeof returns the number of bytes of the given variable (or data type). Note that your example "Hello\0Hi" has 9 characters, but from your question it seems you don't see where the ninth character comes from. Let me explain the given string first. Your example string is:

"Hello\0Hi"

This can be written as the following array:

['H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0']

Note the last '\0' character. When you use string quotes, the compiler ends the string with a '\0' character. This means "" is also ['\0'] and thus has 1 element.

BEWARE that sizeof does NOT return the number of elements in the array. It returns the number of bytes. A char is 1 byte, and therefore sizeof happens to return the number of elements here. But with any other data type it would not: for example, calling sizeof on an int array holding {1, 2, 3, 4} returns 16 on a platform where int is 4 bytes, since the array has 4 elements.

BEWARE that passing an array as a parameter only passes a pointer. If you pass s to another function and call sizeof on the parameter there, you get the size of the pointer, which is the same as sizeof(void *). This is a fixed size, independent of the array.

STRLEN(3)                BSD Library Functions Manual                STRLEN(3)

NAME
     strlen, strnlen -- find length of string

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <string.h>

     size_t
     strlen(const char *s);

     size_t
     strnlen(const char *s, size_t maxlen);

DESCRIPTION
     The strlen() function computes the length of the string s.  The strnlen()
     function attempts to compute the length of s, but never scans beyond the
     first maxlen bytes of s.

RETURN VALUES
     The strlen() function returns the number of characters that precede the
     terminating NUL character.  The strnlen() function returns either the
     same result as strlen() or maxlen, whichever is smaller.

SEE ALSO
     string(3), wcslen(3), wcswidth(3)

STANDARDS
     The strlen() function conforms to ISO/IEC 9899:1990 (``ISO C90'').
     The strnlen() function conforms to IEEE Std 1003.1-2008 (``POSIX.1'').

BSD                            February 28, 2009                           BSD



Answer 5:


As the name itself implies, a string literal is a sequence of characters enclosed in double quotes. A terminating zero is implicitly appended to this sequence of characters.

So any character enclosed in the double quotes is a part of the string literal.

When a string literal is used to initialize a character array, all its characters, including the terminating zero, serve as initializers of the corresponding elements of the character array.

Each string literal, in turn, has the type of a character array.

For example, the string literal "Hello\0Hi" in C has type char[9]: 8 characters enclosed in the quotes plus the implicit terminating zero.

So in memory this string literal is stored like

{ 'H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0' }

The sizeof operator returns the number of bytes occupied by an object. So for the string literal above, sizeof returns 9: the number of bytes the literal occupies in memory.

If you wrote "Hello\0Hi", the compiler may not simply remove the Hi part from the literal. It has to store it in memory along with the other characters enclosed in the quotes.

The sizeof operator returns the size in bytes of any object in C, not only of character arrays.

In general, character arrays can store any raw data, for example binary data read from a binary file. In that case neither the user nor the program treats the data as strings, and as a result it is processed differently from strings.

The standard C function strlen is written specifically to find the length of a string stored in a character array. It does not know what data is stored in the array or how it was written there. All it does is search for the first zero character in the character array and return the number of characters before it.

You can store several strings sequentially in one character array. For example

char s[12];

strcpy( s, "Hello" );
strcpy( s + sizeof( "Hello" ), "World" );

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

If you define a two-dimensional array like this

char t[2][6] = { "Hello", "World" };

then in memory it will be stored the same way as the one-dimensional array above. So you can write

char *s = ( char * )t;

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

Another example: the standard C function strtok can split one string stored in a character array into several strings by replacing the user-specified delimiters with zero bytes. As a result, the character array will contain several strings.

For example

char s[] = "Hello World";

printf( "%zu\n", sizeof( s ) ); // outputs 12

strtok( s, " " );

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

printf( "%zu\n", sizeof( s ) ); // outputs 12

The last printf statement outputs the same value, 12, because the array occupies the same number of bytes. Only one byte in the memory allocated for the array was changed, from ' ' to '\0'.




Answer 6:


Character arrays in C and pointers to characters are not the same thing, even though printing their addresses can give the same value. An array in C is made up of the following things:

  1. Size of array
  2. Its address / pointer
  3. Homogenous Type of elements

Whereas a pointer is made up of just:

  1. Address
  2. Type information

    char s[] = "Hello\0Hi"; printf("%d %d", strlen(s), sizeof(s));

Here you are calculating the size of the array (the variable s) using sizeof, which gives 9.

But if you treat this character array as a string, the array (now a string) loses its size information and becomes just a pointer to a character. The same thing happens when you print a character array using %s.

So strlen() and %s treat the character array as a string and use only its address. As you can guess, strlen() keeps incrementing the pointer until it reaches the first null character, and the length you get is the count up to that point.

So strlen() gives you 5 and does not count the null character.

The sizeof operator only tells you the size of its operand. If you give it an array variable, it uses the array's size information and reports the size regardless of where null characters appear.

But if you give sizeof a pointer to an array of characters, it finds a pointer without the size information and gives the size of the pointer, which is usually 8 bytes (64 bits) on 64-bit systems or 4 bytes (32 bits) on 32-bit systems.

One more thing: if you initialize a character array using double quotes, like "Hello", C adds a null character; it does not when you use {'H','e','l','l','o'}.

Tested with gcc. I hope this helps you understand.



Source: https://stackoverflow.com/questions/34990187/what-if-a-null-character-is-present-in-the-middle-of-a-string
