strcmp() and signed / unsigned chars

Posted by 徘徊边缘 on 2019-12-19 05:08:44

Question


I am confused by strcmp(), or rather, how it is defined by the standard. Consider comparing two strings where one contains characters outside the ASCII-7 range (0-127).

The C standard defines:

int strcmp(const char *s1, const char *s2);

The strcmp function compares the string pointed to by s1 to the string pointed to by s2.

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

The parameters are char *, not unsigned char *. Nothing here says that the comparison should be done as unsigned.

But all the standard libraries I checked consider the "high" character to be just that, higher in value than the ASCII-7 characters.

I understand that this is useful and is the expected behaviour, and I am not suggesting that the existing implementations are wrong. I just want to know which part of the standard I have missed.

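/* Naive version: compares through plain char, whose signedness is
   implementation-defined. Where char is signed, bytes above 127 read
   as negative values and sort below the ASCII characters. */
int strcmp_default( const char * s1, const char * s2 )
{
    while ( ( *s1 ) && ( *s1 == *s2 ) )
    {
        ++s1;
        ++s2;
    }
    return ( *s1 - *s2 );
}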

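/* Same loop, but every byte is read as unsigned char (0..255). */
int strcmp_unsigned( const char * s1, const char *s2 )
{
    /* Keep the const qualifier when reinterpreting the pointers. */
    const unsigned char * p1 = (const unsigned char *)s1;
    const unsigned char * p2 = (const unsigned char *)s2;

    while ( ( *p1 ) && ( *p1 == *p2 ) )
    {
        ++p1;
        ++p2;
    }
    return ( *p1 - *p2 );
}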

#include <stdio.h>
#include <string.h>

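int main( void )
{
    char x1[] = "abc";
    /* 'ü' is outside the basic character set; its byte value(s)
       depend on the encoding of this source file. */
    char x2[] = "abü";
    printf( "%d\n", strcmp_default( x1, x2 ) );
    printf( "%d\n", strcmp_unsigned( x1, x2 ) );
    printf( "%d\n", strcmp( x1, x2 ) );
    return 0;
}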

Output is:

103
-153
-153

Answer 1:


7.21.4/1 (C99); note the parenthetical "both interpreted as unsigned char":

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

There is something similar in C90.
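
To make the rule concrete, here is a minimal sketch, not taken from any real library. The names strcmp_sketch, diff_as_signed and diff_as_unsigned are hypothetical helpers introduced only for illustration. The byte 0xFC used for 'ü' in the note after the code is an assumption: it is the ISO-8859-1 code for that character, and it is consistent with the 103 / -153 output shown in the question.

```c
/* Hypothetical helper, for illustration only: compares two strings byte
   by byte, reading each byte as unsigned char as 7.21.4/1 describes. */
int strcmp_sketch(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }
    /* Only the sign is specified by the standard, so return -1, 0 or 1. */
    return (*p1 > *p2) - (*p1 < *p2);
}

/* Difference of one byte pair when both bytes are read as signed char. */
int diff_as_signed(char a, char b)
{
    return (signed char)a - (signed char)b;
}

/* Difference of the same pair when both bytes are read as unsigned char. */
int diff_as_unsigned(char a, char b)
{
    return (unsigned char)a - (unsigned char)b;
}
```

On the usual two's-complement implementations, diff_as_signed('c', '\xFC') is 99 - (-4) = 103 and diff_as_unsigned('c', '\xFC') is 99 - 252 = -153, which matches the two results in the question. Only the second interpretation gives the sign the standard requires, so strcmp_sketch("abc", "ab\xFC") is negative.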

Note that strcoll() may be better suited than strcmp(), especially if you have characters outside the basic character set.
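
As a rough sketch of how strcoll() might be used, assuming a hypothetical wrapper named collate_sign (not a standard function): it selects a collation locale with setlocale() and then compares with strcoll(). Which locale names are installed, and how they order accented characters, depends on the system, so no particular result is claimed for anything other than the "C" locale.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper, for illustration only: collates s1 and s2 under
   the named locale and returns -1, 0 or 1. If the locale is not
   available, setlocale() returns NULL and the current collation locale
   stays in effect. On success LC_COLLATE remains changed afterwards. */
int collate_sign(const char *locale_name, const char *s1, const char *s2)
{
    int r;

    setlocale(LC_COLLATE, locale_name);
    r = strcoll(s1, s2);
    return (r > 0) - (r < 0);
}
```

In the "C" locale, strcoll() orders strings the same way strcmp() does, so collate_sign("C", "abc", "ab\xFC") is negative. Passing "" instead selects the locale from the environment, where the ordering of 'ü' relative to 'c' may differ.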



Source: https://stackoverflow.com/questions/1356741/strcmp-and-signed-unsigned-chars
