Case Insensitive String comp in C

天涯浪人 2020-11-27 03:45

I have two postcodes, stored as char* strings, that I want to compare while ignoring case. Is there a function to do this?

Or do I have to loop through each string, convert every character with tolower(), and then compare?

11 Answers
  •  一生所求
    2020-11-27 04:05

    I'm not really a fan of the most-upvoted answer here, in part because it does not look correct to me: the comparison should keep going when it reaches a null terminator in only one of the two strings (and stop only when both have ended), and that answer doesn't do this. So I wrote my own.

    This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

    It is identical to strncmp() except:

    1. It is case-insensitive.
    2. The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
    3. It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.
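To make point 3 concrete, here is a minimal sketch of how a caller might check for the `INT_MIN` sentinel before trusting the result. `postcodes_match()` is a hypothetical helper invented for this illustration (it is not part of the answer), and the copy of `strncmpci()` is a compact restatement included only so the sketch compiles on its own. Treating a NULL argument as "no match" is an assumption; a caller could just as well report an error.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compact copy of strncmpci(), repeated so this sketch compiles on its own. */
int strncmpci(const char *str1, const char *str2, size_t num)
{
    if (!str1 || !str2)
    {
        return INT_MIN; /* sentinel: at least one argument was NULL */
    }

    for (size_t i = 0; i < num && (*str1 || *str2); i++, str1++, str2++)
    {
        /* Cast through unsigned char so tolower() never sees a negative value. */
        int diff = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);
        if (diff != 0)
        {
            return diff;
        }
    }
    return 0;
}

/* Hypothetical helper: true only when both postcodes are non-NULL and equal,
 * ignoring ASCII case. SIZE_MAX is used here to mean "no length cap". */
bool postcodes_match(const char *a, const char *b)
{
    int rc = strncmpci(a, b, SIZE_MAX);
    if (rc == INT_MIN)
    {
        return false; /* assumption: invalid input counts as "no match" */
    }
    return rc == 0;
}
```

Because the difference of two character values always fits in a small range around zero, `INT_MIN` cannot collide with a genuine comparison result, which is what makes it usable as a sentinel.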

    LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on Unicode text in encodings such as UTF-8 (the most popular), UTF-16, and UTF-32.
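The limitation can be seen with a small sketch. `equal_ignoring_ascii_case()` and the two constants below are hypothetical names made up for this illustration; the helper applies the same per-byte `tolower()` step the answer relies on. The sketch assumes the program runs in the default "C" locale, where `tolower()` only maps `A` to `Z`.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: byte-by-byte equality after folding each byte with
 * tolower(), the same folding step strncmpci() uses. */
bool equal_ignoring_ascii_case(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
        {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings end at the same position */
}

/* U+00E9 (lowercase e with acute) and U+00C9 (uppercase E with acute),
 * written out as their two-byte UTF-8 encodings. */
const char *const E_ACUTE_LOWER = "\xC3\xA9";
const char *const E_ACUTE_UPPER = "\xC3\x89";
```

Plain ASCII input such as `"postcode"` and `"POSTCODE"` compares equal, but the two accented letters do not: their second bytes (0xA9 and 0x89) are not ASCII letters, so `tolower()` leaves them unchanged and the byte-wise comparison reports a mismatch.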

    Here is the code only (no comments):

    #include <ctype.h>
    #include <limits.h>
    #include <stddef.h>
    
    int strncmpci(const char * str1, const char * str2, size_t num)
    {
        int ret_code = 0;
        size_t chars_compared = 0;
    
        if (!str1 || !str2)
        {
            ret_code = INT_MIN;
            return ret_code;
        }
    
        while ((*str1 || *str2) && (chars_compared < num))
        {
            ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);
            if (ret_code != 0)
            {
                break;
            }
            chars_compared++;
            str1++;
            str2++;
        }
    
        return ret_code;
    }
    

    Fully-commented version:

    #include <ctype.h>   // tolower()
    #include <limits.h>  // INT_MIN
    #include <stddef.h>  // size_t
    
    /// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
    ///             if two C-strings are equal.
    /// \note       1. Identical to `strncmp()` except:
    ///               1. It is case-insensitive.
    ///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
    ///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
    ///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
    ///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
    ///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
    ///               - Aided/inspired, in part, by `strcicmp()` here:
    ///                 https://stackoverflow.com/a/5820991/4561887.
    /// \param[in]  str1        C string 1 to be compared.
    /// \param[in]  str2        C string 2 to be compared.
    /// \param[in]  num         max number of chars to compare
    /// \return     A comparison code (identical to `strncmp()`, except with the addition
    ///             of `INT_MIN` as a special sentinel value):
    ///
    ///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
    ///                      of the input strings is a NULL pointer).
    ///             <0       The first character that does not match has a lower value in str1 than
    ///                      in str2.
    ///              0       The contents of both strings are equal.
    ///             >0       The first character that does not match has a greater value in str1 than
    ///                      in str2.
    int strncmpci(const char * str1, const char * str2, size_t num)
    {
        int ret_code = 0;
        size_t chars_compared = 0;
    
        // Check for NULL pointers
        if (!str1 || !str2)
        {
            ret_code = INT_MIN;
            return ret_code;
        }
    
        // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`,
        // as long as at least one of the strings still has more characters in it, and we have
        // not yet compared `num` chars.
        while ((*str1 || *str2) && (chars_compared < num))
        {
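            // Cast through `unsigned char` first: passing a negative `char` value to `tolower()`
            // is undefined behavior.
            ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);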
            if (ret_code != 0)
            {
                // The 2 chars just compared don't match
                break;
            }
            chars_compared++;
            str1++;
            str2++;
        }
    
        return ret_code;
    }
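The `||` in the loop condition is what lets the comparison notice that one string is a prefix of the other. The sketch below contrasts two loops that differ only in that condition. Both function names are hypothetical and written purely for contrast; the first is not a quotation of any other answer's code.

```c
#include <ctype.h>

/* Hypothetical variant, for contrast only: stops as soon as EITHER string
 * ends, so a trailing difference is never examined. */
int cmp_stop_at_either_end(const char *a, const char *b)
{
    int diff = 0;
    while (*a && *b)
    {
        diff = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (diff != 0)
        {
            break;
        }
        a++;
        b++;
    }
    return diff;
}

/* Same loop, but it keeps going while AT LEAST ONE string has characters
 * left, which is the condition strncmpci() uses (minus the length cap). */
int cmp_until_both_end(const char *a, const char *b)
{
    int diff = 0;
    while (*a || *b)
    {
        diff = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (diff != 0)
        {
            break; /* also taken when exactly one string has ended */
        }
        a++;
        b++;
    }
    return diff;
}
```

For `"hey"` versus `"heyd"`, the first loop exits after three matching characters and returns 0, wrongly reporting equality. The second compares the terminator of `"hey"` against `'d'`, gets a non-zero difference, and stops there, so it never reads past the end of the shorter string.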
    

    Test code:

    Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository: see "strncmpci.c". What follows is just a snippet:

    int main()
    {
        printf("-----------------------\n"
               "String Comparison Tests\n"
               "-----------------------\n\n");
    
        int num_failures_expected = 0;
    
        printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
        EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
        num_failures_expected++;
        printf("------ beginning ------\n\n");
    
    
        const char * str1;
        const char * str2;
        size_t n;
    
        // NULL ptr checks
        EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
        EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
        EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
        EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
        EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
        EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);
    
        EXPECT_EQUALS(strncmpci("", "", 0), 0);
        EXPECT_EQUALS(strncmp("", "", 0), 0);
    
        str1 = "";
        str2 = "";
        n = 0;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 0);
    
        str1 = "hey";
        str2 = "HEY";
        n = 0;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 0);
    
        str1 = "hey";
        str2 = "HEY";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
    
        str1 = "heY";
        str2 = "HeY";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
    
        str1 = "hey";
        str2 = "HEdY";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
        EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
    
        str1 = "heY";
        str2 = "hEYd";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');
    
        str1 = "heY";
        str2 = "heyd";
        n = 6;
        EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
        EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');
    
        str1 = "hey";
        str2 = "hey";
        n = 6;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 0);
    
        str1 = "hey";
        str2 = "heyd";
        n = 6;
        EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
        EXPECT_EQUALS(strncmp(str1, str2, n), -'d');
    
        str1 = "hey";
        str2 = "heyd";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 0);
    
        str1 = "hEY";
        str2 = "heyYOU";
        n = 3;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
        EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
    
        str1 = "hEY";
        str2 = "heyYOU";
        n = 10;
        EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
        EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
    
        str1 = "hEYHowAre";
        str2 = "heyYOU";
        n = 10;
        EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
        EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');
    
        EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
        EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    
    
        if (globals.error_count == num_failures_expected)
        {
            printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
        }
        else
        {
            printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
                ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
        }
    
        assert(globals.error_count == num_failures_expected);
        return globals.error_count;
    }
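The snippet above uses `EXPECT_EQUALS` and `globals.error_count` without showing their definitions, which live in the full source file. As a rough idea of what such a helper could look like, here is a sketch whose output format is modeled on the sample output. It is an assumption made for illustration, not the author's actual macro, and the layout of `globals` is guessed from the single field the snippet uses.

```c
#include <stdio.h>

/* Assumed shape of the global test state; only `error_count` appears in the
 * snippet, so that is the only field sketched here. */
struct
{
    int error_count;
} globals = {0};

/* Sketch of an EXPECT_EQUALS-style macro: evaluates each argument exactly
 * once and, on a mismatch, prints a report and bumps the failure counter. */
#define EXPECT_EQUALS(a, b)                                             \
    do                                                                  \
    {                                                                   \
        int a_val_ = (a);                                               \
        int b_val_ = (b);                                               \
        if (a_val_ != b_val_)                                           \
        {                                                               \
            globals.error_count++;                                      \
            printf("FAILED at line %d in function %s! %s != %s\n"       \
                   "  a: %s is %d\n"                                    \
                   "  b: %s is %d\n\n",                                 \
                   __LINE__, __func__, #a, #b, #a, a_val_, #b, b_val_); \
        }                                                               \
    } while (0)
```

The `do { ... } while (0)` wrapper lets the macro be used like an ordinary statement, and the `#a` and `#b` stringification is what allows the report to echo the original expressions.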
    

    Sample output:

    $ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
    -----------------------
    String Comparison Tests
    -----------------------
    
    INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
    FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
      a: strncmpci("hey", "HEY", 3) is 0
      b: 'h' - 'H' is 32
    
    ------ beginning ------
    
    All unit tests passed!
    

    References:

    1. This question & other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
    2. http://www.cplusplus.com/reference/cstring/strncmp/
    3. https://en.wikipedia.org/wiki/ASCII
    4. https://en.cppreference.com/w/c/language/operator_precedence

    Topics to further research

    1. (Note: this is C++, not C) Lowercase of Unicode character
    2. tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

    TODO:

    1. Make a version of this code that also works on UTF-8-encoded Unicode text!
