How to convert std::string to lower case?

旧时难觅i 2020-11-22 00:01

I want to convert a std::string to lowercase. I am aware of the function tolower(), however in the past I have had issues with this function and it

26 answers
  • 2020-11-22 00:27

    tl;dr

    Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.


    First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)

    If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.

    As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).

    So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.

    Then there is the point that the standard library, for what it is capable of doing, depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is among those not supported on your client's machine?

    So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.

    (C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)

    While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.

    And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)

    So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

    #include <unicode/unistr.h>
    #include <unicode/ustream.h>
    #include <unicode/locid.h>
    
    #include <iostream>
    
    int main()
    {
        /*                          "Odysseus" */
        char const * someString = u8"ΟΔΥΣΣΕΥΣ";
        icu::UnicodeString someUString( someString, "UTF-8" );
        // Setting the locale explicitly here for completeness.
        // Usually you would use the user-specified system locale,
        // which *does* make a difference (see ı vs. i above).
        std::cout << someUString.toLower( "el_GR" ) << "\n";
        std::cout << someUString.toUpper( "el_GR" ) << "\n";
        return 0;
    }
    

    Compile (with G++ in this example):

    g++ -Wall example.cpp -licuuc -licuio
    

    This gives:

    οδυσσευς
    ΟΔΥΣΣΕΥΣ
    

    Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.

  • 2020-11-22 00:27

    There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straightforward. The isupper() function/macro honors the current C locale (set through <clocale>), which should take care of locale-related problems, but if not, you can always tweak UtoL[] to your heart's content.

    Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.

    Instead of a 1-for-1 identity mapping, though, store in each upper-case slot the byte value of the corresponding lower-case character. You may find islower() and isupper() useful here.


    The code looks like this...

    #include <cctype>   // isupper, tolower
    #include <clocale>
    #include <cstring>  // strcpy
    static char UtoL[256];
    // ----------------------------------------------------------------------------
    void InitUtoLMap()  {
        for (int i = 0; i < 256; i++)  {
            if (isupper(i)) {
                // tolower() follows the current locale; for ASCII this equals i + 32
                UtoL[i] = (char)tolower(i);
            }   else    {
                UtoL[i] = (char)i;
            }
        }
    }
    // ----------------------------------------------------------------------------
    char *LowerStr(char *szMyStr) {
        char *p = szMyStr;
        // do conversion in-place so as not to require a destination buffer
        while (*p) {        // szMyStr must be null-terminated
            // cast to unsigned char: a negative char would index before the table
            *p = UtoL[(unsigned char)*p];
            p++;
        }
        return szMyStr;
    }
    // ----------------------------------------------------------------------------
    int main() {
        char *Lowered, Upper[128];
        InitUtoLMap();
        strcpy(Upper, "Every GOOD boy does FINE!");
    
        Lowered = LowerStr(Upper);  // Lowered points at Upper, now lower-cased
        (void)Lowered;
        return 0;
    }
    
    

    This approach will, at the same time, allow you to remap any other characters you wish to change.

    This approach has one huge advantage when running on modern processors: the conversion loop contains no if tests, so the only branch left is the loop condition itself. That leaves the CPU's branch prediction logic free for other loops and tends to prevent pipeline stalls.

    Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.

  • 2020-11-22 00:27

    Try this function :)

    string toLowerCase(string str) {
        int str_len = str.length();
        string final_str = "";
        for(int i=0; i<str_len; i++) {
            char character = str[i];
            // ASCII upper-case letters run from 65 ('A') to 90 ('Z')
            if(character>=65 && character<=90) {
                final_str += (char)(character+32);
            } else {
                final_str += character;
            }
        }
        return final_str;
    }
    
  • 2020-11-22 00:30

    This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.

    #include <string>
    #include <algorithm>
    #include <iostream>
    
    int main (int argc, char* argv[])
    {
      std::string sourceString = "Abc";
      std::string destinationString;
    
      // Allocate the destination space
      destinationString.resize(sourceString.size());
    
      // Convert the source string to lower case
      // storing the result in destination string
      std::transform(sourceString.begin(),
                     sourceString.end(),
                     destinationString.begin(),
                     ::tolower);
    
      // Output the result of the conversion
      std::cout << sourceString
                << " -> "
                << destinationString
                << std::endl;
    }
    
  • 2020-11-22 00:31

    The simplest way to convert a string to lowercase without writing out the std:: prefix is as follows:

    1: String with or without spaces

    #include <algorithm>
    #include <iostream>
    #include <string>
    using namespace std;
    int main(){
        string str;
        getline(cin,str);
    //------------function to convert string into lowercase---------------
        transform(str.begin(), str.end(), str.begin(), ::tolower);
    //--------------------------------------------------------------------
        cout<<str;
        return 0;
    }
    

    2: String without spaces

    #include <algorithm>
    #include <iostream>
    #include <string>
    using namespace std;
    int main(){
        string str;
        cin>>str;
    //------------function to convert string into lowercase---------------
        transform(str.begin(), str.end(), str.begin(), ::tolower);
    //--------------------------------------------------------------------
        cout<<str;
        return 0;
    }
    
  • 2020-11-22 00:35

    I wrote this simple helper function:

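    #include <cctype> // tolower
    #include <string>
    using namespace std;
    
    string to_lower(string s) {        
        for(char &c : s)
            c = tolower(static_cast<unsigned char>(c)); // negative char values are undefined behavior
        return s;
    }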
    

    Usage:

    string s = "TEST";
    cout << to_lower("HELLO WORLD"); // output: "hello world"
    cout << to_lower(s); // won't change the original variable.
    