Convert wchar_t to char

谎友^ 2020-12-13 04:19

I was wondering: is it safe to do this?

wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256);
char myChar = static_cast<char>(wide);

        
9 answers
  • 2020-12-13 04:22

    A short function I wrote a while back to pack a wchar_t array into a char array. Characters outside the ASCII range (0-127) are replaced by '?' characters, and a surrogate pair is treated as a single character.

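    size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
      size_t i = 0;  // read index into src
      size_t j = 0;  // write index into dest
    
      if (dest_len == 0)
        return 0;    // no room even for the terminator
    
      while (src[i] != L'\0' && j < (dest_len - 1)){
        wchar_t code = src[i];
        if (code >= 0 && code < 128)
          dest[j++] = char(code);
        else{
          dest[j++] = '?';
          // lead surrogate (0xD800-0xDBFF) followed by a trail surrogate:
          // skip the trail so the pair produces a single '?'
          if (code >= 0xD800 && code <= 0xDBFF &&
              src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
            i++;
        }
        i++;
      }
    
      dest[j] = '\0';
    
      return j;  // number of chars written, not counting the terminator
    }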
    
  • 2020-12-13 04:27

    One could also convert wchar_t --> wstring --> string --> char:

    wchar_t wide = /* something */;
    wstring wstrValue(1, wide);  // one-character wide string; writing to [0] of an empty wstring is undefined behavior
    
    string strValue;
    strValue.assign(wstrValue.begin(), wstrValue.end());  // convert wstring to string
    
    char char_value = strValue[0];
    
  • 2020-12-13 04:30

    An easy way is:

            wstring your_wchar_in_ws(<your wchar>);
            string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
            const char* your_wchar_in_char = your_wchar_in_str.c_str();  // c_str() returns const char*
    

    I've been using this method for years :)
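
    A minimal sketch of what the element-wise wstring-to-string copy does, assuming a hypothetical helper name narrow_elementwise (not part of the answer). Each wchar_t is converted to char individually, so ASCII text survives, while a character such as U+20AC becomes a single unrelated byte rather than an encoded euro sign:

```cpp
#include <string>

// Hypothetical helper mirroring the iterator-range copy shown above.
// Every wchar_t is narrowed to char on its own; no encoding conversion happens.
std::string narrow_elementwise(const std::wstring& ws) {
    return std::string(ws.begin(), ws.end());
}
```

    For example, narrow_elementwise(L"abc") yields "abc", whereas a one-character string holding wchar_t(0x20AC) yields a one-byte string whose value depends on the implementation (typically only the low byte is kept).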

  • 2020-12-13 04:32

    assert is for ensuring that something is true in debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.

    Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
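
    A minimal sketch of the "if statement with an alternate plan" idea, assuming C++17. The names narrow_checked and narrow_or, the '?' fallback, and the choice of 128 as the limit (so the 0x80 through 0xff range mentioned above is also rejected) are illustrative assumptions, not something from the question:

```cpp
#include <optional>

// Hypothetical helper: returns the narrowed character only when the value
// lies in the 7-bit range; otherwise reports failure instead of asserting.
std::optional<char> narrow_checked(wchar_t wide) {
    if (wide >= 0 && wide < 128) {
        return static_cast<char>(wide);
    }
    return std::nullopt;
}

// Hypothetical convenience wrapper: substitutes a fallback character
// for anything outside the accepted range.
char narrow_or(wchar_t wide, char fallback = '?') {
    return narrow_checked(wide).value_or(fallback);
}
```

    Unlike assert, the check stays active in release builds, and the caller decides what an out-of-range character should turn into.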

  • 2020-12-13 04:32

    You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
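
    For reference, a minimal sketch of calling wctomb(). The helper name wide_char_to_multibyte is hypothetical, and the result depends on the current C locale (set with setlocale), which is one reason the function is awkward to rely on:

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts one wide character to its multibyte form
// in the current C locale. Returns an empty string when the character
// cannot be represented in that locale.
std::string wide_char_to_multibyte(wchar_t wc) {
    char buf[MB_LEN_MAX];
    std::wctomb(nullptr, 0);           // reset any shift state
    int len = std::wctomb(buf, wc);    // bytes written, or -1 on failure
    if (len < 0) {
        return std::string();
    }
    return std::string(buf, static_cast<std::size_t>(len));
}
```

    Note that the output may be several bytes long, so it does not fit the "one wchar_t becomes one char" shape of the original question.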


    wchar_t is an integral type, so your compiler won't complain if you actually do:

    char x = (char)wc;
    

    but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:

    int x = getchar();
    

    and not this:

    char x = getchar(); /* <- WRONG! */
    

    As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.

    char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:

    for(int i = 0; s[i]; ++i) {
      wchar_t wc = s[i];
      char c = doit(wc);
      out[i] = c;
    }
    

    is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is part of ANSI C already, but it's still wrong.
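
    For comparison with the hand-written loop, a minimal sketch of calling wcstombs() on a whole string, assuming C++17. The helper name wide_to_multibyte is hypothetical, and the output encoding is whatever the current C locale dictates:

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string in the
// current C locale. Returns std::nullopt if any character is not
// representable. Conversion stops at the first embedded L'\0'.
std::optional<std::string> wide_to_multibyte(const std::wstring& src) {
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buf(src.size() * MB_CUR_MAX + 1);
    std::size_t written = std::wcstombs(buf.data(), src.c_str(), buf.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::nullopt;
    }
    return std::string(buf.data(), written);
}
```

    The conversion works on the string as a whole, so one wide character may map to several bytes, which a per-element loop writing out[i] cannot express.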

    You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.

    Now go read this, to learn what's wrong with iconv.

  • 2020-12-13 04:39

    In general, no. int(wchar_t(255)) == int(char(255)) can hold (when char is unsigned; with a signed 8-bit char, char(255) is typically -1), but that just means they have the same int value. They may not represent the same characters.

    You would see such a discrepancy even on the majority of Windows PCs. For instance, on Windows code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).

    Note that it does not even hold for the ASCII range, as C++ doesn't require ASCII. On IBM systems that use EBCDIC, in particular, you may see that 'A' != 65.
