Convert C++ std::string to UTF-16-LE encoded string

强颜欢笑 提交于 2019-11-27 07:37:50

问题


I've been searching for hours today and just can't find anything that works out for me. The one I've just had a look at, with no luck, is "How to convert UTF-8 encoded std::string to UTF-16 std::string".

My question is, with a brief explanation:

I want to make a valid NTLM hash in std C++, and I'm using OpenSSL's library to create the hash using its MD4 routines. I know how to do that, so does anyone know how to convert the std::string into a UTF-16 LE encoded string which I can pass to the MD4 functions to get a correct digest?

So, can I have a std::string which holds the char type, and convert it to a UTF16-LE encoded variable length std::string_type? Whether that be std::u16string, or std::wstring?

And would I use s.c_str() or s.data() and would the length() function report correctly in both cases?


回答1:


Apologies, firsthand... this will be an ugly reply with some long code. I ended up using the following function, while effectively compiling in iconv into my windows application file by file :)

Hope this helps.

char* conver(const char* in, size_t in_len, size_t* used_len)
{
    const int CC_MUL = 2; // 16 bit
    setlocale(LC_ALL, "");
    char* t1 = setlocale(LC_CTYPE, "");
    char* locn = (char*)calloc(strlen(t1) + 1, sizeof(char));
    if(locn == NULL)
    {
        return 0;
    }

    strcpy(locn, t1);
    const char* enc = strchr(locn, '.') + 1;

#if _WINDOWS
    std::string win = "WINDOWS-";
    win += enc;
    enc = win.c_str();
#endif

    iconv_t foo = iconv_open("UTF-16LE", enc);

    if(foo == (void*)-1)
    {
        if (errno == EINVAL)
        {
            fprintf(stderr, "Conversion from %s is not supported\n", enc);
        }
        else
        {
            fprintf(stderr, "Initialization failure:\n");
        }
        free(locn);
        return 0;
    }

    size_t out_len = CC_MUL * in_len;
    size_t saved_in_len = in_len;
    iconv(foo, NULL, NULL, NULL, NULL);
    char* converted = (char*)calloc(out_len, sizeof(char));
    char *converted_start = converted;
    char* t = const_cast<char*>(in);
    int ret = iconv(foo,
                    &t,
                    &in_len,
                    &converted,
                    &out_len);
    iconv_close(foo);
    *used_len = CC_MUL * saved_in_len - out_len;

    if(ret == -1)
    {
        switch(errno)
        {
        case EILSEQ:
            fprintf(stderr,  "EILSEQ\n");
            break;
        case EINVAL:
            fprintf(stderr,  "EINVAL\n");
            break;
        }

        perror("iconv");
        free(locn);
        return 0;
    }
    else
    {
        free(locn);
        return converted_start;
    }
}



回答2:


I think something like this should do the trick:

std::string utf16_to_utf8(std::u16string const& s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff,
        std::codecvt_mode::little_endian>, char16_t> cnv;
    std::string utf8 = cnv.to_bytes(s);
    if(cnv.converted() < s.size())
        throw std::runtime_error("incomplete conversion");
    return utf8;
}

std::u16string utf8_to_utf16(std::string const& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff,
        std::codecvt_mode::little_endian>, char16_t> cnv;
    std::u16string s = cnv.from_bytes(utf8);
    if(cnv.converted() < utf8.size())
        throw std::runtime_error("incomplete conversion");
    return s;
}

Note: that std::wstring_convert is deprecated in C++17 but I still favor using it rather than a non-standard library given that it is portable, has no dependencies and will no doubt remain until replaced.

And, if all else fails, you can reimplement these same functions with alternative code without changing any other part of the application.



来源:https://stackoverflow.com/questions/52703630/convert-c-stdstring-to-utf-16-le-encoded-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!