UTF-8, CString and CFile? (C++, MFC)


I'm currently working on an MFC program that specifically has to work with UTF-8. At some point, I have to write UTF-8 data into a file; to do that, I'm using CFiles and CS

3 Answers

执笔经年

2020-12-07 22:29

You'll have to convert sWorkingLine to UTF-8 and then write it in the file.

WideCharToMultiByte converts UTF-16 ("Unicode" in Windows terms) strings to UTF-8 when you pass CP_UTF8 as the code page. MultiByteToWideChar does the reverse: it converts narrow strings (ASCII, an ANSI code page, or UTF-8) to UTF-16.
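
For anyone curious what the CP_UTF8 conversion actually emits, here is a rough portable sketch of the UTF-16 to UTF-8 encoding rules. `Utf16ToUtf8` is a made-up helper name, and it takes a `std::u16string` rather than a `wchar_t` string so that it compiles outside Windows. It is only meant to illustrate the byte layout; in real MFC code you would call WideCharToMultiByte rather than roll your own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of MFC or Win32): encodes UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD, the replacement character.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            cp = 0xFFFD;
        }
        assert(cp <= 0x10FFFF);

        if (cp < 0x80) {
            // 1 byte: plain ASCII.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: covers Cyrillic, Greek, Latin supplements, etc.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: rest of the Basic Multilingual Plane.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: supplementary planes (came from a surrogate pair).
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Each Cyrillic letter becomes two bytes, so the 10-character "Привет мир" ends up as 19 bytes (9 letters at 2 bytes each, plus 1 space). That is why the byte count, not the character count, has to be passed when writing to the file.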

我寻月下人不归

2020-12-07 22:34
When you output data you need to do (this assumes you are compiling in Unicode mode, which is highly recommended):
```
CString russianText = L"Привет мир";

CFile yourFile(_T("yourfile.txt"), CFile::modeWrite | CFile::modeCreate);

CT2CA outputString(russianText, CP_UTF8);
yourFile.Write(outputString, ::strlen(outputString));
```
If _UNICODE is not defined (you are working in multi-byte mode instead), you need to know what code page your input text is in and convert it to something you can use. This example shows working with Russian text that is in UTF-16 format, saving it to UTF-8:
```
// Example 1: convert from Russian text in UTF-16 (note the "L"
// in front of the string), into UTF-8.
CW2A russianTextAsUtf8(L"Привет мир", CP_UTF8);
yourFile.Write(russianTextAsUtf8, ::strlen(russianTextAsUtf8));
```
More likely, your Russian text is in some other code page, such as KOI8-R. In that case, you need to convert from that code page into UTF-16, and then convert the UTF-16 into UTF-8. You cannot convert directly from KOI8-R to UTF-8 using the conversion macros, because they always try to convert narrow text to the system code page. So the easy way is to do this:
```
// Example 2: convert from Russian text in KOI8-R (code page 20866)
// to UTF-16, and then to UTF-8. Conversions between UTFs are
// lossless.
CA2W russianTextAsUtf16("\xf0\xd2\xc9\xd7\xc5\xd4 \xcd\xc9\xd2", 20866);
CW2A russianTextAsUtf8(russianTextAsUtf16, CP_UTF8);
yourFile.Write(russianTextAsUtf8, ::strlen(russianTextAsUtf8));
```
You don't need a BOM (it's optional; I wouldn't use it unless there was a specific reason to do so).
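
If some consumer of the file does insist on a BOM, it is just the three bytes EF BB BF at the very start of the file. Below is a small sketch for detecting, stripping, or prepending it. The helper names (`HasUtf8Bom`, `StripUtf8Bom`, `WithUtf8Bom`) are made up for illustration, and the code is plain standard C++ rather than MFC.

```cpp
#include <cassert>
#include <string>

// The UTF-8 byte order mark: U+FEFF encoded as three bytes.
const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Returns true if the byte string starts with the UTF-8 BOM.
bool HasUtf8Bom(const std::string& bytes)
{
    return bytes.compare(0, 3, kUtf8Bom) == 0;
}

// Returns the byte string without a leading BOM, if one is present.
std::string StripUtf8Bom(const std::string& bytes)
{
    return HasUtf8Bom(bytes) ? bytes.substr(3) : bytes;
}

// Returns the UTF-8 text with a BOM prepended.
// Precondition: the text does not already start with one.
std::string WithUtf8Bom(const std::string& utf8)
{
    assert(!HasUtf8Bom(utf8));
    return std::string(kUtf8Bom) + utf8;
}
```

With CFile, the equivalent would presumably be one extra `Write` of those three bytes before writing the converted text.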

Make sure you read this: http://msdn.microsoft.com/en-us/library/87zae4a3(VS.80).aspx. If you incorrectly use CT2CA (for example, using the assignment operator) you will run into trouble. The linked documentation page shows examples of how to use and how not to use it.

Further information:
- The C in CT2CA indicates const. I use it when possible, but some conversions only support the non-const version (e.g. CW2A).
- The T in CT2CA indicates that you are converting from an LPCTSTR. Thus it will work whether your code is compiled with the _UNICODE flag or not. You could also use CW2A (where W indicates wide characters).
- The A in CT2CA indicates that you are converting to an "ANSI" (8-bit char) string.
- Finally, the second parameter to CT2CA indicates the code page you are converting to.
To do the reverse conversion (from UTF-8 to LPCTSTR), you could do:
```
// utf8Text is assumed to be a narrow (char) string holding UTF-8 bytes.
CString myString(CA2CT(utf8Text, CP_UTF8));
```
In this case, we are converting from an "ANSI" string in UTF-8 format, to an LPCTSTR. The LPCTSTR is always assumed to be UTF-16 (if _UNICODE is defined) or the current system code page (if _UNICODE is not defined).
梦毁少年i

2020-12-07 22:46

Make sure you're using Unicode (TCHAR is wchar_t). Then before you write the data, convert it using the WideCharToMultiByte Win32 API function.
