Problems converting BSTR to char *

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-13 01:11:47

Question


We've an old C++ app that's making calls to third-party webservices, using WinHttp.WinHttpRequest.5.1.

I won't list all of the details of the call sequence, as I don't think they're relevant to the problem, but we finish by calling hr = pIWinHttpRequest->get_ResponseText(&bstrResponse);, where bstrResponse is of type BSTR.

The calling code doesn't work with BSTRs; it works with standard C/C++ char * strings, so the code converts the BSTR to a char * with:

_bstr_t b(bstrResponse);
const char *c = static_cast<char *>(b);

And for all of the prior webservices we've accessed with this code, this has worked. But for this new one, it's not.

The data we're getting back is supposed to be XML, but for this one webservice, it looks like we're getting some character code conversion problems. Our resulting string starts with: "?<?xml version="1.0" encoding="utf-8"?>..."

Notice the extra ? at the beginning. When walking through this in the debugger, we don't see it in the displayed value of bstrResponse, and we don't see it in the displayed value of b, but we do see it in the displayed value of c.

Any ideas as to what might be going on?

EDITED

I understand that BSTR is a multi-byte type, but all of the characters in this string are plain ASCII, and none of the code that calls this function can handle multi-byte characters. Browsing around the web, I see this specific mechanism recommended frequently, but in this case, it doesn't work.

I need to convert this string from BSTR to an array of single-byte characters. Even if that means stripping out multi-byte characters that cannot be converted.


Answer 1:


The conversion in your code using static_cast on a _bstr_t converts to ANSI correctly. The appearance of ? in an encoding conversion indicates that the conversion of a character failed. The most likely reason for this is that bstrResponse contains characters that are not present in your ANSI codepage. I would expect that you should be converting to UTF-8 rather than ANSI, but of course I don't have all the information that you have.

The bottom line is that the ? indicates that the source string contains a character that cannot be encoded in the destination character set.
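One way to find out which character is failing is to list every code unit in the response that falls outside 7-bit ASCII, since the debugger may not display characters such as a byte-order mark. The following is a minimal sketch, not code from the original application: find_non_ascii is a hypothetical helper, and it assumes the BSTR contents have already been copied into a std::u16string (on Windows, where wchar_t is 16 bits, that can be done from the BSTR pointer and SysStringLen).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (index, code unit) for every UTF-16 code unit
// that is outside the 7-bit ASCII range and so may not survive an ANSI conversion.
std::vector<std::pair<std::size_t, char16_t>>
find_non_ascii(const std::u16string& text) {
    std::vector<std::pair<std::size_t, char16_t>> hits;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] > 0x7F) {
            hits.emplace_back(i, text[i]);
        }
    }
    return hits;
}
```

If the first hit is at index 0 with value 0xFEFF, the string starts with a byte-order mark, which has no representation in an ANSI codepage and would therefore come out as ?.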

Update

Your answer gives further evidence that you should be converting to UTF-8. Only you can know for sure, but the evidence you present is consistent with that conclusion.
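To illustrate what a UTF-8 conversion produces instead of ?, here is a portable sketch of a UTF-16 to UTF-8 encoder. On Windows the usual route is WideCharToMultiByte with CP_UTF8 rather than hand-written code; the function below (utf16_to_utf8, a hypothetical name) only shows the transformation, and it assumes the BSTR contents are available as a std::u16string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 text as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this encoding, plain ASCII characters stay one byte each, so existing char * code keeps working for them, while a leading byte-order mark becomes the three bytes EF BB BF and a copyright symbol becomes C2 A9. Whether the downstream parser accepts those bytes is a separate question.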




Answer 2:


It turns out there were two problems. First, the conversion process described above does not strip out the byte-order mark, which in my mind it should. Second, the old C++ XML parser we are using chokes on 8-bit characters (those above 0x7F, outside 7-bit ASCII), and this webservice is sending us a copyright symbol in its text, '\xA9'.

With the BOM stripped and high-bit characters replaced by spaces, the parser works fine.
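The workaround can be sketched roughly as follows. This is not the code actually used; to_ascii_lossy is a hypothetical helper, and it assumes the cleanup is done on the UTF-16 text (copied into a std::u16string) before it is narrowed to single-byte characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops a leading byte-order mark (U+FEFF) and replaces
// every code unit outside 7-bit ASCII with a space.
std::string to_ascii_lossy(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    const std::size_t start = (!in.empty() && in[0] == 0xFEFF) ? 1 : 0;
    for (std::size_t i = start; i < in.size(); ++i) {
        out += (in[i] <= 0x7F) ? static_cast<char>(in[i]) : ' ';
    }
    return out;
}
```

Because the replacement works per code unit, a character outside the Basic Multilingual Plane (a surrogate pair) turns into two spaces. The conversion is lossy by design, which matches the requirement in the question that unconvertible characters may be stripped.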



Source: https://stackoverflow.com/questions/14443193/problems-converting-bstr-to-char
