WinHTTP request data in Unicode?

Question

I'm trying to read a web page via WinHTTP:

bool WinHTTPClass::QueryResponseData(std::string &query_data)
{
    // Read the response body as raw bytes.

    DWORD dwSize = 0, dwDownloaded = 0;

    do 
    {
        // Check for available data.

        dwSize = 0;
        if( !WinHttpQueryDataAvailable( hRequest, &dwSize ) )
        {
            cout << "Error querying data : " << GetLastError() << endl;
            return false;
        }

        // Allocate space for the buffer. Plain new[] throws std::bad_alloc
        // on failure rather than returning NULL, so no null check is needed.

        char* pszOutBuffer = new char[dwSize+1];

        // Read the data.
        ZeroMemory( pszOutBuffer, dwSize+1 );

        if( !WinHttpReadData( hRequest, (LPVOID)pszOutBuffer, 
                            dwSize, &dwDownloaded ) )
        {
            cout << "Error reading data : " << GetLastError() << endl;
            delete [] pszOutBuffer;  // Do not leak the buffer on failure.
            return false;
        }

        // Append exactly the bytes received, so that embedded NUL bytes
        // in the content do not truncate the data.
        query_data.append( pszOutBuffer, dwDownloaded );

        // Free the memory allocated to the buffer.
        delete [] pszOutBuffer;
    }
    while( dwSize > 0 );

    return true;
}

All this works well. What confuses me is whether I should read the data into a Unicode buffer instead of:

char* pszOutBuffer = new char[dwSize+1];

That is, should I use wchar_t instead, given that web pages commonly use UTF-8? What's the difference? Where am I confused?

Answer 1:

HTTP is a binary transport; it has no concept of text or Unicode. HTTP uses 7-bit ASCII for its headers, but the content is arbitrary binary data whose interpretation depends on the HTTP headers that describe it, most notably the Content-Type header.

So you need to receive the raw content data into your char[] buffer first, then look at the received Content-Type header using WinHttpQueryHeaders() to see what kind of data you received. If it says you received a text/... type, then the header will usually also specify the charset of the text. In the case of text/html, the charset may instead be given in a <meta> tag within the HTML itself rather than in the HTTP header.

Once you know the charset of the text, you can convert it to wchar_t[] using MultiByteToWideChar() (you will have to manually look up the appropriate codepage for the charset).
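The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete solution: CharsetFromContentType, CodePageForCharset and ToWide are hypothetical helper names, the Content-Type value is assumed to have already been retrieved as a wide string (for example via WinHttpQueryHeaders() with WINHTTP_QUERY_CONTENT_TYPE), and the charset table covers only a handful of common names. The MultiByteToWideChar() part is compiled only on Windows.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: extract the charset parameter from a Content-Type
// value, lower-cased. L"text/html; charset=UTF-8" yields L"utf-8".
// Returns an empty string when no charset parameter is present.
std::wstring CharsetFromContentType(const std::wstring& content_type)
{
    std::wstring lower(content_type);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towlower(c)); });

    const std::wstring key = L"charset=";
    std::wstring::size_type pos = lower.find(key);
    if (pos == std::wstring::npos)
        return std::wstring();
    pos += key.size();

    // The value runs up to the next parameter separator or whitespace.
    const std::wstring::size_type end = lower.find_first_of(L"; \t", pos);
    std::wstring value = lower.substr(
        pos, end == std::wstring::npos ? std::wstring::npos : end - pos);

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == L'"' && value.back() == L'"')
        value = value.substr(1, value.size() - 2);
    return value;
}

// Hypothetical helper: map a few common charset names to Windows code page
// identifiers. Returns false for names this small table does not know.
bool CodePageForCharset(const std::wstring& charset, unsigned int& code_page)
{
    if (charset == L"utf-8")        { code_page = 65001; return true; } // CP_UTF8
    if (charset == L"iso-8859-1")   { code_page = 28591; return true; }
    if (charset == L"windows-1252") { code_page = 1252;  return true; }
    if (charset == L"us-ascii")     { code_page = 20127; return true; }
    return false;
}

#ifdef _WIN32
// Convert the raw response bytes to UTF-16 using the given code page.
// Returns an empty string if the input is empty or the conversion fails.
std::wstring ToWide(const std::string& bytes, UINT code_page)
{
    if (bytes.empty())
        return std::wstring();

    // First call: ask how many wchar_t units the result needs.
    const int len = MultiByteToWideChar(code_page, 0, bytes.data(),
                                        static_cast<int>(bytes.size()),
                                        nullptr, 0);
    if (len <= 0)
        return std::wstring();

    // Second call: perform the conversion into the sized buffer.
    std::wstring wide(static_cast<std::wstring::size_type>(len), L'\0');
    if (MultiByteToWideChar(code_page, 0, bytes.data(),
                            static_cast<int>(bytes.size()),
                            &wide[0], len) == 0)
        return std::wstring();
    return wide;
}
#endif
```

With these in place, the query_data string filled by QueryResponseData() stays a plain byte container, and the conversion happens once, after the whole body has been received: look up the charset, map it to a code page, then call ToWide(query_data, code_page). Converting chunk by chunk inside the read loop would be risky, since a multi-byte character can be split across two reads.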

Source: https://stackoverflow.com/questions/21503314/winhttp-request-data-in-unicode

Tags

c++

html

unicode

winhttp