How to process the XML using XmlLite returned by the casablanca (PPL) http_client?

雨燕双飞 submitted on 2019-12-24 13:54:03

Question


I want to make a request to a web service, get the XML content, and parse it to extract specific values returned by the service.

The code is to be written in native C++11 (MS Visual Studio 2013). The Casablanca PPL library was chosen. For XML parsing, XmlLite was chosen.

I am experienced in C++ programming; however, the asynchronous-task approach of the PPL library is new to me. I understand asynchronous programming and the principles of parallel programming, but I am not used to continuations (.then(...)), and I am only slowly wrapping my head around the concept.

So far, I have modified the examples to get the XML result and write it into a text file:

// Open a stream to the file to write the HTTP response body into.
auto fileBuffer = std::make_shared<concurrency::streams::streambuf<uint8_t>>();
file_buffer<uint8_t>::open(L"test.xml", std::ios::out)
    .then([=](concurrency::streams::streambuf<uint8_t> outFile) -> pplx::task < http_response >
{
    *fileBuffer = outFile;

    // Create an HTTP request.
    // Encode the URI query since it could contain special characters like spaces.
    // Create http_client to send the request.
    http_client client(L"http://api4.mapy.cz/");

    // Build request URI and start the request.
    uri_builder builder(L"/geocode");
    builder.append_query(L"query", address);

    return client.request(methods::GET, builder.to_string());
})

    // Write the response body into the file buffer.
    .then([=](http_response response) -> pplx::task<size_t>
{
    printf("Response status code %u returned.\n", response.status_code());

    return response.body().read_to_end(*fileBuffer);
})

    // Close the file buffer.
    .then([=](size_t)
{
    return fileBuffer->close();
})

    // Wait for the entire response body to be written into the file.
    .wait();

Now I need to understand how to modify the code so that the result can be consumed by XmlLite (the Microsoft implementation that comes as xmllite.h, xmllite.lib, and xmllite.dll). I know what pull parsers are, but again, I am very new to the library. I am still a bit lost in the PPL-related streams and other classes, and I do not know how to use them correctly. Any explanation is highly welcome.

The Casablanca people say they use XmlLite with Casablanca to process the results, but I did not find any example. Can you point me to some? Thanks.

Update (4th June 2014): The above code is actually wrapped in a function like this (wxString comes from wxWidgets, but one can easily replace it with std::string or std::wstring):

std::pair<double, double> getGeoCoordinatesFor(const wxString & address)
{
    ...the above code...
    ...here should be the XML parsing code...
    return {longitude, latitude};
}

The actual goal is to feed the stream to the XmlLite parser instead of writing it to the test.xml file. The XML is rather small, and it contains one or more (if the address is ambiguous) item elements with the x and y attributes that I want to extract, like this:

<?xml version="1.0" encoding="utf-8"?>
<result>
    <point query="Vítězství 27, Olomouc">
        <item
                x="17.334045"
                y="49.619723"
                id="9025034"
                source="addr"
                title="Vítězství 293/27, Olomouc, okres Olomouc, Česká republika"
        />
        <item
                x="17.333067"
                y="49.61618"
                id="9024797"
                source="addr"
                title="Vítězství 27/1, Olomouc, okres Olomouc, Česká republika"
        />
    </point>
</result>

I do not need that test.xml file. How do I get the stream, and how do I redirect it to the XmlLite parser?


Answer 1:


I haven't used Casablanca yet, so this may be a little off. (I'd love to work with Casablanca, but I'll have to scrape together more time first.) That said, it looks like the code you show will download an XML file and save it to a local file, test.xml. From that point it's straightforward to load the file into XmlLite if the XML file is encoded in UTF-8. If it's not UTF-8, you will have to jump through some additional hoops to decode it, either in memory or via CreateXmlReaderInputWithEncodingName or CreateXmlReaderInputWithCodePage, and I won't cover that here.

Once you've got your UTF-8 file, or you're handling the encoding, the easiest way to start your XML parse using XmlLite is shown in the documentation for CreateXmlReader:

//Open read-only input stream
if (FAILED(hr = SHCreateStreamOnFile(argv[1], STGM_READ, &pFileStream)))
{
    wprintf(L"Error creating file reader, error is %08.8lx", hr);
    return -1;
}

if (FAILED(hr = CreateXmlReader(__uuidof(IXmlReader), (void**) &pReader, NULL)))
{
    wprintf(L"Error creating xml reader, error is %08.8lx", hr);
    return -1;
}

In your case, you want to skip the file, so you'll need to create an IStream in memory. You have three main options:

  1. Treat your string as a memory buffer and use pMemStream = SHCreateMemStream(szData, cbData)
  2. Stream from Casablanca into an IStream created with CreateStreamOnHGlobal(NULL, true, &pMemStream) and then use that as your source after you finish retrieving it
  3. Create an IStream wrapper for Casablanca's concurrency::streams::istream that hides its asynchronicity behind the IStream interface

Once you have your stream, you have to tell your reader about it with IXmlReader::SetInput.

hr = pReader->SetInput(pStream);
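
As a rough illustration of option 1 combined with CreateXmlReader and SetInput, here is a minimal sketch. It assumes the response body has already been collected into a std::string holding UTF-8 XML (how that is done with Casablanca is outside this sketch). The function name extractItemCoordinates is hypothetical, and the element and attribute names (item, x, y) are taken from the XML shown in the question. The whole thing is wrapped in an _WIN32 guard because XmlLite and SHCreateMemStream are Windows-only, and it relies on ATL's CComPtr being available.

```cpp
#include <string>
#include <utility>
#include <vector>

#ifdef _WIN32   // XmlLite and SHCreateMemStream exist only on Windows
#include <windows.h>
#include <shlwapi.h>   // SHCreateMemStream
#include <xmllite.h>   // CreateXmlReader, IXmlReader
#include <atlbase.h>   // CComPtr
#include <cwchar>      // wcscmp, wcstod
#pragma comment(lib, "shlwapi.lib")
#pragma comment(lib, "xmllite.lib")

// Hypothetical helper (not part of any library): returns the (x, y) pair of
// every <item> element found in a UTF-8 encoded XML buffer.
std::vector<std::pair<double, double>> extractItemCoordinates(const std::string& xmlUtf8)
{
    std::vector<std::pair<double, double>> points;

    // Option 1: SHCreateMemStream copies the bytes into an in-memory IStream.
    CComPtr<IStream> stream;
    stream.Attach(SHCreateMemStream(reinterpret_cast<const BYTE*>(xmlUtf8.data()),
                                    static_cast<UINT>(xmlUtf8.size())));
    if (!stream)
        return points;

    CComPtr<IXmlReader> reader;
    if (FAILED(CreateXmlReader(__uuidof(IXmlReader),
                               reinterpret_cast<void**>(&reader), nullptr)))
        return points;
    if (FAILED(reader->SetInput(stream)))
        return points;

    // Pull-read the nodes; Read returns S_FALSE once the input is exhausted.
    XmlNodeType nodeType;
    while (reader->Read(&nodeType) == S_OK)
    {
        if (nodeType != XmlNodeType_Element)
            continue;

        LPCWSTR name = nullptr;
        if (FAILED(reader->GetLocalName(&name, nullptr)) || wcscmp(name, L"item") != 0)
            continue;

        LPCWSTR value = nullptr;
        double x = 0.0, y = 0.0;
        bool haveX = false, haveY = false;

        if (reader->MoveToAttributeByName(L"x", nullptr) == S_OK &&
            SUCCEEDED(reader->GetValue(&value, nullptr)))
        {
            x = wcstod(value, nullptr);
            haveX = true;
        }
        if (reader->MoveToAttributeByName(L"y", nullptr) == S_OK &&
            SUCCEEDED(reader->GetValue(&value, nullptr)))
        {
            y = wcstod(value, nullptr);
            haveY = true;
        }

        if (haveX && haveY)
            points.emplace_back(x, y);
    }
    return points;
}
#endif
```

A caller such as the question's getGeoCoordinatesFor could then take the first element of the returned vector, or decide what to do when an ambiguous address yields several items. Error handling here is reduced to returning whatever was collected so far; real code would probably want to report the failing HRESULT.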

Regardless of which option you choose, I suggest using RAII classes such as ATL's CComPtr<IStream> and CComPtr<IXmlReader> for the variables the documentation shows as pFileStream and pReader, or for my suggested pMemStream. This is also the point at which to override any properties, say if you have to handle deeper recursion than XmlLite allows by default. After that, it's all about pull-reading the file. The simplest loop for that is documented on the IXmlReader::Read method; here are some of the most important pieces, but note that I've omitted error detection for readability:

void Summarize(IXmlReader *pReader, LPCWSTR wszType)
{
    LPCWSTR wszNamespaceURI, wszPrefix, wszLocalName, wszValue;
    UINT cchNamespaceURI, cchPrefix, cchLocalName, cchValue;

    pReader->GetNamespaceURI(&wszNamespaceURI, &cchNamespaceURI);
    pReader->GetPrefix(&wszPrefix, &cchPrefix);
    pReader->GetLocalName(&wszLocalName, &cchLocalName);
    pReader->GetValue(&wszValue, &cchValue);
    std::wcout << wszType << L": ";
    if (cchNamespaceURI) std::wcout << L"{" << wszNamespaceURI << L"} ";
    if (cchPrefix)       std::wcout << wszPrefix << L":";
    std::wcout << wszLocalName << L"='" << wszValue << L"'\n";
}

void Parse(IXmlReader *pReader)
{
    HRESULT hr = S_OK;
    XmlNodeType nodeType = XmlNodeType_None;

    // Read through each node until the end
    while (!pReader->IsEOF())
    {
        hr = pReader->Read(&nodeType);
        if (hr != S_OK)
            break;

        switch (nodeType)
        {
            //  : : :

            case XmlNodeType_Element:
                Summarize(pReader, L"BeginElement");
                while (S_OK == pReader->MoveToNextAttribute())
                    Summarize(pReader, L"Attribute");
                pReader->MoveToElement();
                if (pReader->IsEmptyElement())
                    std::wcout << L"EndElement\n";
                break;

            case XmlNodeType_EndElement:
                std::wcout << L"EndElement\n";
                break;

            //  : : :
        }
    }
}
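
Summarize only prints what it finds. For the XML in the question, the x and y attribute values arrive from GetValue as wide strings and still have to be turned into numbers. The following is a small sketch of one way to do that with validation; parseCoordinate and makePoint are hypothetical helper names, not part of XmlLite. Note that std::wcstod honors the current C locale, so if the application changes LC_NUMERIC to a locale with a decimal comma, values such as 17.334045 may not convert as expected.

```cpp
#include <cwchar>
#include <utility>

// Hypothetical helper: converts a NUL-terminated wide string such as
// L"17.334045" to a double. Returns false for null, empty, or only
// partially numeric input, and leaves 'out' untouched in that case.
bool parseCoordinate(const wchar_t* text, double& out)
{
    if (text == nullptr || *text == L'\0')
        return false;

    wchar_t* end = nullptr;
    const double value = std::wcstod(text, &end);
    if (end == text || *end != L'\0')
        return false;   // nothing converted, or trailing characters left over

    out = value;
    return true;
}

// Hypothetical helper: builds the (x, y) pair from the two attribute values.
bool makePoint(const wchar_t* x, const wchar_t* y, std::pair<double, double>& point)
{
    double px = 0.0, py = 0.0;
    if (!parseCoordinate(x, px) || !parseCoordinate(y, py))
        return false;

    point = std::make_pair(px, py);
    return true;
}
```

Inside the attribute loop one would compare the local name against L"x" and L"y", remember the two values, and call makePoint once the element's attributes have all been visited.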

Some of the other pieces in that sample code include a check for E_PENDING, which can be relevant if the entire file is not yet available. It would likely be "better" to have the Casablanca http_response::body feed a custom IStream implementation that XmlLite can begin processing in parallel with the download; this discussion thread covers the idea, but doesn't appear to have a canonical solution. In my experience XmlLite is so fast that the delay it causes is not relevant, so processing the complete file may be sufficient, especially if you require the full file before you can finish your processing.

If you need to better integrate this into an asynchronous system, there will be more hoops. Obviously the while loop above is not asynchronous itself. My guess is that the proper way to make it asynchronous will depend heavily on the content of your file and the processing you have to do while reading it, as well as on whether you tie it to a custom IStream that may not have all its data available. Since I don't have any experience with Casablanca's asynchronicity, I can't comment usefully on this.

Does this address what you're looking for, or was this the part you already knew, and you were looking for an IStream wrapper of Casablanca's http_response::body or tips on making XmlLite's processing asynchronous?



Source: https://stackoverflow.com/questions/23906145/how-to-process-the-xml-using-xmllite-returned-by-the-casablanca-ppl-http-clien
