How can libxml2 be used to parse data from XML?

问题

I have looked around at the libxml2 code samples and I am confused on how to piece them all together.

What are the steps needed when using libxml2 to just parse or extract data from an XML file?

I would like to get hold of, and possibly store information for, certain attributes. How is this done?

回答1:

I believe you first need to create a Parse tree. Maybe this article can help, look through the section which says How to Parse a Tree with Libxml2.

回答2:

I found these two resources helpful when I was learning to use libxml2 to build a rss feed parser.

Tutorial with SAX interface

Tutorial using the DOM Tree (code example for getting an attribute value included)

回答3:

libxml2 provides various examples showing basic usage.

http://xmlsoft.org/examples/index.html

For your stated goals, tree1.c would probably be most relevant.

tree1.c: Navigates a tree to print element names

Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element name in document order.

http://xmlsoft.org/examples/tree1.c

Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).

http://xmlsoft.org/html/libxml-tree.html#xmlNode

http://xmlsoft.org/html/libxml-tree.html#xmlAttr

回答4:

Here, I mentioned complete process to extract XML/HTML data from file on windows platform.

First download pre-compiled .dll form http://xmlsoft.org/sources/win32/
Also download its dependency iconv.dll and zlib1.dll from the same page
Extract all .zip files into the same directory. For Ex: D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into c:\windows\system32 deirectory

Make libxml_test.cpp file and copy following code into that file.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>

void traverse_dom_trees(xmlNode * a_node)
{
    xmlNode *cur_node = NULL;

    if(NULL == a_node)
    {
        //printf("Invalid argument a_node %p\n", a_node);
        return;
    }

    for (cur_node = a_node; cur_node; cur_node = cur_node->next) 
    {
        if (cur_node->type == XML_ELEMENT_NODE) 
        {
            /* Check for if current node should be exclude or not */
            printf("Node type: Text, name: %s\n", cur_node->name);
        }
        else if(cur_node->type == XML_TEXT_NODE)
        {
            /* Process here text node, It is available in cpStr :TODO: */
            printf("node type: Text, node content: %s,  content length %d\n", (char *)cur_node->content, strlen((char *)cur_node->content));
        }
        traverse_dom_trees(cur_node->children);
    }
}

int main(int argc, char **argv) 
{
    htmlDocPtr doc;
    xmlNode *roo_element = NULL;

    if (argc != 2)  
    {
        printf("\nInvalid argument\n");
        return(1);
    }

    /* Macro to check API for match with the DLL we are using */
    LIBXML_TEST_VERSION    

    doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    if (doc == NULL) 
    {
        fprintf(stderr, "Document not parsed successfully.\n");
        return 0;
    }

    roo_element = xmlDocGetRootElement(doc);

    if (roo_element == NULL) 
    {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return 0;
    }

    printf("Root Node is %s\n", roo_element->name);
    traverse_dom_trees(roo_element);

    xmlFreeDoc(doc);       // free document
    xmlCleanupParser();    // Free globals
    return 0;
}

Open Visual Studio Command Promt
Go To D:\demo directory
execute cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib command
Run binary using libxml_test.exe test.html command(Here test.html may be any valid HTML file)

来源：https://stackoverflow.com/questions/5465965/how-can-libxml2-be-used-to-parse-data-from-xml

标签

xml

parsing

libxml2