问题
I have looked around at the libxml2 code samples and I am confused on how to piece them all together.
What are the steps needed when using libxml2 to just parse or extract data from an XML file?
I would like to get hold of, and possibly store information for, certain attributes. How is this done?
回答1:
I believe you first need to create a Parse tree. Maybe this article can help, look through the section which says How to Parse a Tree with Libxml2.
回答2:
I found these two resources helpful when I was learning to use libxml2 to build a rss feed parser.
Tutorial with SAX interface
Tutorial using the DOM Tree (code example for getting an attribute value included)
回答3:
libxml2 provides various examples showing basic usage.
http://xmlsoft.org/examples/index.html
For your stated goals, tree1.c would probably be most relevant.
tree1.c: Navigates a tree to print element names
Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element name in document order.
http://xmlsoft.org/examples/tree1.c
Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).
http://xmlsoft.org/html/libxml-tree.html#xmlNode
http://xmlsoft.org/html/libxml-tree.html#xmlAttr
回答4:
Here, I mentioned complete process to extract XML/HTML data from file on windows platform.
- First download pre-compiled .dll form http://xmlsoft.org/sources/win32/
Also download its dependency iconv.dll and zlib1.dll from the same page
Extract all .zip files into the same directory. For Ex: D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into c:\windows\system32 deirectory
Make libxml_test.cpp file and copy following code into that file.
#include <stdio.h> #include <string.h> #include <stdlib.h> #include <libxml/HTMLparser.h> void traverse_dom_trees(xmlNode * a_node) { xmlNode *cur_node = NULL; if(NULL == a_node) { //printf("Invalid argument a_node %p\n", a_node); return; } for (cur_node = a_node; cur_node; cur_node = cur_node->next) { if (cur_node->type == XML_ELEMENT_NODE) { /* Check for if current node should be exclude or not */ printf("Node type: Text, name: %s\n", cur_node->name); } else if(cur_node->type == XML_TEXT_NODE) { /* Process here text node, It is available in cpStr :TODO: */ printf("node type: Text, node content: %s, content length %d\n", (char *)cur_node->content, strlen((char *)cur_node->content)); } traverse_dom_trees(cur_node->children); } } int main(int argc, char **argv) { htmlDocPtr doc; xmlNode *roo_element = NULL; if (argc != 2) { printf("\nInvalid argument\n"); return(1); } /* Macro to check API for match with the DLL we are using */ LIBXML_TEST_VERSION doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET); if (doc == NULL) { fprintf(stderr, "Document not parsed successfully.\n"); return 0; } roo_element = xmlDocGetRootElement(doc); if (roo_element == NULL) { fprintf(stderr, "empty document\n"); xmlFreeDoc(doc); return 0; } printf("Root Node is %s\n", roo_element->name); traverse_dom_trees(roo_element); xmlFreeDoc(doc); // free document xmlCleanupParser(); // Free globals return 0; }
Open Visual Studio Command Promt
Go To D:\demo directory
execute cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib command
Run binary using libxml_test.exe test.html command(Here test.html may be any valid HTML file)
来源:https://stackoverflow.com/questions/5465965/how-can-libxml2-be-used-to-parse-data-from-xml