How can libxml2 be used to parse data from XML?

Anonymous (unverified), submitted 2019-12-03 01:29:01

Question:

I have looked through the libxml2 code samples, and I am confused about how to piece them all together.

What are the steps needed when using libxml2 to just parse or extract data from an XML file?

I would like to get hold of, and possibly store information for, certain attributes. How is this done?

Answer 1:

I believe you first need to create a parse tree. This article may help; look through the section titled "How to Parse a Tree with Libxml2".



Answer 2:

I found these two resources helpful when I was learning to use libxml2 to build an RSS feed parser.

Tutorial with SAX interface

Tutorial using the DOM Tree (code example for getting an attribute value included)



Answer 3:

libxml2 provides various examples showing basic usage.

http://xmlsoft.org/examples/index.html

For your stated goals, tree1.c would probably be most relevant.

tree1.c: Navigates a tree to print element names

Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element names in document order.

http://xmlsoft.org/examples/tree1.c

Once you have an xmlNode struct for an element, its "properties" member is a linked list of attributes. Each xmlAttr has a "name" member and a "children" member (the attribute's name and the node list holding its value, respectively), and a "next" member that points to the next attribute (or NULL for the last one).

http://xmlsoft.org/html/libxml-tree.html#xmlNode

http://xmlsoft.org/html/libxml-tree.html#xmlAttr



Answer 4:

Here is the complete process for extracting XML/HTML data from a file on the Windows platform.

  1. Download the precompiled libxml2 .dll from http://xmlsoft.org/sources/win32/
  2. Download its dependencies, iconv.dll and zlib1.dll, from the same page

  3. Extract all the .zip files into the same directory, for example D:\demo\

  4. Copy iconv.dll, zlib1.dll and libxml2.dll into the c:\windows\system32 directory

  5. Create a file named libxml_test.cpp and copy the following code into it.

    /* Headers needed by the calls used below */
    #include <stdio.h>
    #include <string.h>
    #include <libxml/HTMLparser.h>
    #include <libxml/tree.h>

    /* Recursively walk a node and its siblings, printing elements and text. */
    void traverse_dom_trees(xmlNode *a_node)
    {
        xmlNode *cur_node = NULL;

        if (NULL == a_node)
        {
            return;
        }

        for (cur_node = a_node; cur_node; cur_node = cur_node->next)
        {
            if (cur_node->type == XML_ELEMENT_NODE)
            {
                /* Decide here whether the current node should be excluded */
                printf("Node type: Element, name: %s\n", cur_node->name);
            }
            else if (cur_node->type == XML_TEXT_NODE && cur_node->content != NULL)
            {
                /* Process the text node here; its text is in cur_node->content */
                printf("Node type: Text, node content: %s, content length %d\n",
                       (char *)cur_node->content,
                       (int)strlen((char *)cur_node->content));
            }
            traverse_dom_trees(cur_node->children);
        }
    }

    int main(int argc, char **argv)
    {
        htmlDocPtr doc;
        xmlNode *root_element = NULL;

        if (argc != 2)
        {
            printf("\nInvalid argument\n");
            return 1;
        }

        /* Macro to check that the API matches the DLL we are using */
        LIBXML_TEST_VERSION

        doc = htmlReadFile(argv[1], NULL,
                           HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR |
                           HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
        if (doc == NULL)
        {
            fprintf(stderr, "Document not parsed successfully.\n");
            return 1;
        }

        root_element = xmlDocGetRootElement(doc);
        if (root_element == NULL)
        {
            fprintf(stderr, "empty document\n");
            xmlFreeDoc(doc);
            return 1;
        }

        printf("Root Node is %s\n", root_element->name);
        traverse_dom_trees(root_element);

        xmlFreeDoc(doc);       /* free the document */
        xmlCleanupParser();    /* free globals */
        return 0;
    }

  6. Open the Visual Studio Command Prompt

  7. Go to the D:\demo directory

  8. Compile with the command cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib

  9. Run the binary with the command libxml_test.exe test.html (here test.html may be any valid HTML file)


