HTML to List using XMLWorker

Posted by 随声附和 on 2019-12-17 06:55:11

Question


Could somebody please provide an example of parsing HTML into a list of elements using XMLWorkerHelper in iTextSharp (C#)?

The Java version given in the documentation is:

XMLWorkerHelper.getInstance().parseXHtml(new ElementHandler() {
    public void add(final Writable w) {
        if (w instanceof WritableElement) {
            List<Element> elements = ((WritableElement) w).elements();
            // write class names of elements to file
        }
    }
}, HTMLParsingToList.class.getResourceAsStream("/html/walden.html"));

Answer 1:


You need to implement the IElementHandler interface in a class of your own:

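using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.pipeline;

public class SampleHandler : IElementHandler {
    //Generic list that collects every element the parser produces
    public List<IElement> elements = new List<IElement>();
    //Called by XMLWorker for each writable; keep only the ones that carry elements
    public void Add(IWritable w) {
        if (w is WritableElement) {
            elements.AddRange(((WritableElement)w).Elements());
        }
    }
}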

Instead of a file stream, here's an example that parses a string. To read from a file, replace the StringReader with a StreamReader.

    string html = "<html><head><title>Test Document</title></head><body><p>This is a test. <strong>Bold <em>and italic</em></strong></p><ol><li>Dog</li><li>Cat</li></ol></body></html>";
    //Instantiate our handler
    var mh = new SampleHandler();
    //Bind a reader to our text
    using (TextReader sr = new StringReader(html)) {
        //Parse
        XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
    }

    //Loop through each element
    foreach (var element in mh.elements) {
        //Loop through each chunk in each element
        foreach (var chunk in element.Chunks) {
            //Do something
        }
    }
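
For the file case mentioned above, a minimal sketch might look like the following. It reuses the SampleHandler class and swaps the StringReader for a StreamReader; the FileParsingSketch class, the PrintElementTypes method, and the path parameter are hypothetical names used only for illustration. It prints each element's class name to the console, roughly what the Java comment "write class names of elements to file" describes.

    using System;
    using System.IO;
    using iTextSharp.tool.xml;

    //Hypothetical helper class, not part of iTextSharp
    public static class FileParsingSketch {
        public static void PrintElementTypes(string path) {
            //Instantiate the handler defined above
            var handler = new SampleHandler();
            //Bind a reader to the file instead of a string
            using (TextReader reader = new StreamReader(path)) {
                //Parse
                XMLWorkerHelper.GetInstance().ParseXHtml(handler, reader);
            }
            //Print the class name of each parsed element
            foreach (var element in handler.elements) {
                Console.WriteLine(element.GetType().Name);
            }
        }
    }

Call it with the path to your own XHTML file, for example FileParsingSketch.PrintElementTypes(pathToHtml).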


Source: https://stackoverflow.com/questions/15354005/html-to-list-using-xmlworker
