Is it possible to parse an HTML document and build a DOM tree (Java)?

孤街浪徒 2021-01-07 07:54

Is it possible, and what tools could be used, to parse an HTML document from a string or a file and then construct a DOM tree so that a developer can walk the tree throu

5 answers
  •  Happy的楠姐
    2021-01-07 08:14

    JTidy should let you do what you want.

    Usage is fairly straightforward, and parsing is configurable, e.g.:

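    import java.io.InputStream;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.tidy.Tidy;

    InputStream in = ...;  // stream over the HTML source
    Tidy tidy = new Tidy();
    // configure Tidy instance as required
    ...
    // passing null as the output stream builds the DOM without writing tidied markup
    Document doc = tidy.parseDOM(in, null);
    Element root = doc.getDocumentElement();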
    

    The JavaDoc is hosted here.
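
Once a `Document` is in hand, walking it uses only the standard `org.w3c.dom` interfaces. The sketch below illustrates that walk. So that it runs with no third-party jar, it makes one loud assumption: the input is well-formed (XHTML-style) markup, parsed with the JDK's built-in `DocumentBuilder` rather than JTidy. The class name `DomWalk` and its methods `parse`, `walk` and `outline` are hypothetical names chosen for this example. For real-world, non-well-formed HTML, the `Document` would come from `tidy.parseDOM(...)` as in the answer above, and the same `walk` method should apply to it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomWalk {
    // Assumption: the markup is well-formed, so the JDK XML parser accepts it.
    // With JTidy, this step would be tidy.parseDOM(in, null) instead.
    static Document parse(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    // Depth-first walk: record each element's tag name, indented two spaces per level.
    static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            out.add(line.append(node.getNodeName()).toString());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Convenience wrapper: parse the markup and return the element outline.
    static List<String> outline(String markup) throws Exception {
        List<String> out = new ArrayList<>();
        walk(parse(markup).getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hi</title></head>"
                    + "<body><p>One</p><p>Two</p></body></html>";
        for (String line : outline(html)) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints one element name per line, indented by nesting depth. Text nodes are visited but not printed; handling them would mean checking for `Node.TEXT_NODE` in `walk`.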
