How do you parse HTML in VB.NET

Submitted by 醉酒当歌 on 2019-11-26 02:15:00

Question


I would like to know if there is a simple way to parse HTML in VB.NET. I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?


Answer 1:


I like Html Agility Pack: it is very developer-friendly, it is free, and its source code is available.
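
As a rough illustration of the XML-like access the question asks about, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. The module name, the ExtractLinks helper, and the sample markup are made up for this example and are not from the original answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module HapSketch
    ' Hypothetical helper: returns "text -> href" for every anchor that has an href.
    Function ExtractLinks(ByVal html As String) As List(Of String)
        ' Fully qualified to avoid clashing with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html) ' accepts markup with unclosed tags

        Dim results As New List(Of String)()

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return results

        For Each anchor As HtmlNode In anchors
            Dim href As String = anchor.GetAttributeValue("href", String.Empty)
            results.Add(anchor.InnerText.Trim() & " -> " & href)
        Next
        Return results
    End Function

    Sub Main()
        ' Sample input: the <p> element is deliberately left unclosed.
        Dim html As String = "<p>Unclosed paragraph <a href=""https://example.com"">Example</a>"
        For Each entry As String In ExtractLinks(html)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

The XPath query is the part that feels like working with XML; the Nothing check matters because an empty match does not produce an empty collection.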




Answer 2:


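' Add a project reference to Microsoft.mshtml first.

' Then, in the code file:

Imports mshtml

' Copies each link's visible text into its title attribute and
' returns the modified body markup.
Function parseMyHtml(ByVal htmlToParse As String) As String
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlToParse)
    htmlDocument.close()

    ' body.all and tags() return Object, so cast explicitly (needed under Option Strict On).
    Dim allElements As IHTMLElementCollection = CType(htmlDocument.body.all, IHTMLElementCollection)

    ' tags("a") narrows the collection to anchor elements.
    Dim allAnchors As IHTMLElementCollection = CType(allElements.tags("a"), IHTMLElementCollection)
    For Each element As IHTMLElement In allAnchors
        element.title = element.innerText
    Next

    Return htmlDocument.body.innerHTML
End Function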

As found here:




Answer 3:


If your HTML follows XHTML standards, you can do a lot of the parsing and processing using the classes in the System.Xml namespace.
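
A minimal sketch of the XHTML case, using XmlDocument from System.Xml. The GetLinkTargets helper and the sample markup are illustrative assumptions, not part of the original answer.

```vbnet
Imports System
Imports System.Xml

Module XhtmlSketch
    ' Hypothetical helper: returns the href value of every <a> element.
    Function GetLinkTargets(ByVal xhtml As String) As String()
        Dim doc As New XmlDocument()
        doc.LoadXml(xhtml) ' throws XmlException if the markup is not well formed

        ' local-name() ignores the XHTML namespace, so no XmlNamespaceManager is needed.
        Dim anchors As XmlNodeList = doc.SelectNodes("//*[local-name()='a']")

        Dim targets(anchors.Count - 1) As String
        For i As Integer = 0 To anchors.Count - 1
            Dim hrefAttr As XmlAttribute = anchors(i).Attributes("href")
            targets(i) = If(hrefAttr Is Nothing, String.Empty, hrefAttr.Value)
        Next
        Return targets
    End Function

    Sub Main()
        Dim xhtml As String =
            "<html xmlns=""http://www.w3.org/1999/xhtml""><body>" &
            "<p><a href=""/home"">Home</a> <a href=""/about"">About</a></p>" &
            "</body></html>"
        For Each target As String In GetLinkTargets(xhtml)
            Console.WriteLine(target)
        Next
    End Sub
End Module
```

Named entities such as &nbsp; are declared by the XHTML DTD, so input that uses them may fail to load when that DTD is not available to the parser.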

If, on the other hand, what you're parsing is what web developers refer to as "tag soup," you'll need a third-party parser like HTML Agility Pack.

This may be only a partial solution to your problem if you're trying to figure out how a browser will interpret your HTML, because each browser parses tag soup slightly differently.




Answer 4:


Don't use Agility Pack; just use the mshtml library to access the DOM. This is what IE uses, and it is great for going through HTML elements.

Agility Pack is nasty and unnecessarily hacky if you ask me; mshtml is the way to go. Look it up on MSDN.




Answer 5:


Is it well formed? If the HTML is in fact well formed, then it can be parsed as XML. If it is tag soup, with unclosed elements and the like, I would think you would have to hunt around for a third-party solution.
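
One way to answer the "is it well formed?" question in code is simply to try loading the markup as XML and see whether it throws. The sketch below assumes that approach; the IsWellFormedXml name and the sample strings are hypothetical.

```vbnet
Imports System
Imports System.Xml

Module WellFormedSketch
    ' Hypothetical helper: True when the markup loads as XML, False otherwise.
    Function IsWellFormedXml(ByVal markup As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(markup)
            Return True
        Catch ex As XmlException
            ' Unclosed or mismatched tags, undeclared entities, a missing root, and so on.
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(IsWellFormedXml("<p>closed <br/></p>")) ' True
        Console.WriteLine(IsWellFormedXml("<p>unclosed <br></p>")) ' False
    End Sub
End Module
```

A False result is the signal to fall back to a more tolerant parser rather than the XML classes.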



Source: https://stackoverflow.com/questions/516811/how-do-you-parse-an-html-in-vb-net
