Parse broken HTML with golang

北荒 2021-02-01 10:33

I need to find elements in an HTML string. Unfortunately, the HTML is badly broken (e.g. closing tags without a matching opening tag).

I tried to use XPath with launch

1 answer
  •  孤独总比滥情好
    2021-02-01 10:52

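    It seems golang.org/x/net/html does the job: its parser follows the HTML5 parsing algorithm, so it recovers from broken markup instead of rejecting it.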

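    So this is what I am doing now: parse the broken HTML with net/html, render the repaired tree back to a string, and run the XPath query on that string with xmlpath.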

    package main
    
    import (
        "bytes"
        "log"
        "strings"
    
        "golang.org/x/net/html"
        "gopkg.in/xmlpath.v2"
    )
    
    func main() {
        // Sample input: the <p> element is never closed.
        brokenHtml := `<!DOCTYPE html><html><body><h1 id="someid">My First Heading</h1><p>paragraph</body></html>`
    
        // net/html recovers from unclosed and stray tags instead of failing.
        reader := strings.NewReader(brokenHtml)
        root, err := html.Parse(reader)
        if err != nil {
            log.Fatal(err)
        }
    
        // Serialize the repaired tree back to HTML with every element closed.
        var b bytes.Buffer
        if err := html.Render(&b, root); err != nil {
            log.Fatal(err)
        }
    
        // The repaired markup is now regular enough for xmlpath to parse.
        xmlroot, err := xmlpath.ParseHTML(&b)
        if err != nil {
            log.Fatal(err)
        }
    
        path := xmlpath.MustCompile(`//h1[@id='someid']`)
        if value, ok := path.String(xmlroot); ok {
            log.Println("Found:", value)
        }
    }