How do I fix wrongly nested / unclosed HTML tags?

悲&欢浪女 2020-12-01 09:56

I need to sanitize HTML submitted by the user by closing any open tags with correct nesting order. I have been looking for an algorithm or Python code to do this but haven't…

5 answers
  •  情话喂你
    2020-12-01 10:16

    Using BeautifulSoup:

    # BeautifulSoup 3 import, Python 2 print statement
    from BeautifulSoup import BeautifulSoup
    html = "<p><ul><li>Foo"
    soup = BeautifulSoup(html)
    print soup.prettify()

    gets you

    <p>
     <ul>
      <li>
       Foo
      </li>
     </ul>
    </p>

    As far as I know, you can't stop prettify() from putting the <li> and </li> tags on separate lines from Foo.
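The snippet above targets the old BeautifulSoup 3 package under Python 2. On Python 3 the package is imported as bs4, and a rough equivalent might look like the sketch below. It assumes bs4 is installed and uses the built-in "html.parser" backend; other backends (lxml, html5lib) may repair the same markup differently. The name close_tags_bs4 is only illustrative.

```python
from bs4 import BeautifulSoup


def close_tags_bs4(html):
    """Parse possibly unclosed HTML and serialize it back with tags closed."""
    # "html.parser" is the stdlib-backed builder; it does not wrap the
    # fragment in <html>/<body> the way some other backends do.
    soup = BeautifulSoup(html, "html.parser")
    return str(soup)


if __name__ == "__main__":
    print(close_tags_bs4("<p><ul><li>Foo"))
```

Calling soup.prettify() instead of str(soup) gives an indented, one-tag-per-line layout similar to the output shown above.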

    Using Tidy:

    import tidy
    html = "<p><ul><li>Foo"
    print tidy.parseString(html, show_body_only=True)

    gets you

    <ul>
    <li>Foo</li>
    </ul>

    Unfortunately, I know of no way to keep the <p> tag in the example. Tidy interprets it as an empty paragraph rather than an unclosed one, so doing

    print tidy.parseString(html, show_body_only=True, drop_empty_paras=False)
    

    comes out as

    <p></p>
    <ul>
    <li>Foo</li>
    </ul>

    Ultimately, of course, the <p> tag in your example is redundant, so you might be fine with losing it.

    Finally, Tidy can also do indenting:

    print tidy.parseString(html, show_body_only=True, indent=True)
    

    becomes

    <ul>
      <li>Foo</li>
    </ul>

    All of these have their ups and downs, but hopefully one of them is close enough.
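If neither library is an option, the underlying idea the question asks about (close open tags in the right nesting order) can be sketched with a stack on top of the standard library's html.parser. This is a minimal Python 3 sketch, not a full sanitizer: it does not strip scripts or attributes, it drops comments and doctypes, and it does not apply HTML's implicit-closing rules. The names VOID_TAGS, TagCloser and close_open_tags are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagCloser(HTMLParser):
    """Re-emit HTML, closing still-open tags in correct nesting order."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of the repaired document
        self.stack = []  # names of currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit as-is, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching opener: drop it
        # Close everything opened after the matching tag, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def result(self):
        self.close()  # flush anything the parser is still buffering
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


def close_open_tags(html):
    parser = TagCloser()
    parser.feed(html)
    return parser.result()


if __name__ == "__main__":
    print(close_open_tags("<p><ul><li>Foo"))
    print(close_open_tags("<b><i>bold italic</b> tail"))
```

Unlike Tidy, this keeps the <p> from the example and simply closes it last, since it only tracks what was opened and makes no judgment about whether the nesting is valid HTML.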
