Converting html to text with Python

一生所求 2020-12-12 17:49

I am trying to convert an html block to text using Python.

Input:

9 Answers
  •  刺人心 (original poster)
    2020-12-12 18:51

    soup.get_text() outputs what you want:

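    from bs4 import BeautifulSoup
    # `html` is the markup string from the question.
    # Naming the parser avoids bs4's "no parser was explicitly specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    print(soup.get_text())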
    

    Output:

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
    Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    

    To keep newlines:

    print(soup.get_text('\n'))
    

    To match your example output exactly, replace each newline with two newlines:

    soup.get_text().replace('\n','\n\n')
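
The three variants in the answer above can be compared side by side in a small self-contained sketch. The HTML string here is a made-up sample (the question's actual input is not shown in this thread), the built-in "html.parser" is an assumed parser choice, and the per-paragraph join at the end is an alternative to the replace() trick, not something the answer itself proposes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the question's real input is not reproduced here.
html = '<div><p>First <a href="#">link</a> text.</p>\n<p>Second paragraph.</p></div>'
soup = BeautifulSoup(html, "html.parser")

# 1) Default: text nodes are concatenated as-is, so line breaks in the
#    result are only those already present in the source markup.
plain = soup.get_text()

# 2) separator="\n" inserts a newline between every text node, including
#    inline ones such as the link text; strip=True trims each node and
#    drops nodes that are empty after trimming.
per_node = soup.get_text("\n", strip=True)

# 3) Alternative (an assumption, not from the answer): extract each <p>
#    separately, collapse its internal whitespace, and join the paragraphs
#    with a blank line.
paragraphs = "\n\n".join(
    " ".join(p.get_text().split()) for p in soup.find_all("p")
)

print(paragraphs)
```

The replace('\n', '\n\n') approach relies on the source HTML already having exactly one newline between blocks. Joining per paragraph does not depend on how the markup happens to be formatted, though it only picks up text inside <p> tags.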
    
