Converting HTML to text with Python

一生所求 2020-12-12 17:49

I am trying to convert a block of HTML to plain text using Python.

Input:

9 Answers
  •  醉酒成梦
    2020-12-12 18:32

    gazpacho might be a good choice for this!

    Input:

    from gazpacho import Soup
    
    html = """\
    

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa

    Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    """

    Output:

    text = Soup(html).strip(whitespace=False) # to keep "\n" characters intact
    print(text)
    
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
    Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
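
For comparison, the same idea can be sketched with only the standard library, which may help when installing a third-party package is not an option. This is a minimal sketch, not gazpacho's implementation: the `TextExtractor` class, the `html_to_text` helper, the `BLOCK_TAGS` set, and the sample markup are all hypothetical names and data introduced here for illustration. It assumes the input has no `<script>` or `<style>` elements, since their contents would be emitted as text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, starting a new line at each block-level tag."""

    # Assumption: an illustrative, incomplete list of block-level tags.
    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # A block-level tag begins a new line in the output.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text between tags is kept; inline tags such as <a> are simply dropped.
        self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and skip empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


# Hypothetical sample markup, not the markup from the question.
sample = '<div><p>First paragraph.</p><p>Second with <a href="#">Some Link</a> inside.</p></div>'
print(html_to_text(sample))
```

Each paragraph ends up on its own line and the link text is kept inline, much like the gazpacho output above. For messy real-world pages a dedicated parser is likely the safer choice.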
    
