Converting HTML to text with Python

一生所求 2020-12-12 17:49

I am trying to convert a block of HTML to plain text using Python.

Input:

9 answers
  •  青春惊慌失措
    2020-12-12 18:32

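    You can use a regular expression, but it is not recommended: a regex cannot reliably handle every HTML construct, such as comments, script contents, or a ">" inside an attribute value. For simple markup, the following code removes all HTML tags from your data, leaving only the text: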

    import re
    
    data = """

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa

    Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

    """

    # Non-greedy match: delete everything from each "<" to the nearest ">"
    data = re.sub(r'<.*?>', '', data)
    print(data)

    Output

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
    Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
    Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
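
Since the regex route is described above as not recommended, here is a minimal sketch of one alternative that uses only the standard library's html.parser module. It is an illustration rather than the answerer's method: the names TextExtractor and html_to_text and the sample string are made up for this example, and the choice to skip script and style contents is an assumption about what "text" should mean here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &amp;
        # reach handle_data already decoded.
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the concatenated text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Made-up sample: a ">" inside an attribute value, an entity, and a script block.
sample = '<p title="a > b">Tom &amp; Jerry</p><script>var x = "<b>";</script>'
print(html_to_text(sample))
```

Unlike re.sub(r'<.*?>', '', data), this does not cut the tag short at the ">" inside the quoted attribute, and it decodes entities instead of leaving them in the output. It makes no attempt to insert line breaks between block elements; that would need extra handling in handle_endtag.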
    
