Convert <br> to end line

别跟我提以往 2020-12-08 19:00

I'm trying to extract some text using BeautifulSoup. I'm using the get_text() function for this purpose.

My problem is that the text contain

6 Answers
  •  旧巷少年郎
    2020-12-08 19:39

    If you call element.text, you'll get the text without the br tags. You may need to define your own custom method for this purpose:

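        from bs4 import BeautifulSoup

        def clean_text(elem):
            """Return elem's text with <br> and <p> tags turned into newlines."""
            text = ''
            for e in elem.descendants:
                if isinstance(e, str):
                    # text nodes are str subclasses; note that strip() also
                    # drops the spaces around inline text
                    text += e.strip()
                elif e.name in ('br', 'p'):
                    text += '\n'
            return text

        # get page content (request_response is assumed to be an HTTP
        # response fetched earlier, e.g. with the requests library)
        soup = BeautifulSoup(request_response.text, 'html.parser')
        # get your target element
        description_div = soup.select_one('.description-class')
        # clean the data
        print(clean_text(description_div))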
    
