BeautifulSoup innerhtml?

猫巷女王i 2020-11-27 13:28

Let's say I have a page with a div. I can easily get that div with soup.find().

Now that I have the result, I'd like to print the WHOLE <

6 answers
  •  攒了一身酷
    2020-11-27 14:20

    For just the text, use Beautiful Soup 4's get_text().

    If you only want the human-readable text inside a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string:

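    from bs4 import BeautifulSoup

    # The tags in the markup were lost in extraction; soup.i below needs the <i> element.
    markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
    soup = BeautifulSoup(markup, 'html.parser')

    soup.get_text()
    # '\nI linked to example.com\n'
    soup.i.get_text()
    # 'example.com'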
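
Since the question title asks about innerHTML rather than plain text, here is a minimal sketch contrasting the two. It assumes the goal is the markup inside a tag, and it uses `Tag.decode_contents()` for that; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, "html.parser")

link = soup.find("a")

text_only = link.get_text()          # text nodes only, tags dropped
inner_html = link.decode_contents()  # the tag's children serialized as markup
outer_html = str(link)               # the tag itself plus its children

print(repr(text_only))
print(repr(inner_html))
print(repr(outer_html))
```

`get_text()` discards the `<i>` element, while `decode_contents()` keeps it, which is closer to what JavaScript's `innerHTML` returns.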
    

    You can specify a string to be used to join the bits of text together:

    soup.get_text("|")
    # '\nI linked to |example.com|\n'
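
The separator matters most when adjacent tags have no whitespace between them. The sketch below uses hypothetical markup (not from the original answer) to show the text of two paragraphs running together by default and being split by a newline separator.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only to show two adjacent block-level tags.
html = "<div><p>First paragraph.</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# With the default empty separator the two text nodes are concatenated directly.
joined_default = soup.div.get_text()

# Passing a separator inserts it between each pair of text nodes.
joined_newline = soup.div.get_text(separator="\n")

print(repr(joined_default))
print(repr(joined_newline))
```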
    

    You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text:

    soup.get_text("|", strip=True)
    # 'I linked to|example.com'
    

    But at that point you might want to use the .stripped_strings generator instead, and process the text yourself:

    [text for text in soup.stripped_strings]
    # ['I linked to', 'example.com'] 
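
As one possible way to "process the text yourself", the sketch below collects the stripped strings and joins them with a single space. The joining rule is an assumption made for illustration; any other post-processing could go in its place.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, "html.parser")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that are whitespace only.
pieces = list(soup.stripped_strings)

# Illustrative post-processing: rebuild one line of readable text.
sentence = " ".join(pieces)

print(pieces)
print(sentence)
```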
    

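    As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> tags are generally not considered to be 'text', since those tags are not part of the human-visible content of the page.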
