Let's say I have a page with a div. I can easily get that div with soup.find().
Now that I have the result, I'd like to print the WHOLE …
get_text()
If you only want the human-readable text inside a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string:
```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')

soup.get_text()
# '\nI linked to example.com\n'
soup.i.get_text()
# 'example.com'
```
You can specify a string to be used to join the bits of text together:
```python
soup.get_text("|")
# '\nI linked to |example.com|\n'
```
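A separator is most useful when adjacent elements would otherwise be concatenated. A minimal sketch using a small hypothetical list, comparing the default (empty separator) with "\n":

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<ul><li>one</li><li>two</li><li>three</li></ul>', 'html.parser')

# Default separator is '', so the item texts are glued together.
joined = soup.get_text()
# With "\n", each text fragment ends up on its own line.
lines = soup.get_text("\n")

print(joined)  # onetwothree
print(lines)
```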
You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text:
```python
soup.get_text("|", strip=True)
# 'I linked to|example.com'
```
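Stripping matters most with indented HTML, where the whitespace between tags becomes text of its own. A minimal sketch on hypothetical markup, passing the options by keyword (separator and strip are the parameter names of get_text()):

```python
from bs4 import BeautifulSoup

# Indentation produces whitespace-only strings between the tags.
html = """
<div>
    <p>  First paragraph  </p>
    <p>Second paragraph</p>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

raw = soup.get_text()  # keeps the newlines and indentation
# strip=True trims each fragment and drops the ones that become empty.
clean = soup.get_text(separator=" ", strip=True)
print(repr(clean))  # 'First paragraph Second paragraph'
```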
But at that point you might want to use the .stripped_strings generator instead, and process the text yourself:
```python
[text for text in soup.stripped_strings]
# ['I linked to', 'example.com']
```
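As an illustration of "processing the text yourself", a minimal sketch: collect the fragments from .stripped_strings and apply custom formatting. The markup and the numbering step are hypothetical, chosen only to show that you control the output format.

```python
from bs4 import BeautifulSoup

html = '<div><h1> Title </h1><p>Body text.</p><p>   </p><p>More text.</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# stripped_strings yields each text fragment with surrounding whitespace
# removed and skips fragments that are whitespace only (the empty <p>).
fragments = list(soup.stripped_strings)

# Illustrative post-processing: number each fragment, one per line.
numbered = [f"{i}. {text}" for i, text in enumerate(fragments, start=1)]
print("\n".join(numbered))
```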
As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of `<script>`, `<style>`, and `<template>` tags are not considered to be ‘text’, since those tags are not part of the human-visible content of the page.
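A minimal sketch of that behavior, assuming Beautiful Soup 4.9.0 or later with html.parser; the page below is hypothetical. The script contents are left out of get_text() but can still be read from the tag itself.

```python
from bs4 import BeautifulSoup

html = ('<html><head><style>p { color: red; }</style></head>'
        '<body><p>Visible text</p><script>var x = 1;</script></body></html>')
soup = BeautifulSoup(html, 'html.parser')

# With bs4 >= 4.9.0, <style> and <script> contents are skipped here.
visible = soup.get_text()
print(visible)  # Visible text

# The script body is still reachable directly through the tag.
print(soup.script.string)  # var x = 1;
```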
For the full reference, see the Beautiful Soup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text