I'm trying to extract the text from inside a tag with a
inside on www.uszip.com:
Here is an example of what I'm t
Unfortunately, you cannot match a tag that contains both text and nested tags based on its contained text alone.
You'd have to loop over all <dt> tags without a text filter and test the text yourself:
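# soup is the already-parsed BeautifulSoup document
for dt in soup.find_all('dt', text=False):
    if 'Land area' in dt.text:
        # contents[0] is the tag's first child, here the leading text node
        print(dt.contents[0])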
This sounds counter-intuitive, but the .string attribute of such a tag is None (it is only set when the tag has a single child), and .string is what BeautifulSoup matches a text filter against.
The .text attribute, by contrast, contains all strings in all nested tags combined, and that is not matched against.
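To see the difference between .string and .text, here is a minimal sketch. The markup is a made-up stand-in (the real page's HTML is not shown here), and it assumes a bs4 version that accepts the `string` keyword, which is the newer name for `text`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <dt> holding both text and a nested tag.
html = '<dl><dt>Land area<span class="stype">(sq. miles)</span></dt><dd>10.5</dd></dl>'
soup = BeautifulSoup(html, 'html.parser')
dt = soup.find('dt')

print(dt.string)       # None, because the tag has more than one child
print(dt.get_text())   # all nested strings joined together
print(dt.contents[0])  # the first child, i.e. the leading text node

# A text filter is compared against .string, so this finds nothing.
print(soup.find_all('dt', string='Land area'))
```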
You could also use a custom function to do the search:
soup.find_all(lambda t: t.name == 'dt' and 'Land area' in t.text)
which essentially does the same search with the filter encapsulated in a lambda
function.
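For completeness, a self-contained sketch of the function-based search. The markup and the helper name is_land_area are invented for illustration; only find_all accepting a callable comes from the answer above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure may differ.
html = """
<dl>
  <dt>Land area<span class="stype">(sq. miles)</span></dt><dd>10.5</dd>
  <dt>Water area</dt><dd>0.2</dd>
</dl>
"""
soup = BeautifulSoup(html, 'html.parser')

def is_land_area(tag):
    # Called once per tag; return True to keep the tag in the results.
    return tag.name == 'dt' and 'Land area' in tag.get_text()

matches = soup.find_all(is_land_area)
for dt in matches:
    # The first child is the text node that precedes the nested tag.
    print(dt.contents[0].strip())
```

A named function behaves the same as the lambda; it is just easier to extend if the matching rule grows.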