Question
I have the following HTML and I use Beautiful Soup to extract information from it. For example, I want to get "Relationship status: Relationship".
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td>
        <strong>
          Relationship status:
        </strong>
        Relationship
      </td>
    </tr>
    <tr class="alt">
      <td>
        <strong>
          Living:
        </strong>
        With partner
      </td>
    </tr>
I wrote the following code:
xs = [x for x in soup.findAll('table', attrs = {'class':'box-content-list'})]
for x in xs:
    #print x
    sx = [s for s in x.findAll('tr',attrs={'class':'first'})]
    for s in sx:
        td_tabs = [td for td in s.findAll('td')]
        for td in td_tabs:
            title = td.findNext('strong')
            #print str(td)
            status = td.findNextSibling()
            print title.string
            print status
But the only output I get is "Relationship status:", and print status prints None. What am I doing wrong?
Answer 1:
Beautiful Soup has a dedicated method, get_text (getText in older BeautifulSoup versions), that collects the text of a tag together with the text of every tag nested inside it. With your example:
>>> example.td.get_text(' ', strip=True)
'Relationship status: Relationship'
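The first argument is the separator placed between the collected text fragments; strip=True strips the whitespace around each fragment and drops fragments that end up empty.
Answer 2:
First of all, there is no need for all the list comprehensions: they only copy the results of findAll into new lists, so you can safely drop them.
Your td.findNextSibling() returns None because the <td> has no next sibling (each row contains only one <td> tag). What you want instead is the text that follows the title, i.e. the next sibling of the <strong> tag: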
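A minimal sketch of the get_text approach applied to every row of the table, written for Python 3 and Beautiful Soup 4 (the bs4 package, assumed to be installed) with the built-in "html.parser" parser. The HTML is the question's fragment with </tbody></table> added so it is complete; the variable names are illustrative only.

```python
# Assumes Beautiful Soup 4 (bs4) on Python 3; uses the built-in "html.parser".
from bs4 import BeautifulSoup

# The question's fragment, closed with </tbody></table> so it is complete.
HTML = """
<table class="box-content-list" cellspacing="0">
<tbody>
<tr class="first">
<td>
<strong>
Relationship status:
</strong>
Relationship
</td>
</tr>
<tr class="alt">
<td>
<strong>
Living:
</strong>
With partner
</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", class_="box-content-list")

# get_text(" ", strip=True) strips each text fragment in the cell (the <strong>
# label and the trailing value), drops empty ones, and joins the rest with spaces.
lines = [td.get_text(" ", strip=True) for td in table.find_all("td")]
for line in lines:
    print(line)
```

This prints one "label value" line per row, so the second row ("Living: With partner") is handled the same way as the first.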
for table in soup.findAll('table', attrs = {'class':'box-content-list'}):
    for row in table.findAll('tr',attrs={'class':'first'}):
        for col in row.findAll('td'):
            title = col.strong
            status = title.nextSibling
            print title.text.strip(), status.strip()
which prints:
Relationship status: Relationship
for your example.
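The same idea can be sketched in Python 3 with Beautiful Soup 4 (assumed installed), where findAll and nextSibling are spelled find_all and next_sibling. This version also confirms why the original td.findNextSibling() gave None, and it loops over every row rather than only tr.first. Splitting each row into a label and a value and dropping the trailing colon (the details dict) is an illustrative addition, not part of the answer.

```python
# Assumes Beautiful Soup 4 (bs4) on Python 3; uses the built-in "html.parser".
from bs4 import BeautifulSoup

# The question's fragment, closed with </tbody></table> so it is complete.
HTML = """
<table class="box-content-list" cellspacing="0">
<tbody>
<tr class="first">
<td>
<strong>
Relationship status:
</strong>
Relationship
</td>
</tr>
<tr class="alt">
<td>
<strong>
Living:
</strong>
With partner
</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The only <td> in the row has no sibling tag, hence the None in the question.
first_td = soup.find("tr", class_="first").td
print(first_td.find_next_sibling())

# Hypothetical mapping of label -> value, one entry per table row.
details = {}
for row in soup.find("table", class_="box-content-list").find_all("tr"):
    title = row.td.strong
    label = title.get_text(strip=True).rstrip(":")
    # next_sibling of <strong> is the bare text node that holds the value.
    value = title.next_sibling.strip()
    details[label] = value

print(details)
```

The first print shows None; the second shows {'Relationship status': 'Relationship', 'Living': 'With partner'}.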
Source: https://stackoverflow.com/questions/15968518/beautiful-soup-returns-none