Question
Code
#!/usr/bin/env python3
from bs4 import BeautifulSoup
test="""<!DOCTYPE html>
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
<title>Test</title>
</head>
<body>
<table>
<tbody>
<tr>
<td>
<div>
<b>
Icon
</b>
</div>
</td>
</tr>
</tbody>
</table>
</body>
</html>"""
soup = BeautifulSoup(test, 'html.parser')
rows = soup.findAll('tr')
for r in rows:
    print(r.name)
    for c in r.children:
        print('>', c.name)
Output
tr
> None
> td
> None
Why are there nameless elements in the list of the row's children?
This occurs when running Python 3.3.1 64-bit on Windows 8, with html.parser
(Python's built-in parser).
Answer 1:
The elements of .children can be NavigableStrings as well as Tags. In your example, the nameless elements are the whitespace (newlines) before and after the td element.
This variation on your code hopefully makes it clear:
>>> rows = soup.findAll('tr')
>>> for r in rows:
...     print("row:", r.name)
...     for c in r.children:
...         print("---")
...         print(type(c))
...         print(repr(c))
...
row: tr
---
<class 'bs4.element.NavigableString'>
'\n'
---
<class 'bs4.element.Tag'>
<td>
<div>
<b>
Icon
</b>
</div>
</td>
---
<class 'bs4.element.NavigableString'>
'\n'
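
If only the element children are wanted, the text nodes can be filtered out. The following is a minimal sketch rather than part of the original answer: the shortened `html` string and the variable names are made up for illustration, and it assumes the `html.parser` backend mentioned in the question.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical version of the markup from the question.
html = "<table><tr>\n<td><b>Icon</b></td>\n</tr></table>"
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# .children yields every direct child node, whitespace text nodes included.
all_children = list(row.children)

# Option 1: keep only the children that are Tag instances.
tag_children = [c for c in row.children if isinstance(c, Tag)]

# Option 2: let find_all do the filtering. True matches any tag, and
# recursive=False restricts the search to direct children.
direct_tags = row.find_all(True, recursive=False)

print([type(c).__name__ for c in all_children])  # ['NavigableString', 'Tag', 'NavigableString']
print([c.name for c in tag_children])            # ['td']
print([c.name for c in direct_tags])             # ['td']
```

Both options give the same result here. The `isinstance` check is handy when you are already looping over `.children`, while `find_all(True, recursive=False)` avoids the explicit loop.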
Source: https://stackoverflow.com/questions/18284524/why-does-beautifulsoup-children-contain-nameless-elements-as-well-as-the-expect