Question
I'm trying to scrape the Year and Winners columns (the first and second columns) from the "List of finals matches" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I started from the code below, which works on a different page:
import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column
With the code above, I was able to get the first and third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element of the table, even though I can see the tbody when I inspect the element in the browser.
url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())
print soup.findAll('table')[2]
soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column
Here's the error I got:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents
AttributeError: 'NoneType' object has no attribute 'findAll'
Answer 1:
If you inspect the page with the browser's inspect tool, the browser inserts the tbody tags itself when it builds the DOM.
The HTML source may or may not contain them, so look at the page source view if you really want to know.
Either way, you do not need to traverse to the tbody. Simply:
soup.findAll('table')[0].findAll('tr') should work.
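As a minimal sketch of the point above, the snippet below parses a small inline table that has no tbody in its markup. It assumes the newer bs4 package (with find_all instead of findAll) and Python's built-in "html.parser"; the HTML string and its rows are sample data for illustration, not the actual Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Sample HTML (an assumption for illustration), written without <tbody>,
# as many pages are in their raw source.
HTML = """
<table>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table")[0]

# html.parser keeps the markup as written, so there is no <tbody> to find
# and table.tbody is None (calling .find_all on it would raise AttributeError).
has_tbody = table.tbody is not None

# Searching for <tr> directly from the table works with or without <tbody>.
rows = table.find_all("tr")
years = [row.find("th").get_text(strip=True) for row in rows]

print(has_tbody, years)
```

Some parsers, such as html5lib, do add a tbody the way a browser would, so going straight from the table to its rows is the form that does not depend on the parser.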
Answer 2:
url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())
for tr in soup.findAll('table')[2].findAll('tr'):
    # get data from this row, e.g. tr.findAll('td')
    pass
Then search for what you need within each row of the table :)
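One possible way to pull the first two columns out of each row is sketched here. It assumes bs4 and "html.parser"; the helper name first_two_columns and the inline HTML are hypothetical, and the real page may need extra handling (header rows, nested tables, footnote markers).

```python
from bs4 import BeautifulSoup

# Sample HTML (an assumption for illustration), with a header row.
HTML = """
<table>
  <tr><th>Year</th><th>Winners</th><th>Score</th></tr>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
</table>
"""


def first_two_columns(table):
    """Return (first, second) cell text for every row with at least two cells."""
    pairs = []
    for tr in table.find_all("tr"):
        # Collect header and data cells together, in document order.
        cells = tr.find_all(["th", "td"])
        if len(cells) < 2:
            continue  # skip spacer or caption-like rows
        first = cells[0].get_text(strip=True)
        second = cells[1].get_text(strip=True)
        pairs.append((first, second))
    return pairs


soup = BeautifulSoup(HTML, "html.parser")
pairs = first_two_columns(soup.find("table"))
print(pairs)
```

The header row comes back as the first pair, so a caller that only wants data rows can drop it with pairs[1:].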
Answer 3:
Run the code below directly.
tr_elements = soup.find_all('table')[2].find_all('tr')
This gives you all the <tr> elements of the table; iterate over them with a for loop (there are other ways to iterate too). Don't try to find the tbody: it is typically added by the browser rather than present in the page source.
Note:
If other tags get in the way of reaching the tag you want, remove the earlier ones from the tree with the .decompose() method.
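A small sketch of the .decompose() idea from the note, assuming bs4 and "html.parser". The class names infobox and results and the inline HTML are made up for the example.

```python
from bs4 import BeautifulSoup

# Sample HTML (an assumption for illustration): an unwanted table precedes
# the one we actually care about.
HTML = """
<table class="infobox"><tr><td>sidebar</td></tr></table>
<table class="results"><tr><th>1930</th><td>Uruguay</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Remove the unwanted table from the tree entirely.
soup.find("table", class_="infobox").decompose()

# The table we want is now the first (and only) one left.
remaining = soup.find_all("table")
first_cell = remaining[0].find("td").get_text(strip=True)
print(len(remaining), first_cell)
```

Note that decompose() destroys the removed tag, so this fits cases where the earlier tags are not needed afterwards.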
Source: https://stackoverflow.com/questions/20522820/how-to-get-tbody-from-table-from-python-beautiful-soup