Question
I am trying to parse some tables from a wiki page, e.g. http://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2014. There are four tables with the same class name, "wikitable". When I write:
movieList= soup.find('table',{'class':'wikitable'})
rows = movieList.findAll('tr')
It works fine, but when I write:
movieList= soup.findAll('table',{'class':'wikitable'})
rows = movieList.findAll('tr')
It throws an error:
Traceback (most recent call last):
  File "C:\Python27\movieList.py", line 24, in <module>
    rows = movieList.findAll('tr')
AttributeError: 'ResultSet' object has no attribute 'findAll'
When I print movieList, it prints all four tables.
Also, how can I parse the content effectively, given that the number of columns in a row varies? I want to store this information in separate variables.
Answer 1:
findAll() returns a ResultSet object, which is basically a list of elements. If you want to find elements inside each of the elements in the ResultSet, use a loop:
movie_list = soup.findAll('table', {'class': 'wikitable'})
for movie in movie_list:
    rows = movie.findAll('tr')
    ...
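To make the loop concrete, here is a minimal sketch that runs without network access. The two-table HTML string and the names HTML, rows_per_table and titles_per_table are made up for illustration and only stand in for the Wikipedia page. It uses find_all, the bs4 spelling of findAll that the later code in this answer also uses.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table document standing in for the Wikipedia page.
HTML = """
<table class="wikitable">
  <tr><th>Title</th></tr>
  <tr><td>Film A</td></tr>
</table>
<table class="wikitable">
  <tr><th>Title</th></tr>
  <tr><td>Film B</td></tr>
  <tr><td>Film C</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a ResultSet: a list-like collection of table tags.
tables = soup.find_all("table", {"class": "wikitable"})

# Call find_all on each table, not on the ResultSet itself,
# so the rows stay grouped by the table they belong to.
rows_per_table = [table.find_all("tr") for table in tables]

# Text of the first cell of every data row (header row skipped), per table.
titles_per_table = [
    [row.find("td").get_text(strip=True) for row in rows[1:]]
    for rows in rows_per_table
]
```

Keeping one list of rows per table is what lets you treat each table separately afterwards.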
You could also have used a CSS selector, but in this case it would not be easy to tell which table each row came from:
rows = soup.select('table.wikitable tr')
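A small sketch of why the selector result is harder to work with, again on a made-up HTML string rather than the real page. select returns one flat list of rows, and recovering the owning table takes an extra step, here done with find_parent. The names all_rows and owner_index are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table document standing in for the Wikipedia page.
HTML = """
<table class="wikitable">
  <tr><th>Title</th></tr>
  <tr><td>Film A</td></tr>
</table>
<table class="wikitable">
  <tr><th>Title</th></tr>
  <tr><td>Film B</td></tr>
  <tr><td>Film C</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One flat list containing the rows of every matching table.
all_rows = soup.select("table.wikitable tr")

# To know which table a row belongs to, walk back up to its parent table
# and look up that table's position among all matching tables.
tables = soup.find_all("table", class_="wikitable")
owner_index = [
    next(i for i, table in enumerate(tables) if table is row.find_parent("table"))
    for row in all_rows
]
```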
As a bonus, here is how you can collect all of the "Releases" into a dictionary where the keys are the periods and the values are lists of movies:
from pprint import pprint
import urllib2
from bs4 import BeautifulSoup
url = 'http://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2014'
soup = BeautifulSoup(urllib2.urlopen(url))
headers = ['Opening', 'Title', 'Genre', 'Director', 'Cast']
results = {}
for block in soup.select('div#mw-content-text > h3'):
    title = block.find('span', class_='mw-headline').text
    rows = block.find_next_sibling('table', class_='wikitable').find_all('tr')
    results[title] = [{header: td.text for header, td in zip(headers, row.find_all('td'))}
                      for row in rows[1:]]
pprint(results)
This should get you much closer to solving the problem.
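On the variable number of columns: zip stops at the shorter of its inputs, so in the code above a row with fewer cells simply yields a dictionary with fewer keys. If you would rather have every header present, one possible sketch is to pad short rows with None. The helper row_to_dict is hypothetical, and it assumes the missing cells are the trailing ones; if cells are missing at the start of a row (for example because an earlier row's cell spans several rows), the cells would need to be realigned first.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

HEADERS = ['Opening', 'Title', 'Genre', 'Director', 'Cast']


def row_to_dict(cells, headers=HEADERS):
    """Map cell texts to headers, padding short rows with None.

    Assumption: any missing cells are at the end of the row.
    Extra cells beyond the known headers are dropped.
    """
    cells = list(cells)[:len(headers)]
    return dict(zip_longest(headers, cells, fillvalue=None))


# A row with only three cells still gets all five keys.
short_row = row_to_dict(['3', 'Film A', 'Drama'])

# Plain zip, by contrast, silently drops the headers that have no cell.
truncated = dict(zip(HEADERS, ['3', 'Film A', 'Drama']))
```

With real rows you would pass something like [td.text for td in row.find_all('td')] as cells.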
Source: https://stackoverflow.com/questions/28291700/python-beautifulsoup-parsing-multiple-tables-with-same-class-name