From a large table I want to read rows 5, 10, 15, 20 ... using BeautifulSoup. How do I do this? Is findNextSibling and an incrementing counter the way to go?
As a general solution, you can convert the table to a nested list and iterate...
import BeautifulSoup
def listify(table):
"""Convert an html table to a nested list"""
result = []
rows = table.findAll('tr')
for row in rows:
result.append([])
cols = row.findAll('td')
for col in cols:
strings = [_string.encode('utf8') for _string in col.findAll(text=True)]
text = ''.join(strings)
result[-1].append(text)
return result
if __name__=="__main__":
"""Build a small table with one column and ten rows, then parse into a list"""
htstring = """ foo1 foo2 foo3 foo4 foo5 foo6 foo7 foo8 foo9 foo10
"""
soup = BeautifulSoup.BeautifulSoup(htstring)
for idx, ii in enumerate(listify(soup)):
if ((idx+1)%5>0):
continue
print ii
Running that...
[mpenning@Bucksnort ~]$ python testme.py
['foo5']
['foo10']
[mpenning@Bucksnort ~]$