I am learning Python and BeautifulSoup to scrape data from the web, and read a HTML table. I can read it into Open Office and it says that it is Table #11.
It seems
This should be pretty straight forward if you have a chunk of HTML to parse with BeautifulSoup. The general idea is to navigate to your table using the findChildren
method, then you can get the text value inside the cell with the string
property.
>>> from BeautifulSoup import BeautifulSoup
>>>
>>> html = """
...
...
...
... column 1 column 2
... value 1 value 2
...
...
...
... """
>>>
>>> soup = BeautifulSoup(html)
>>> tables = soup.findChildren('table')
>>>
>>> # This will get the first (and only) table. Your page may have more.
>>> my_table = tables[0]
>>>
>>> # You can find children with multiple tags by passing a list of strings
>>> rows = my_table.findChildren(['th', 'tr'])
>>>
>>> for row in rows:
... cells = row.findChildren('td')
... for cell in cells:
... value = cell.string
... print("The value in this cell is %s" % value)
...
The value in this cell is column 1
The value in this cell is column 2
The value in this cell is value 1
The value in this cell is value 2
>>>