I use a tool at work that lets me do queries and get back HTML tables of info. I do not have any kind of back-end access to it.
A lot of this inf
This is my python version using the (currently) latest version of BeautifulSoup which can be obtained using, e.g.,
$ sudo easy_install beautifulsoup4
The script reads HTML from the standard input, and outputs the text found in all tables in proper CSV format.
#!/usr/bin/python
from bs4 import BeautifulSoup
import sys
import re
import csv
def cell_text(cell):
return " ".join(cell.stripped_strings)
soup = BeautifulSoup(sys.stdin.read())
output = csv.writer(sys.stdout)
for table in soup.find_all('table'):
for row in table.find_all('tr'):
col = map(cell_text, row.find_all(re.compile('t[dh]')))
output.writerow(col)
output.writerow([])