How can I scrape an HTML table to CSV?

悲&欢浪女 2020-11-29 21:56

The Problem

I use a tool at work that lets me do queries and get back HTML tables of info. I do not have any kind of back-end access to it.

A lot of this inf

11 Answers
  •  日久生厌
    2020-11-29 22:21

    This is my Python version, using the (currently) latest version of BeautifulSoup, which can be installed with, e.g.,

    $ pip install beautifulsoup4
    

    The script reads HTML from standard input and writes the text of every table to standard output as properly quoted CSV.

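    #!/usr/bin/env python3
    import csv
    import sys
    
    from bs4 import BeautifulSoup
    
    def cell_text(cell):
        # Join the cell's text fragments, trimming surrounding whitespace.
        return " ".join(cell.stripped_strings)
    
    # Name the parser explicitly; bs4 warns when it has to guess one.
    soup = BeautifulSoup(sys.stdin.read(), "html.parser")
    output = csv.writer(sys.stdout)
    
    for table in soup.find_all("table"):
        for row in table.find_all("tr"):
            # Match only <td> and <th>; a list avoids regex partial matches.
            output.writerow([cell_text(c) for c in row.find_all(["td", "th"])])
        output.writerow([])  # blank line between tables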
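
    If installing BeautifulSoup is not an option, roughly the same idea can be sketched with only the standard library. This is not the script above: it swaps in Python's built-in html.parser, and the names TableToRows and html_tables_to_csv are made up for illustration. It assumes tables are not nested, and unlike the script above it does not write a blank row between tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the open cell, if any, and add it to the current row.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
        self._cell = None

    def _close_row(self):
        # Finish the open row, if any, including a cell left unclosed.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a cell.
        if self._cell is not None and data.strip():
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_tables_to_csv(html):
    """Return the cell text of all table rows in `html` as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical sample input; the second data cell contains a comma.
sample = (
    "<table><tr><th>Name</th><th>Note</th></tr>"
    "<tr><td>Ann</td><td>a, <b>b</b></td></tr></table>"
)
print(html_tables_to_csv(sample), end="")
```

    The csv module quotes any cell that contains a comma, so the second cell of the sample's data row stays a single CSV field.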
    
