How to extract tables from websites in Python

无人及你 2020-12-04 18:18

Here,

http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500

There is a table. My goal is to

6 Answers
  •  不知归路
    2020-12-04 18:58

    So essentially you want to parse an HTML page and pull elements out of it. You can use BeautifulSoup or lxml for this task.

    You already have solutions using BeautifulSoup. I'll post a solution using lxml:

    from lxml import etree
    import urllib.request  # a bare "import urllib" does not load the request submodule in Python 3
    
    # Download the raw page bytes
    web = urllib.request.urlopen("http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500")
    s = web.read()
    
    # Parse the bytes into an element tree
    html = etree.HTML(s)
    
    ## Get all 'tr'
    tr_nodes = html.xpath('//table[@id="Report1_dgReportDemographic"]/tr')
    
    ## 'th' is inside first 'tr'; i[0] is the first child element of each 'th'
    header = [i[0].text for i in tr_nodes[0].xpath("th")]
    
    ## Get text from all the remaining 'tr'
    td_content = [[td.text for td in tr.xpath('td')] for tr in tr_nodes[1:]]
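
A possible next step, sketched under assumptions: once `header` and `td_content` exist, each row can be paired with the column names and written out as CSV. The sample lists below are made-up placeholders rather than values from the real table, and `rows_to_records` is a hypothetical helper name. Note that `td.text` is `None` when a cell has no leading text, so the sketch maps `None` to an empty string.

```python
import csv
import io

# Hypothetical stand-ins for the `header` and `td_content` lists built above;
# the column names and cell values are invented for illustration only.
header = ["col_a", "col_b"]
td_content = [["0201.00", " 1912 "], ["0202.00", None]]


def rows_to_records(header, rows):
    """Pair each row's cells with the column names.

    Cells that are None (no leading text in the 'td') become empty strings,
    and surrounding whitespace is stripped.
    """
    return [
        {name: (cell or "").strip() for name, cell in zip(header, row)}
        for row in rows
    ]


records = rows_to_records(header, td_content)

# Write the records as CSV into an in-memory buffer;
# replace the buffer with open("some_file.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

Keeping the records as dictionaries means later code can look cells up by column name instead of by position, which tends to survive column reordering on the source page.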
    
