Extracting selected columns from a table using BeautifulSoup

情话喂你 2020-12-05 15:48

I am trying to extract the first and third columns of this data table using BeautifulSoup. From looking at the HTML the first column has a tag. The o

3 Answers
  •  醉梦人生
    2020-12-05 16:22

    In addition to @jonhkr's answer, I thought I'd post an alternative solution I came up with.

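     #!/usr/bin/env python3
    
     from sys import argv
    
     from bs4 import BeautifulSoup
    
     filename = argv[1]
     # Read the HTML file into a string
     with open(filename, 'r') as f:
         html_doc = f.read()
     soup = BeautifulSoup(html_doc, 'html.parser')
     # Assumes the first table in the file has an explicit <tbody>
     table = soup.find_all('table')[0].tbody
    
     # For each row, keep text nodes 1 and 5 (the first and third columns
     # when the cells are separated by whitespace in the file)
     texts = [row.find_all(string=True) for row in table.find_all('tr')]
     data = [(t[1], t[5]) for t in texts]
     print(data)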
    

    Unlike jonhkr's answer, which fetches the web page directly, mine assumes that you have the page saved on your computer and pass its filename as a command-line argument. For example:

    python file.py table.html 
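
    The text-node indices 1 and 5 used above depend on how whitespace is laid out in the file. As a rough alternative sketch (not part of the original answer), the cells can be picked by position instead. It assumes the `bs4` package is installed; `extract_columns` and `SAMPLE_HTML` are hypothetical names, and the sample table is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_columns(html, indices=(0, 2)):
    """Return one tuple per table row holding the text of the cells at `indices`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) both count as columns.
        cells = tr.find_all(["th", "td"])
        if len(cells) <= max(indices):
            continue  # skip rows with too few cells
        rows.append(tuple(cells[i].get_text(strip=True) for i in indices))
    return rows


# Hypothetical sample table, made up for this sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Apple</th><td>red</td><td>1.20</td></tr>
  <tr><th>Pear</th><td>green</td><td>0.80</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_columns(SAMPLE_HTML))
```

    Selecting `th` and `td` together keeps the column positions stable even when the first column uses a different tag from the others, and rows with too few cells are skipped rather than raising an `IndexError`.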
    
