Extracting selected columns from a table using BeautifulSoup

情话喂你 2020-12-05 15:48

I am trying to extract the first and third columns of this data table using BeautifulSoup. From looking at the HTML, the first column has a tag. The o

3 Answers

醉梦人生 (OP)

2020-12-05 16:22

In addition to @jonhkr's answer, I thought I'd post an alternative solution I came up with.

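 #!/usr/bin/env python3
 import sys

 from bs4 import BeautifulSoup

 filename = sys.argv[1]
 # Read the saved HTML file into a string
 with open(filename, 'r') as f:
     html_doc = f.read()
 soup = BeautifulSoup(html_doc, 'html.parser')

 # First <table> in the document; use the table itself if it has no <tbody>
 table = soup.find_all('table')[0]
 rows = (table.tbody or table).find_all('tr')

 # Text nodes in a row alternate whitespace / cell text, so index 1 is the
 # first cell and index 5 the third (assumes one text node per cell)
 data = []
 for row in rows:
     texts = row.find_all(string=True)
     data.append((str(texts[1]), str(texts[5])))
 print(data)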
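The `[1]` and `[5]` indices work because `find_all(string=True)` returns every text node in the row, including the newline whitespace between the cells. The sketch below uses a hypothetical sample row (assuming the `bs4` package and the built-in `html.parser`) to show this, and to show that the same lookup fails when the markup has no whitespace between cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one row, with a newline between each cell
spaced = """<table><tr>
<td><a href="#">Alpha</a></td>
<td>10</td>
<td>first</td>
</tr></table>"""

row = BeautifulSoup(spaced, "html.parser").find("tr")
texts = [str(s) for s in row.find_all(string=True)]
print(texts)  # ['\n', 'Alpha', '\n', '10', '\n', 'first', '\n']
print(texts[1], texts[5])  # Alpha first

# Same cells with no whitespace between them: only three text nodes,
# so texts[5] would raise IndexError here
compact = "<table><tr><td>Alpha</td><td>10</td><td>first</td></tr></table>"
row = BeautifulSoup(compact, "html.parser").find("tr")
compact_texts = [str(s) for s in row.find_all(string=True)]
print(compact_texts)  # ['Alpha', '10', 'first']
```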

Unlike jonhkr's answer, which fetches the webpage directly, mine assumes that you have saved the page to your computer and pass the file name as a command-line argument. For example:

python file.py table.html
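A sturdier variant indexes the `<td>` cells instead of raw text nodes, so whitespace and nested tags such as an `<a>` inside the first column do not shift the positions. This is a sketch, not the poster's code: the function name `first_and_third_columns` and the sample table are hypothetical, and it assumes `bs4` with `html.parser`. Taking an HTML string rather than a file name lets the same function handle a saved file or a page fetched as in jonhkr's answer.

```python
from bs4 import BeautifulSoup


def first_and_third_columns(html_doc):
    """Return (col1, col3) text pairs from the first <table> in html_doc."""
    soup = BeautifulSoup(html_doc, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    pairs = []
    for row in table.find_all("tr"):
        # Index the <td> cells, not text nodes; get_text(strip=True)
        # flattens nested tags and trims surrounding whitespace
        cells = row.find_all("td")
        if len(cells) >= 3:
            pairs.append((cells[0].get_text(strip=True),
                          cells[2].get_text(strip=True)))
    return pairs


# Hypothetical sample table
sample = """
<table>
  <tbody>
    <tr><td><a href="/a">Alpha</a></td><td>10</td><td>first</td></tr>
    <tr><td> <a href="/b">Beta</a> </td><td>20</td><td>second</td></tr>
  </tbody>
</table>
"""

print(first_and_third_columns(sample))  # [('Alpha', 'first'), ('Beta', 'second')]
```

Rows with fewer than three `<td>` cells, such as a header row built from `<th>` cells, are skipped. Because `find_all("tr")` searches all descendants, rows of any table nested inside the first one would be included as well.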
