Extracting data from HTML table

迷失自我 2020-12-04 15:41

I am looking for a way to extract certain information from HTML in a Linux shell environment.

This is the bit that I'm interested in:

7 Answers
  •  误落风尘
    2020-12-04 16:20

    A Python solution using BeautifulSoup4 (Edit: with proper skipping. Edit3: Using class="details" to select the table):

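    from bs4 import BeautifulSoup

    # Minimal markup for the table in question; only class="details",
    # tr, th and td matter to the code below.
    html = """
    <table class="details">
      <tr>
        <th>Tests</th>
        <th>Failures</th>
        <th>Success Rate</th>
        <th>Average Time</th>
        <th>Min Time</th>
        <th>Max Time</th>
      </tr>
      <tr>
        <td>103</td>
        <td>24</td>
        <td>76.70%</td>
        <td>71 ms</td>
        <td>0 ms</td>
        <td>829 ms</td>
      </tr>
    </table>
    """

    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "details"})

    # The first tr contains the field names.
    headings = [th.get_text() for th in table.find("tr").find_all("th")]

    datasets = []
    for row in table.find_all("tr")[1:]:
        # list() materializes the pairs; on Python 3, zip() alone is lazy.
        dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
        datasets.append(dataset)

    print(datasets)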

The result looks like this (Python 2 output, wrapped for readability; under Python 3 the strings print without the u prefix):

[[(u'Tests', u'103'),
  (u'Failures', u'24'),
  (u'Success Rate', u'76.70%'),
  (u'Average Time', u'71 ms'),
  (u'Min Time', u'0 ms'),
  (u'Max Time', u'829 ms')]]
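
If only a few of the fields are needed, it may be handier to turn each row into a dictionary and look values up by heading. A minimal sketch, assuming the rows have the shape shown in the result above (the `records` name is just for illustration, and the data is copied by hand rather than parsed):

```python
# One row of (heading, value) pairs, copied from the result shown above.
datasets = [[
    ("Tests", "103"),
    ("Failures", "24"),
    ("Success Rate", "76.70%"),
    ("Average Time", "71 ms"),
    ("Min Time", "0 ms"),
    ("Max Time", "829 ms"),
]]

# dict() accepts an iterable of 2-tuples, so each row becomes
# a heading -> value mapping.
records = [dict(dataset) for dataset in datasets]

print(records[0]["Failures"])      # prints 24
print(records[0]["Success Rate"])  # prints 76.70%
```

This relies on the headings being unique within a row; with duplicate headings, later values would overwrite earlier ones.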

Edit2: To produce the desired output, use something like this:

for dataset in datasets:
    for field in dataset:
        # Left-align the field name in a 16-character column.
        print("{0:<16}: {1}".format(field[0], field[1]))

Result:

Tests           : 103
Failures        : 24
Success Rate    : 76.70%
Average Time    : 71 ms
Min Time        : 0 ms
Max Time        : 829 ms
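
Since the question asks about a shell environment, here is a rough sketch of the same idea that does not need BeautifulSoup installed. It swaps in Python 3's standard-library `html.parser` instead of bs4, so it is a different technique from the answer above, not a copy of it. The names `DetailsTableParser`, `extract_rows` and `print_report` are made up for this example, and it assumes a single, non-nested table with class="details" whose first row holds the headings.

```python
from html.parser import HTMLParser


class DetailsTableParser(HTMLParser):
    """Collect the cell text of every row inside <table class="details">."""

    def __init__(self):
        super().__init__()
        self.in_table = False  # True while inside the target table
        self.in_cell = False   # True while inside a <th> or <td>
        self.rows = []         # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "details" in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("th", "td") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self.in_cell:
            self.rows[-1][-1] += data


def extract_rows(html):
    """Return one list of (heading, value) pairs per data row."""
    parser = DetailsTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headings = [cell.strip() for cell in parser.rows[0]]
    return [
        list(zip(headings, (cell.strip() for cell in row)))
        for row in parser.rows[1:]
    ]


def print_report(html):
    """Print every field as an aligned 'name : value' line."""
    for dataset in extract_rows(html):
        for name, value in dataset:
            print("{0:<16}: {1}".format(name, value))
```

To use it from the shell, one option is to add `import sys` and a final line `print_report(sys.stdin.read())`, save the file under some name such as `extract_table.py` (hypothetical), and run `python3 extract_table.py < report.html`, where `report.html` stands for whatever file holds the HTML.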
