Extracting data from HTML table

迷失自我 2020-12-04 15:41

I am looking for a way to extract certain information from HTML in a Linux shell environment.

This is the bit that I'm interested in:


        
7 answers

误落风尘 (original poster) 2020-12-04 16:20

A Python solution using BeautifulSoup4 (Edit: with proper skipping. Edit3: Using class="details" to select the table):

from bs4 import BeautifulSoup

# Sample markup: a table with class "details", one header row and one data row.
html = """
<table class="details">
  <tr>
    <th>Tests</th>
    <th>Failures</th>
    <th>Success Rate</th>
    <th>Average Time</th>
    <th>Min Time</th>
    <th>Max Time</th>
  </tr>
  <tr>
    <td>103</td>
    <td>24</td>
    <td>76.70%</td>
    <td>71 ms</td>
    <td>0 ms</td>
    <td>829 ms</td>
  </tr>
</table>
"""

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "details"})

# The first tr contains the field names.
headings = [th.get_text() for th in table.find("tr").find_all("th")]

datasets = []
for row in table.find_all("tr")[1:]:
    # list() materializes the pairs; in Python 3, zip() returns a lazy iterator.
    dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
    datasets.append(dataset)

print(datasets)

The result looks like this:

[[('Tests', '103'),
  ('Failures', '24'),
  ('Success Rate', '76.70%'),
  ('Average Time', '71 ms'),
  ('Min Time', '0 ms'),
  ('Max Time', '829 ms')]]
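
Since each dataset is a sequence of (heading, value) pairs, it can also be turned into a dictionary when individual fields need to be looked up by name. The following is only a minimal sketch: the pairs are written out by hand from the result above, and the names `row`, `failures` and `rate` are illustrative, not part of the original answer.

```python
# Pairs as they come out of zip(headings, cells) for the single data row.
dataset = [
    ("Tests", "103"),
    ("Failures", "24"),
    ("Success Rate", "76.70%"),
    ("Average Time", "71 ms"),
    ("Min Time", "0 ms"),
    ("Max Time", "829 ms"),
]

row = dict(dataset)  # heading -> cell text

# Cell values are plain strings, so convert them before doing arithmetic.
failures = int(row["Failures"])
rate = float(row["Success Rate"].rstrip("%"))

print(failures, rate)
```

A dictionary loses nothing here because the headings are unique; if a table repeated a heading, the later value would overwrite the earlier one.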


Edit2: To produce the desired output, use something like this:

for dataset in datasets:
    for name, value in dataset:
        # Left-align the field name in a 16-character column.
        print("{0:<16}: {1}".format(name, value))


Result:

Tests           : 103
Failures        : 24
Success Rate    : 76.70%
Average Time    : 71 ms
Min Time        : 0 ms
Max Time        : 829 ms
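
Because the question asks for something usable from a shell, the same idea can be wrapped in a small script that takes an HTML file as its argument. The sketch below is a variation rather than the method shown above: it swaps BeautifulSoup for Python 3's standard-library `html.parser`, so nothing needs to be installed. The names `TableExtractor` and `format_table` are made up for illustration, and it assumes the table is not nested inside another table.

```python
import sys
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text from every <table> whose class list contains "details"."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []  # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "details" in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("th", "td") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
            self.in_cell = False
        elif tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def format_table(html):
    """Return 'Heading: value' lines, treating the first row as the headings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headings, data_rows = parser.rows[0], parser.rows[1:]
    lines = []
    for cells in data_rows:
        for name, value in zip(headings, cells):
            lines.append("{0:<16}: {1}".format(name.strip(), value.strip()))
    return lines


# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        print("\n".join(format_table(handle.read())))
```

Saved under a hypothetical name such as extract_table.py, it could be called as `python3 extract_table.py report.html`, and its output piped into grep, cut or awk like any other shell command.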