HTML table to pandas table: Info inside html tags

轮回少年 2021-01-04 21:02

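I have a large table from the web, accessed via requests and parsed with BeautifulSoup. Some of the information I need is inside the HTML tags (for example, the href of a link in a cell) rather than in the cell text, and I want to get it into a pandas table.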

        
3 answers

借酒劲吻你 2021-01-04 21:53

You could simply parse the table manually like this:

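from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)
import pandas as pd

# Sample markup: each row holds an id, a linked name, and a number.
TABLE = """
<table>
<tr>
<td>265</td>
<td><a href="/j/jones03.shtml">JonesBlue</a></td>
<td>29</td>
</tr>
<tr>
<td>266</td>
<td><a href="/s/smith01.shtml">Smith</a></td>
<td>34</td>
</tr>
</table>
"""

table = BeautifulSoup(TABLE, "html.parser")
records = []
for tr in table.find_all("tr"):
    tds = tr.find_all("td")
    record = [
        tds[0].text,       # plain cell text
        tds[1].a["href"],  # link target stored inside the <a> tag
        tds[2].text,       # plain cell text
    ]
    records.append(record)

df = pd.DataFrame(data=records)
print(df)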



which gives you

     0                 1   2
0  265  /j/jones03.shtml  29
1  266  /s/smith01.shtml  34
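
The loop above assumes every row has three td cells and a link in the second one, so it would fail on a header row or on a cell without a link. A slightly more defensive sketch is shown here. The sample markup, the function name table_to_frame, and the column names id, name, href, and age are assumptions made for illustration, not part of the original table.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample: a header row plus one row whose name cell has no link.
HTML = """
<table>
<tr><th>id</th><th>name</th><th>age</th></tr>
<tr><td>265</td><td><a href="/j/jones03.shtml">Jones</a></td><td>29</td></tr>
<tr><td>266</td><td>Smith</td><td>34</td></tr>
</table>
"""


def table_to_frame(html):
    """Return a DataFrame with each cell's text plus the row's first link target."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:  # header rows contain only <th>, so skip them
            continue
        link = tr.find("a", href=True)  # None when the row has no link
        rows.append({
            "id": cells[0].get_text(strip=True),
            "name": cells[1].get_text(strip=True),
            "href": link["href"] if link else None,
            "age": cells[2].get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["id", "name", "href", "age"])


df = table_to_frame(HTML)
print(df)
```

Here the header row is skipped and the row without a link gets a missing value in the href column instead of raising an error. All cell values come back as strings; pd.to_numeric(df["age"]) converts a column to numbers if that is needed.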