Python - Web Scraping HTML table and printing to CSV

灰色年华 2020-12-15 14:45

I'm pretty much brand new to Python, but I'm looking to build a web-scraping tool that will rip data from an HTML table online and save it to a CSV file in the same format.

1 Answer
  • 2020-12-15 15:38

    Run the code below to extract the data from that table. To try it on the exact element you posted, assign the whole HTML snippet you pasted above to a variable by wrapping it in a triple-quoted string: html = '''...'''

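    import csv
    from bs4 import BeautifulSoup
    
    html = '''...'''  # paste the table HTML from the question between the quotes
    
    # "lxml" requires the lxml package; "html.parser" works without it
    tree = BeautifulSoup(html, "lxml")
    table_tag = tree.select("table")[0]  # first <table> in the markup
    # one inner list per <tr>, holding the stripped text of its <th>/<td> cells
    tab_data = [[item.get_text(strip=True) for item in row_data.select("th,td")]
                for row_data in table_tag.select("tr")]
    
    # "with" closes (and flushes) the file when the block ends
    with open("table_data.csv", "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        for data in tab_data:
            writer.writerow(data)
            print(' '.join(data))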
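If installing BeautifulSoup and lxml isn't an option, roughly the same job can be done with the standard library alone. The sketch below is an alternative technique, not the method used above: it fetches the page with `urllib.request`, collects cell text with a small `html.parser.HTMLParser` subclass, and writes the rows with `csv`. The names `TableParser`, `fetch_html`, `parse_table` and `write_csv` are hypothetical, and the parser is deliberately simple: it gathers rows from every table on the page and does not handle nested tables, colspan or rowspan.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row.

    Simplified sketch: rows from every table on the page end up in one
    list, and nested tables, colspan and rowspan are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <th>/<td> currently open, if any

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # HTML allows </tr> to be omitted
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._close_cell()  # ...and </td>/</th> as well
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def fetch_html(url):
    """Download a page and decode it, assuming UTF-8 if the server sends no charset."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def parse_table(html):
    """Return the table rows found in `html` as a list of lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, path):
    """Write rows to `path`; the csv module expects files opened with newline=""."""
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        csv.writer(outfile).writerows(rows)
```

Usage would look like `write_csv(parse_table(fetch_html(url)), "table_data.csv")`, where `url` is the page you want. This assumes the table is present in the HTML the server returns; a table that is built later by JavaScript won't appear in it.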
    

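    To make it easier to follow, here is the same logic broken into pieces. The list comprehension above is really a nested for loop (an outer loop over rows, an inner loop over the cells in each row); written out explicitly, it looks like this: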

    from bs4 import BeautifulSoup
    
    html = '''...'''  # the same table HTML as above
    
    soup = BeautifulSoup(html, "lxml")
    table = soup.find('table')  # first <table> element
    
    list_of_rows = []
    for row in table.find_all('tr'):               # outer loop: one pass per row
        list_of_cells = []
        for cell in row.find_all(["th", "td"]):    # inner loop: each header/data cell
            text = cell.get_text(strip=True)
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    
    for item in list_of_rows:
        print(' '.join(item))
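The second snippet only prints the rows. Writing `list_of_rows` to a file works the same way as in the first snippet, e.g. `csv.writer(outfile).writerows(list_of_rows)`. The sketch below writes to an in-memory buffer instead so the result is easy to inspect, using a few values taken from the Result. It also shows why CSV is safer than the space-joined printout: fields that contain commas are quoted automatically. `rows_to_csv_text` is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialise a list of rows (lists of strings) to CSV text.

    csv.writer wraps any field containing the delimiter in double quotes,
    so values such as "Sep 14, 2017" or "2,716,310,000" stay in one column.
    """
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output readable; the module default is "\r\n"
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = [
    ["Date", "Open", "Volume"],
    ["Sep 14, 2017", "3875.37", "2,716,310,000"],
]
print(rows_to_csv_text(rows), end="")
# Date,Open,Volume
# "Sep 14, 2017",3875.37,"2,716,310,000"
```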
    

    Result:

    Date Open High Low Close Volume Market Cap
    Sep 14, 2017 3875.37 3920.60 3153.86 3154.95 2,716,310,000 64,191,600,000
    Sep 13, 2017 4131.98 3789.92 3882.59 2,219,410,000 68,432,200,000
    Sep 12, 2017 4168.88 4344.65 4085.22 4130.81 1,864,530,000 69,033,400,000
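Because the printout joins cells with spaces, it can't show where one column ends and the next begins (the date itself contains spaces). The CSV file can be checked instead. A minimal sketch, assuming the file was written as in the first snippet; `ragged_rows` is a hypothetical helper that reports any row whose cell count differs from the header row's.

```python
import csv


def ragged_rows(lines):
    """Return (line_number, cell_count) for every row whose width differs from the header.

    `lines` is any iterable of CSV lines, such as an open file. The line
    number is csv.reader's line_num, i.e. the last physical line of the row.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Usage with the file written by the first snippet:
# with open("table_data.csv", newline="", encoding="utf-8") as f:
#     print(ragged_rows(f))
```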
    