I'm pretty much brand new to Python, but I'm looking to build a web scraping tool that will rip data from an HTML table online and print it into a CSV in the same format.
The script below extracts the data from that table. To try it on the very element you pasted above, wrap the whole HTML element in a triple-quoted string assigned to html, i.e. html=''' ''', and then run the code.
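import csv
from bs4 import BeautifulSoup

# html = ''' '''  <- paste the whole <table> element between the triple quotes

tree = BeautifulSoup(html, "lxml")
table_tag = tree.select("table")[0]
# One inner list of cell texts per table row (header and data cells alike)
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

# newline='' keeps the csv module from writing blank lines between rows on Windows
with open("table_data.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    for data in tab_data:
        writer.writerow(data)
        print(' '.join(data))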
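Since several cells in this table contain commas themselves (the dates and the volume figures), it may help to see what csv.writer does with them. The following is only a small sketch using the standard library: the two sample rows are typed in by hand, and an in-memory buffer stands in for table_data.csv so nothing is written to disk.

```python
import csv
import io

# Two sample rows typed in by hand (header plus one data row from the table)
rows = [
    ["Date", "Open", "High", "Low", "Close", "Volume", "Market Cap"],
    ["Sep 14, 2017", "3875.37", "3920.60", "3153.86", "3154.95",
     "2,716,310,000", "64,191,600,000"],
]

# Write to an in-memory buffer instead of a real file
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back: cells that contain commas come out as single cells again
round_trip = list(csv.reader(io.StringIO(csv_text, newline='')))
print(round_trip == rows)
```

With the default settings, any cell containing a comma is wrapped in double quotes, so the file still opens with the right number of columns in a spreadsheet.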
I've broken the code into pieces to make it easier to follow. The tab_data list comprehension in the script is a nested for loop written in compact form. Here is how it looks when written out step by step:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")  # html is the same string as in the first script
table = soup.find('table')

list_of_rows = []
for row in table.find_all('tr'):
    # Collect the text of every header and data cell in this row
    list_of_cells = []
    for cell in row.find_all(["th", "td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

for item in list_of_rows:
    print(' '.join(item))
Result:
Date Open High Low Close Volume Market Cap
Sep 14, 2017 3875.37 3920.60 3153.86 3154.95 2,716,310,000 64,191,600,000
Sep 13, 2017 4131.98 3789.92 3882.59 2,219,410,000 68,432,200,000
Sep 12, 2017 4168.88 4344.65 4085.22 4130.81 1,864,530,000 69,033,400,000
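If you want to convince yourself that the comprehension and the explicit loops really build the same list, here is a minimal sketch that needs no HTML at all. The raw_rows list is a made-up stand-in for the parsed rows, and str.strip() plays the role that .text plays in the real code; both are assumptions for illustration only.

```python
# Hypothetical stand-in for parsed rows: each inner list plays the role of
# the <th>/<td> cells found in one <tr>
raw_rows = [
    [" Date ", " Open "],
    [" Sep 14, 2017 ", " 3875.37 "],
]

# Compact form: a nested list comprehension
compact = [[cell.strip() for cell in row] for row in raw_rows]

# Expanded form: the same work done with two nested for loops
expanded = []
for row in raw_rows:
    cells = []
    for cell in row:
        cells.append(cell.strip())
    expanded.append(cells)

print(compact == expanded)
```

The outer loop corresponds to the second "for" in the comprehension and the inner loop to the first one, which is why the comprehension reads inside-out compared with the loops.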