Question
I am running a scraper to retrieve the Product name, Cat No, Size and Price, but when I run the script it gives me neither output nor an error message. I am using Jupyter Notebook and am not sure whether that is the problem. I am also not sure whether writing the results to a CSV file is causing issues. Any help would be greatly appreciated.
This is the code that I am running.
from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup
os.chdir(r'C:\Users\kevin.cragin\AppData\Local\pip\Cache\wheels\09\14\7d\1dcfcf0fa23dbb52fc459e5ce620000e7dca7aebd9300228fe')
driver = webdriver.Chrome()
driver.get('https://www.biolegend.com/en-us/advanced-search?GroupID=&PageNum=1')
html = driver.page_source
containers = html.find_all('li', {'class': 'row list'})
with open("BioLegend_Crawl.csv", "w") as f:
    f.write("Product_name, CatNo, Size, Price\n")
    for container in containers:
        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()
        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')
        f.write(','.join([product_name,catNo,size,price]))
Answer 1:
Well, the website you are using loads its information from a database, so the product names are not present in the initial HTML; they are loaded dynamically based on the search constraints.
You will therefore need to download chromedriver.exe (if you use Google Chrome) or some other driver that automates your web browser (PhantomJS is another good one), and then specify the path on your machine to the directory where that .exe lives, like so:
from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup
os.chdir('Path to chromedriver or other driver')
driver = webdriver.Chrome()
driver.get('Link to your webpage you want to extract HTML from')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
containers = soup.find_all('ul',{'id':'productsHolder'})
with open("BioLegend_Crawl.csv", "w") as f:
    f.write("Product_name, CatNo, Size, Price\n")
    for container in containers:
        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()
        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')
        f.write(','.join([product_name,catNo,size,price]) + '\n')
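One side note on the CSV concern from the question: joining fields with ','.join() breaks as soon as a field itself contains a comma (common in product names), and each write also needs its own newline. The stdlib csv module handles both. A minimal sketch, using made-up sample rows just for illustration:

```python
import csv
import io

# Hypothetical rows standing in for scraped (product_name, catNo, size, price) values
rows = [
    ("PE anti-human CD4", "300506", "25 tests", "$90.00"),
    ("FITC anti-mouse CD8a, clone 53-6.7", "100706", "50 ug", "$85.00"),
]

buf = io.StringIO()  # with a real scrape this would be open("BioLegend_Crawl.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["Product_name", "CatNo", "Size", "Price"])  # header row
writer.writerows(rows)  # one line per row; fields containing commas are quoted automatically

print(buf.getvalue())
```

Note how the second product name, which contains a comma, comes out quoted, so the file still parses back into four columns per row.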
Source: https://stackoverflow.com/questions/49120065/after-starting-my-scraper-i-do-not-get-an-output