After starting My Scraper I do not get an output

时间秒杀一切 · Posted 2019-12-24 14:16:22

Question


I am running a scraper to retrieve Product name, Cat No, Size and Price, but when I run the script it gives me neither output nor an error message. I am using Jupyter Notebook for this and am not sure if that is the problem. I am also not sure whether writing the results into a CSV file is causing issues. Any help would be greatly appreciated.

This is the code that I am running.

from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup

os.chdir(r'C:\Users\kevin.cragin\AppData\Local\pip\Cache\wheels\09\14\7d\1dcfcf0fa23dbb52fc459e5ce620000e7dca7aebd9300228fe') 
driver = webdriver.Chrome()
driver.get('https://www.biolegend.com/en-us/advanced-search?GroupID=&PageNum=1')
html = driver.page_source

containers = html.find_all('li', {'class': 'row list'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        f.write(','.join([product_name,catNo,size,price]))

Answer 1:


The website you are using loads its product information from a database, so the product names are not present in the initial HTML the server sends; they are loaded dynamically based on the search constraints.

So you will need to download chromedriver.exe (if you use Google Chrome) or some other driver that automates your web browser (PhantomJS is another good one), and then point the script at the directory on your machine where that executable lives, like so:

from selenium import webdriver
import os
from bs4 import BeautifulSoup

os.chdir('Path to chromedriver or other driver')
driver = webdriver.Chrome()
driver.get('Link to your webpage you want to extract HTML from')
html = driver.page_source
# page_source is a plain string; parse it with BeautifulSoup before calling find_all
soup = BeautifulSoup(html, 'html.parser')

containers = soup.find_all('ul', {'id': 'productsHolder'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a', {'itemprop': 'name'}).text
        info = container.find_all('div', {'class': 'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: ' + product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        # append the newline the original code dropped, so rows don't run together
        f.write(','.join([product_name, catNo, size, price]) + '\n')
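The parsing and CSV-writing logic can be checked without a browser by feeding BeautifulSoup a small static fragment shaped like the selectors above. The tag structure and sample values below are illustrative, mirroring the question's selectors rather than the live BioLegend page; using `csv.writer` instead of a hand-rolled `','.join` also keeps fields with embedded commas from breaking the columns:

```python
# Browser-free sketch of the parse + write steps. SAMPLE_HTML and its values
# are made up for illustration; only the selectors match the question's code.
import csv
import io
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<ul id="productsHolder">
  <li class="row list">
    <a itemprop="name">Sample Antibody</a>
    <div class="col-xs-2 noPadding">300508</div>
    <div class="col-xs-2 noPadding">100 tests</div>
    <div class="col-xs-2 noPadding">$150</div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')  # always name the parser explicitly

out = io.StringIO()
writer = csv.writer(out)  # handles newlines and quotes commas inside fields
writer.writerow(["Product_name", "CatNo", "Size", "Price"])

for container in soup.find_all('li', {'class': 'row list'}):
    product_name = container.find('a', {'itemprop': 'name'}).text
    info = container.find_all('div', {'class': 'col-xs-2 noPadding'})
    writer.writerow([product_name,
                     info[0].text.strip(),
                     info[1].text.strip(),
                     info[2].text.strip()])

print(out.getvalue())
```

If this prints the expected rows but the real run still produces nothing, the problem is on the browser side (wrong driver path, or the page content not yet loaded when `page_source` is read), not in the parsing.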


Source: https://stackoverflow.com/questions/49120065/after-starting-my-scraper-i-do-not-get-an-output
