Question
I am building a web scraper with Python and Selenium. The script works on sites like Amazon, Stack Overflow, and Flipkart, but it fails on ofashion: it always produces a blank .csv file.
Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'
driver_exe = 'chromedriver'
options = Options()
#options.add_argument("--headless")
options.add_argument(f'user-agent={user_agent}')
options.add_argument("--disable-web-security")
options.add_argument("--allow-running-insecure-content")
options.add_argument("--allow-cross-origin-auth-prompt")
driver = webdriver.Chrome(executable_path=r"C:\Users\intel\Downloads\Setups\chromedriver.exe", options=options)
driver.get("https://www.ofashion.com.cn/goods/10001?t=15777838840003")
class_Name = "." + "ellipsis-single ware-brand"
x = driver.find_elements_by_css_selector(class_Name.replace(' ','.'))
web_content_list = []
for i in x:
    web_content_dict = {}
    web_content_dict["Title"] = i.text
    web_content_list.append(web_content_dict)
df = pd.DataFrame(web_content_list)
df.to_csv(r'C:\Users\intel\Desktop\data_file.csv',
          index=False, mode='a', encoding='utf-8')
Any help would be appreciated!
Answer 1:
This happens because the site loads its content through JavaScript. Notice the loading indicator (the clothes hanger): the browser tab already shows the page as finished loading while that indicator is still on screen, so your selector runs before the elements exist and matches nothing. To wait until the content has actually loaded, use Selenium's wait methods.
NOTE: Please call driver.close() at the end of your code to close the chromedriver window properly. (driver.quit() goes one step further and also ends the chromedriver process.)
Answer 2:
Take a look at BeautifulSoup. Personally, I have never enjoyed scraping with Selenium, so I'd encourage you to try requests together with BeautifulSoup for web scraping.
from bs4 import BeautifulSoup
import requests
HEADERS = requests.utils.default_headers()
HEADERS.update({
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0',
})
url = "https://www.ofashion.com.cn/goods/10001?t=15777838840003"
load = requests.get(url, headers=HEADERS)
page_content = load.content
soup = BeautifulSoup(page_content,'lxml')
print(soup)
I wrote the sample script above for you. It is just something I put together quickly, but it fetches your page and prints the parsed HTML. I'd encourage you to read more about BS4 and requests. I'll also link some web scraper projects I've written, in case you want a reference.
BS4 Docs = https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Request Docs = https://2.python-requests.org/en/v1.1.0/api
Some Webscrapers I've Written = https://github.com/backslash/WebScrapers/
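To go from printing the whole document to pulling out specific elements, a sketch along these lines may help. It uses BeautifulSoup's `select()` with the same class names as the question. `SAMPLE_HTML` and `extract_titles` are hypothetical and exist only for this illustration; the real page's markup may differ. If the site builds these elements with JavaScript, as the first answer suggests, the HTML returned by requests may not contain them, in which case the result is simply an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which may look different.
SAMPLE_HTML = """
<div class="ware">
  <p class="ellipsis-single ware-brand">Brand A</p>
  <p class="ellipsis-single ware-brand"> Brand B </p>
  <p class="ware-price">100</p>
</div>
"""


def extract_titles(html, selector=".ellipsis-single.ware-brand"):
    """Return one {'Title': text} row per element matching the CSS selector."""
    # "html.parser" ships with Python, so no extra parser needs to be installed.
    soup = BeautifulSoup(html, "html.parser")
    return [{"Title": tag.get_text(strip=True)} for tag in soup.select(selector)]


rows = extract_titles(SAMPLE_HTML)
print(rows)
```

With the script above, `extract_titles(load.content)` would apply the same extraction to the downloaded page, and the resulting rows have the same shape as `web_content_list` in the question.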
Source: https://stackoverflow.com/questions/60573846/selenium-code-is-not-able-to-scrape-ofashion-com-cn