BeautifulSoup can't find class that exists on webpage?

Posted by 你离开我真会死。 on 2019-12-23 09:27:00

Question


I am trying to scrape the following webpage: https://www.scoreboard.com/uk/football/england/premier-league/

Specifically, I want the scheduled and finished results, so I am looking for the elements with class "stage-finished" or "stage-scheduled". However, when I fetch the page and print page_soup, it doesn't contain these elements.

I found another SO question whose answer says this happens because the content is loaded via AJAX, and that I need to look at the XHR entries under the Network tab in Chrome DevTools to find the request that loads the data. However, it doesn't seem to be there.

import bs4
import requests
from bs4 import BeautifulSoup as soup
import csv
import datetime

myurl = "https://www.scoreboard.com/uk/football/england/premier-league/"
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
page = requests.get(myurl, headers=headers)

page_soup = soup(page.content, "html.parser")

scheduled = page_soup.select(".stage-scheduled")
finished = page_soup.select(".stage-finished")
live = page_soup.select(".stage-live")
print(page_soup)
print(scheduled[0])

The above code raises an IndexError, of course, because the scheduled list is empty.
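To make the failure concrete, here is a minimal sketch on inline HTML strings. Both snippets of markup and the helper name first_match_text are made up for illustration and are not copied from the site. The sketch only shows that a selector finds nothing when the class is absent from the HTML that was actually downloaded, and that select_one returns None in that case instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what a server might send before any JavaScript runs.
RAW_HTML = '<div id="results"></div>'
# Hypothetical markup: the same container after scripts have filled it in.
RENDERED_HTML = (
    '<div id="results"><table>'
    '<tr class="stage-scheduled"><td>Home - Away</td></tr>'
    '</table></div>'
)


def first_match_text(html, selector):
    """Return the text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)  # None when nothing matches
    return element.get_text(strip=True) if element is not None else None


print(first_match_text(RAW_HTML, ".stage-scheduled"))
print(first_match_text(RENDERED_HTML, ".stage-scheduled"))
```

Checking for None (or for an empty list from select) before indexing avoids the error, but it does not make the missing elements appear.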

My question is, how do I go about getting the data I'm looking for?

I copied the contents of the XHR responses into a notepad and searched for stage-finished and the other class names, but found nothing. Am I missing something easy here?


Answer 1:


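The page is rendered with JavaScript, so requests only sees the initial HTML, before the scripts have added those elements. You need something that runs the JavaScript, such as Selenium. Here is some code to start with: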

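from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.scoreboard.com/uk/football/england/premier-league/'

driver = webdriver.Chrome()
driver.get(url)
# find_elements_by_class_name was removed in Selenium 4; use find_elements with By instead
stages = driver.find_elements(By.CLASS_NAME, 'stage-scheduled')
# Read the text while the browser is still open; the elements are unusable after quit()
stage_texts = [stage.text for stage in stages]
driver.quit()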

Or you could pass driver.page_source to the BeautifulSoup constructor (while the driver is still open), like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')

Note: You need to install a webdriver first. I installed chromedriver.
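Because the rows are added by scripts after the page loads, it may help to wait for them explicitly before reading the page source. The following is a rough sketch under several assumptions, not a tested scraper for this site: the function names fetch_rendered_html and stage_rows are hypothetical, the 15 second timeout is arbitrary, and it assumes the stage-* classes sit on the row elements themselves.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_selector, timeout=15):
    """Load url in Chrome, wait for wait_selector to appear, return the DOM as HTML."""
    # Selenium is imported here so the parsing helper below works without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element exists, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()


def stage_rows(html):
    """Map each stage class to the text of the elements carrying it."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        stage: [row.get_text(" ", strip=True) for row in soup.select("." + stage)]
        for stage in ("stage-scheduled", "stage-finished", "stage-live")
    }
```

Usage would look like html = fetch_rendered_html(url, '.stage-scheduled, .stage-finished') followed by stage_rows(html). Keeping the parsing in its own function means it can be tried on saved HTML without opening a browser each time.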

Good luck!



Source: https://stackoverflow.com/questions/52408701/beautifulsoup-cant-find-class-that-exists-on-webpage
