What is the correct soup.find() command?

Posted by 假装没事ソ on 2021-01-28 12:02:06

Question


I am trying to scrape the race name ('The Valley R2') and the horse name ('Ronniejay') from the following page: https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18

What is the correct soup.find() call to do this?

My code to get the race name:

from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18').text
soup = BeautifulSoup(source,'lxml')
race = soup.find('h3')
print(race)

Answer 1:


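The website builds its content with JavaScript, and requests does not execute JavaScript, so the HTML it downloads does not contain the rendered race and horse names. We can use Selenium instead, which drives a real browser, and then parse the rendered page.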
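A quick way to confirm this kind of problem is to check whether the target selector matches anything in the HTML you actually received. The sketch below is illustrative only: `selector_present` is a hypothetical helper, and both HTML strings are made-up stand-ins for "what requests downloads" and "what the browser renders", not the real markup of the site.

```python
from bs4 import BeautifulSoup


def selector_present(html: str, css_selector: str) -> bool:
    """Return True if css_selector matches at least one element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


# Hypothetical stand-in for the raw response: an empty container that
# JavaScript would fill in later.
static_html = '<div id="app"></div><script src="bundle.js"></script>'

# Hypothetical stand-in for the page after the browser has run the scripts.
rendered_html = (
    '<div id="app"><div class="form-result-group__event">'
    "<span>The Valley R2</span></div></div>"
)

selector = ".form-result-group__event span"
print(selector_present(static_html, selector))
print(selector_present(rendered_html, selector))
```

If the same check fails on `requests.get(...).text` but succeeds on the page source from a browser, the content is most likely rendered client-side.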

Install it with: pip install selenium.

Download the ChromeDriver build that matches your installed Chrome version.

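from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from time import sleep

URL = "https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18"

# Selenium 4 takes the driver path through a Service object
# (Selenium 3 accepted the path as the first positional argument).
driver = webdriver.Chrome(service=Service(r"C:\path\to\chromedriver.exe"))
driver.get(URL)
# Give the JavaScript time to render the page
sleep(5)

# Parse the rendered HTML from the browser, not the raw HTTP response
soup = BeautifulSoup(driver.page_source, "lxml")

race_name = soup.select_one(".form-result-group__event span").text
# Keep only the letters of the competitor name
horse_name = "".join(
    x for x in soup.select_one(".form-result__competitor-name").text if x.isalpha()
)

print(race_name)
print(horse_name)

driver.quit()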

Output:

The Valley R2
Ronniejay
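Since the question asks specifically about soup.find(), the same lookups can also be written with find() instead of CSS selectors. This is a minimal sketch on hypothetical markup: only the two class names come from the answer above, while the surrounding tags and the "1. ... (4)" decoration around the horse name are assumptions made so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the two class names are taken from the answer.
sample_html = """
<div class="form-result-group__event"><span>The Valley R2</span></div>
<div class="form-result__competitor-name">1. Ronniejay (4)</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find(class_=...) locates the first element carrying that class;
# the nested find("span") mirrors the " span" part of the CSS selector.
event = soup.find(class_="form-result-group__event")
race_name = event.find("span").get_text(strip=True)

# Same letter-only filter as in the answer.
raw_name = soup.find(class_="form-result__competitor-name").get_text()
horse_name = "".join(ch for ch in raw_name if ch.isalpha())

print(race_name)
print(horse_name)
```

Note that the isalpha() filter also removes spaces, so a multi-word horse name would come out joined together; a different cleanup would be needed in that case.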


Source: https://stackoverflow.com/questions/64139449/what-is-the-correct-soup-find-command
