BeautifulSoup not returning the complete HTML of the page

心在旅途 2020-12-16 08:15

I have been digging around the site for some time and I'm unable to find a solution to my issue. I'm fairly new to web scraping and am simply trying to extract some links from a

1 Answer
  • 2020-12-16 09:01

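    The page uses JavaScript to load the data dynamically, so the HTML returned by a plain HTTP request (and parsed by BeautifulSoup) does not contain it yet; you have to render the page in a real browser with Selenium first. Check the code below. Note that you have to install Selenium and ChromeDriver (unzip the download and copy chromedriver into your Python folder, or any other directory on your PATH). With Selenium 4.6 or later, Selenium Manager can usually download a matching driver for you.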
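To see why the data goes missing, here is a small offline illustration. The HTML snippet is made up (it reuses the class name from the answer's code purely as an example). BeautifulSoup only parses the markup it is handed and never executes `<script>` tags, so a container that JavaScript would fill in stays empty:

```python
from bs4 import BeautifulSoup

# Hypothetical raw server response: the event list is empty in the HTML
# and would only be populated by JavaScript running in a browser.
raw_html = """
<html>
  <body>
    <div class="js-event-list-tournament-events"></div>
    <script>
      document.querySelector('.js-event-list-tournament-events')
        .innerHTML = '<a href="/match/1">Match 1</a>';
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(raw_html, 'html.parser')
container = soup.find('div', class_='js-event-list-tournament-events')

# The <script> is treated as plain text, so the div has no <a> children
links = [a['href'] for a in container.find_all('a')]
print(links)  # → []
```

A browser driven by Selenium avoids this because `driver.page_source` returns the document after its scripts have run.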

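    import time
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    url = "https://www.sofascore.com/pt/futebol/2018-09-18"
    
    # Run Chrome without opening a window
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')
    
    # 'chrome_options=' was deprecated and is not accepted by current Selenium 4 releases; use 'options='
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Fixed pause so the page's JavaScript can render the event list;
        # an explicit WebDriverWait would be more robust
        time.sleep(3)
        page = driver.page_source  # HTML after JavaScript has run
    finally:
        driver.quit()  # close the browser even if something above fails
    
    soup = BeautifulSoup(page, 'html.parser')
    container = soup.find_all('div', attrs={
        'class': 'js-event-list-tournament-events'})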
    print(container)
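Since the goal was to pull links out of the page, here is a minimal sketch of the step after `find_all`: collecting the `href` of every `<a>` inside the matched containers and turning relative paths into absolute URLs with `urllib.parse.urljoin`. The `rendered_html` string is a made-up stand-in for `driver.page_source`, and the real markup on the site may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.sofascore.com"

# Hypothetical stand-in for driver.page_source after JavaScript has run
rendered_html = """
<div class="js-event-list-tournament-events">
  <a href="/team-a-team-b/abc">Team A - Team B</a>
  <a href="/team-c-team-d/def">Team C - Team D</a>
</div>
<div class="js-event-list-tournament-events">
  <a href="https://www.sofascore.com/team-e-team-f/ghi">Team E - Team F</a>
  <a>no href here</a>
</div>
"""

soup = BeautifulSoup(rendered_html, 'html.parser')
containers = soup.find_all('div', attrs={'class': 'js-event-list-tournament-events'})

links = []
for container in containers:
    # href=True skips <a> tags that have no href attribute
    for a in container.find_all('a', href=True):
        # urljoin resolves relative paths and leaves absolute URLs unchanged
        links.append(urljoin(BASE_URL, a['href']))

for link in links:
    print(link)
```

In the Selenium script you would pass `page` instead of `rendered_html`.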
    

    Alternatively, you can call their JSON API directly:

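    import requests
    
    # JSON endpoint for the day's football events, as given in this answer
    url = 'https://www.sofascore.com/football//2018-09-18/json'
    r = requests.get(url, timeout=10)  # don't hang forever on a slow server
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page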
    print(r.json())
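The structure of that JSON response isn't shown here, so rather than guess at its schema, a generic helper can dig values out of nested dicts and lists by key name. This is a sketch: `find_values`, the sample payload, and its key names are all hypothetical, so print `r.json()` first to see which keys actually exist.

```python
def find_values(data, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            # keep searching inside the value as well
            found.extend(find_values(v, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_values(item, key))
    return found


# Hypothetical payload standing in for r.json(); the real schema may differ
sample = {
    "tournaments": [
        {"events": [{"id": 1, "slug": "team-a-team-b"},
                    {"id": 2, "slug": "team-c-team-d"}]},
        {"events": [{"id": 3, "slug": "team-e-team-f"}]},
    ]
}

print(find_values(sample, "slug"))
# → ['team-a-team-b', 'team-c-team-d', 'team-e-team-f']
```

With the real response you would call `find_values(r.json(), "some_key")`, substituting a key you found in the output.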
    