BeautifulSoup not returning the complete HTML of the page

心在旅途 2020-12-16 08:15

I have been digging through the site for some time and I'm unable to find a solution to my issue. I'm fairly new to web scraping and am simply trying to extract some links from a

1 answer
  •  爱一瞬间的悲伤
    2020-12-16 09:01

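    The page uses JavaScript to load its data dynamically, so the HTML that a plain HTTP request returns does not contain it yet; you have to use Selenium to render the page first. Check the code below. Note that you have to install selenium and chromedriver (unzip the file and copy it into your Python folder, or any directory on your PATH).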

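    import time
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    url = "https://www.sofascore.com/pt/futebol/2018-09-18"

    # Run Chrome without opening a browser window.
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')

    # Pass the options with `options=`; the older `chrome_options=` keyword
    # was deprecated and has been removed in recent Selenium 4 releases.
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    time.sleep(3)  # crude fixed wait so the page's JavaScript has time to run
    page = driver.page_source
    driver.quit()

    # Parse the rendered HTML, not the raw server response.
    soup = BeautifulSoup(page, 'html.parser')
    container = soup.find_all('div', attrs={
        'class': 'js-event-list-tournament-events'})
    print(container)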
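
A fixed time.sleep(3) can be too short on a slow connection and wasteful on a fast one. As a hedged alternative, the sketch below uses Selenium's explicit wait to block until the container element is actually in the DOM. The function name fetch_rendered_html, the CONTAINER_SELECTOR constant, and the 10-second default timeout are my own choices for illustration; only the class name comes from the code above. I have not checked it against the live site.

```python
# Class name taken from the answer's code; the constant itself is illustrative.
CONTAINER_SELECTOR = "div.js-event-list-tournament-events"


def fetch_rendered_html(url, selector=CONTAINER_SELECTOR, timeout=10):
    """Return the page source once an element matching `selector` is present."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Polls the DOM until the element appears; raises TimeoutException
        # if it is still missing after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        # Always release the browser, even if the wait timed out.
        driver.quit()
```

Calling fetch_rendered_html("https://www.sofascore.com/pt/futebol/2018-09-18") should give a string that can be handed to BeautifulSoup exactly like `page` above, assuming the site still uses that class name.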
    

    Or you can use their JSON API:

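    import requests

    # URL kept exactly as posted, including the double slash in the path.
    url = 'https://www.sofascore.com/football//2018-09-18/json'
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    print(r.json())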
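
The answer does not show what the JSON looks like, so printing the whole response may be hard to read. One possible way to explore it is a small recursive helper that collects every value stored under a given key, wherever it is nested. The helper name find_values and the sample payload are hypothetical; the payload is made up for illustration and is not the real SofaScore response.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in decoded JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear deeper in the tree.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Made-up payload purely for illustration, NOT the real API response.
sample = json.loads(
    '{"groups": [{"name": "A", "events": [{"name": "x vs y"}]}]}'
)
names = list(find_values(sample, "name"))
print(names)  # ['A', 'x vs y']
```

With the real response, the same call would be list(find_values(r.json(), "some_key")), where the key to look for has to be found by inspecting the data first.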
    
