Wait for page to load before getting data with requests.get in Python 3

陌清茗 2020-11-28 10:40

I have a page whose source I need to get for use with BS4, but the middle of the page takes 1 second (maybe less) to load its content, and requests.get catches the source o

4 Answers
  •  生来不讨喜
    2020-11-28 11:05

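    This doesn't look like a waiting problem. It looks like the element is created by JavaScript, and requests only downloads the raw HTML without running any scripts, so it never sees dynamically generated elements. One suggestion is to use selenium together with PhantomJS to get the rendered page source, then parse it with BeautifulSoup. (PhantomJS support was removed in Selenium 4, so this needs Selenium 3.x; on current versions a headless Chrome or Firefox driver fills the same role.) The code below does exactly that: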

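    from bs4 import BeautifulSoup
    from selenium import webdriver
    
    url = "http://legendas.tv/busca/walking%20dead%20s03e02"
    
    # webdriver.PhantomJS exists only in Selenium 3.x and needs the phantomjs binary on PATH
    browser = webdriver.PhantomJS()
    browser.get(url)            # load the page in a real browser engine, which runs its JavaScript
    html = browser.page_source  # source of the page as currently rendered by the browser
    browser.quit()              # shut the browser process down
    
    # The second positional argument of find() matches the CSS class
    soup = BeautifulSoup(html, 'lxml')
    a = soup.find('section', 'wrapper')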
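
    On a current Selenium release, where webdriver.PhantomJS is no longer available, the same idea might look like the following minimal sketch. It assumes Chrome and a matching driver are installed; the function name fetch_rendered_html, the 10-second timeout, and the selector section.wrapper are illustrative choices, not part of the original answer. WebDriverWait blocks until the element is present, which also covers content that appears a moment after the page loads.

```python
def fetch_rendered_html(url, css_selector, timeout=10.0):
    """Return the page source once `css_selector` is present in the rendered DOM.

    Hypothetical helper: assumes Selenium 4 with Chrome and its driver available.
    """
    # Imported here so this module can be loaded even where selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        # Poll the DOM until the element exists; raises TimeoutException otherwise.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()  # always release the browser process
```

    Calling fetch_rendered_html(url, "section.wrapper") and passing the result to BeautifulSoup(html, 'lxml') then lets soup.find('section', 'wrapper') run against the rendered page.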
    

    Also, there's no need to use .findAll if you are looking for only one element; .find returns the first match.
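
    To check whether the element really is created by JavaScript, one option is to test whether it appears in the raw HTML at all. The sketch below uses only the standard library; the names TagFinder and has_element and the two sample HTML strings are made up for illustration and are not taken from the page in the question.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Records whether a start tag with the given name and CSS class was seen."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.found = True


def has_element(html, tag, css_class):
    """Return True if `html` contains <tag class="... css_class ...">."""
    finder = TagFinder(tag, css_class)
    finder.feed(html)
    return finder.found


# Illustrative inputs: what a server might send vs. what a browser renders.
raw = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><div id="app"><section class="wrapper big">x</section></div></body></html>'

print(has_element(raw, "section", "wrapper"))       # False
print(has_element(rendered, "section", "wrapper"))  # True
```

    If has_element(requests.get(url).text, 'section', 'wrapper') is False while the element shows up in the browser's inspector, the content is being added by a script and no amount of waiting with requests will retrieve it.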
