Wait for page to load before getting data with requests.get in Python 3

陌清茗 2020-11-28 10:40

I have a page whose source I need to parse with BS4, but the middle of the page takes about 1 second (maybe less) to load its content, and requests.get catches the source o

4 answers

生来不讨喜 (original poster)

2020-11-28 11:05
This doesn't look like a waiting problem. It looks like the element is created by JavaScript, and requests can't handle elements that JavaScript generates dynamically. One suggestion is to use selenium together with PhantomJS to get the page source, then parse it with BeautifulSoup. The code below does exactly that:
```
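from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://legendas.tv/busca/walking%20dead%20s03e02"

# PhantomJS runs the page's JavaScript; webdriver.PhantomJS needs Selenium 3 or earlier
browser = webdriver.PhantomJS()
try:
    browser.get(url)
    html = browser.page_source  # the DOM after scripts have run
finally:
    browser.quit()  # shut down the PhantomJS process

soup = BeautifulSoup(html, 'lxml')
# A string as the second argument is matched against the class attribute
a = soup.find('section', 'wrapper')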
```
Also, there's no need to use .findAll if you are looking for only one element.
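Since the question is about waiting for content, and `webdriver.PhantomJS` was removed in Selenium 4, the same idea can be sketched with headless Chrome and an explicit wait. This is only a sketch under assumptions: it presumes Selenium 4 or later and a local Chrome recent enough to accept `--headless=new`, and `fetch_rendered_html` is a hypothetical helper name, not something Selenium provides.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Return the page source once `css_selector` is present in the DOM.

    Hypothetical helper (not part of Selenium). Assumes Selenium 4+ and
    a local Chrome install.
    """
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-built element exists; raises
        # TimeoutException if it has not appeared after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

Calling `fetch_rendered_html(url, "section.wrapper")` should return HTML that can be passed to `BeautifulSoup(html, 'lxml')` as in the answer. The explicit wait returns as soon as the element shows up, so it avoids guessing a fixed sleep for content that loads in about a second.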