python-requests-html

Requests-html: error while running on flask

本小妞迷上赌 posted on 2021-02-11 15:01:28
Question: I wrote a script using requests-html that was working fine. After I deployed it in a Flask app, it started raising RuntimeError: There is no current event loop in thread 'Thread-3'. Here's the full error: Traceback (most recent call last): File "C:\Users\intel\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2464, in __call__ return self.wsgi_app(environ, start_response) . . . File "C:\Users\intel\Desktop\One page\main.py", line 18, in hello_world r
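The RuntimeError quoted in this question is raised by asyncio itself: a thread other than the main one has no event loop until one is created for it, and the message names 'Thread-3', so the code is running in a worker thread. The stdlib-only sketch below reproduces that behaviour and shows one way a loop can be created for the thread. The helper names loop_status and run_in_thread are hypothetical, and whether this alone makes requests-html work inside a Flask view is an assumption that the excerpt does not confirm.

```python
import asyncio
import threading


def loop_status(create_loop: bool) -> str:
    """Report whether asyncio can find an event loop in the calling thread."""
    if create_loop:
        # Give this thread its own loop before anything asks for one.
        asyncio.set_event_loop(asyncio.new_event_loop())
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError as exc:
        # Same failure mode as in the question: no loop in this thread.
        return f"error: {exc}"
    loop.close()
    return "ok"


def run_in_thread(create_loop: bool) -> str:
    """Run loop_status in a worker thread and return its result."""
    results = []
    worker = threading.Thread(
        target=lambda: results.append(loop_status(create_loop))
    )
    worker.start()
    worker.join()
    return results[0]


print(run_in_thread(create_loop=False))  # an "error: ..." message
print(run_in_thread(create_loop=True))
```

The second call succeeds only because the thread sets a loop for itself first; code that asks asyncio for a loop in a request thread would need something equivalent to happen before it runs.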

requests-html and infinite scrolling

半城伤御伤魂 posted on 2021-02-10 05:50:55
Question: I'm checking out a Python library: requests-html. It looks interesting, with easy and clear scraping. However, I'm not sure how to render a page with infinite scrolling. From the documentation I understand that I should render the page with a special parameter (scrolldown). I'm trying, but I don't know exactly how. I know how to use Selenium to handle infinite scroll, but I wonder whether it is possible with requests-html. from requests_html import HTML, HTMLSession page1 = session.get(url1) page1.html
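The scrolldown parameter mentioned in this question belongs to the render() method of the response's html object, which also accepts a sleep value in seconds. A minimal sketch of how the two might be combined is given below. The helper name fetch_after_scrolling, its default values, and the example URL are hypothetical; the function expects to be handed a requests_html.HTMLSession, and how many page-downs a given site needs is something to find by experiment.

```python
def fetch_after_scrolling(session, url, scrolls=10, pause=1.0):
    """Fetch url, render its JavaScript, and page down `scrolls` times.

    `session` is expected to be a requests_html.HTMLSession (or any
    object with the same get()/html.render() interface).
    """
    response = session.get(url)
    # scrolldown: how many times to page down during rendering.
    # sleep: pause in seconds, giving newly loaded items time to appear.
    response.html.render(scrolldown=scrolls, sleep=pause)
    return response.html


# Hypothetical usage (needs requests-html installed and network access):
#   from requests_html import HTMLSession
#   html = fetch_after_scrolling(HTMLSession(), "https://example.com/feed")
#   print(len(html.find("a")))
```

Passing the session in as an argument keeps the helper easy to try against a stub object before pointing it at a real site.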

Webscraping Blockchain data seemingly embedded in Javascript through Python, is this even the right approach?

怎甘沉沦 posted on 2021-01-29 20:11:13
Question: I'm referencing this URL: https://tracker.icon.foundation/block/29562412 If you scroll down to "Transactions", it shows 2 transactions with separate links; that's essentially what I'm trying to grab. If I try a simple pd.read_csv(url) command, it clearly omits the data I'm looking for, so I thought it might be JavaScript-based and tried the following code instead: from requests_html import HTMLSession session = HTMLSession() r = session.get('https://tracker.icon.foundation/block/29562412') r

Python Requests_html: giving me Timeout Error

痞子三分冷 posted on 2020-12-13 03:37:22
Question: I'm trying to scrape headlines from medium.com using a library called requests_html. The code works well on other people's PCs but not on mine. Here's the original code: from requests_html import HTMLSession session = HTMLSession() r = session.get('https://medium.com/@daranept27') r.html.render() x = r.html.find('a.eg.bv') [print(elem.text) for elem in x] It gives me pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded. Here's the full error:
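The "8000 ms" in this error matches the default timeout of render(), which is 8 seconds and can be raised through its timeout parameter. The sketch below retries the render with progressively longer timeouts. The helper name render_with_longer_timeouts and the specific timeout values are hypothetical, and it is an assumption that a slow page load, rather than something else on the asker's machine, is what causes the error.

```python
def render_with_longer_timeouts(response, timeouts=(8.0, 20.0, 40.0)):
    """Call response.html.render() with increasing timeouts.

    `response` is expected to come from requests_html.HTMLSession.get().
    Returns the timeout (in seconds) that succeeded; re-raises the last
    error if every attempt fails.
    """
    if not timeouts:
        raise ValueError("at least one timeout is required")
    last_error = None
    for timeout in timeouts:
        try:
            response.html.render(timeout=timeout)
            return timeout
        except Exception as exc:
            # pyppeteer raises its own TimeoutError class; catching broadly
            # keeps this sketch free of a pyppeteer import.
            last_error = exc
    raise last_error


# Hypothetical usage (needs requests-html installed and network access):
#   from requests_html import HTMLSession
#   r = HTMLSession().get("https://medium.com/@daranept27")
#   print(render_with_longer_timeouts(r))
```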

Problem trying to scrape a JS website with requests-html (Python 3.6)

谁说胖子不能爱 posted on 2020-02-05 03:28:11
Question: I've spent the last week trying to scrape information from the Epic Games Store webpage (https://www.epicgames.com/store/en-US/). I first tried the Requests module, but I soon realized I needed a module that supports JavaScript-driven pages. That's what I'm trying now, but there is a problem... When I use "inspect element" on the page, everything's fine, but when I execute this: from requests_html import HTMLSession session = HTMLSession() r = session.get("https://www.epicgames.com/store/en-US

Get rendered JavaScript lines from a website in Python

不羁岁月 posted on 2020-01-25 09:38:05
Question: I'm using Python 3.6.6 for this. I'm trying to get the current version number of PyCharm from the PyCharm website (https://www.jetbrains.com/pycharm/download/#section=windows). The version number is displayed prominently, but I still can't get it because I don't know how to process JavaScript properly. I tried parsing it out with requests_html from: <li>Version: <span data-code="PCP" data-release-version=""></span></li> This part should look like this after JavaScript has done its job: <li

Scraping ASPX form and avoiding Selenium

老子叫甜甜 posted on 2020-01-25 09:02:27
Question: I asked previously (see here) how to scrape results from an ASPX form. The form renders the output in a new tab (by using the function window.open in JS). In my previous post, I wasn't making the correct POST request, and I have since solved that. The following code successfully retrieves the HTML from the form with the correct request headers, and it exactly matches the POST response I see in the Chrome inspector. But (...) I can't retrieve the data. Once the user makes the selections, a new pop

requests-html HTTPSConnectionPoolRead timed out

若如初见. posted on 2020-01-16 13:31:45
Question: I'm trying to send a request to here using requests-html. Here is my code: headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"} session = HTMLSession() while True: try: r = session.get("https://www.size.co.uk/product/white-fila-v94m-low/119095/",headers=headers,timeout=40) r.html.render() print(r.html.text) except Exception as e: print(e) Here is the error I am receiving: HTTPSConnectionPool(host='www.size.co.uk',