Using Python requests.get to parse HTML code that does not load at once

爱一瞬间的悲伤 2020-12-21 05:48

I am trying to write a Python script that will periodically check a website to see if an item is available. I have used requests.get, lxml.html, and xpath successfully in

2 Answers

暖寄归人 (OP)

2020-12-21 06:15
You are not correct in your assessment of the problem.

You can check the results and see that there's a `</html>` right near the end. That means you've got the whole page.
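
A minimal sketch of that check, assuming the page is ordinary HTML that ends with a closing `</html>` tag. The function name `looks_complete` and the 200-character window are hypothetical choices for illustration, and the test is only a heuristic: browsers tolerate a missing `</html>`, so its absence does not by itself prove the download was cut off.

```python
def looks_complete(html: str, tail_chars: int = 200) -> bool:
    """Heuristic: True if a closing </html> tag appears near the end of the text.

    Hypothetical helper; a page can be complete without the tag, so a False
    result is a hint to look closer, not proof of truncation.
    """
    # Only inspect the tail, where the closing tag of a full page should sit.
    tail = html[-tail_chars:].lower()
    return "</html>" in tail


# Usage (needs network access and the requests package):
#   import requests
#   response = requests.get("https://example.com/")
#   print(looks_complete(response.text))
```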

And `requests.get` always downloads the whole page; the full body is then available as `response.text`. If you want to stream it a bit at a time, you have to ask for that explicitly by passing `stream=True`.
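
For comparison, here is a sketch of what explicit streaming looks like with `stream=True` and `Response.iter_content`. The helper names `iter_body` and `fetch_streamed`, the 8192-byte chunk size, and the 10-second timeout are assumptions for illustration. `requests` is imported inside `fetch_streamed` only so the helpers can be loaded and tried without the package installed.

```python
from typing import Iterator


def iter_body(response, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield a response body chunk by chunk instead of reading it all at once.

    `response` is expected to be a requests.Response opened with stream=True
    (any object with a compatible iter_content method also works).
    """
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks
            yield chunk


def fetch_streamed(url: str, chunk_size: int = 8192) -> bytes:
    """Download url piece by piece; each chunk could be processed as it arrives."""
    import requests  # lazy import: only needed when actually fetching

    # stream=True defers reading the body; without it, requests.get reads it all up front.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        return b"".join(iter_body(response, chunk_size))
```

Streaming changes only how the bytes arrive, not which bytes arrive: the joined result is the same HTML that a plain `requests.get` would return.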

Your problem is that the table doesn't actually exist in the HTML; it's built dynamically by client-side JavaScript. You can see that by actually reading the HTML that's returned. So, unless you run that JavaScript, you don't have the information.
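
One way to do that reading programmatically is to count `<table>` start tags in the raw response. The question used `lxml.html` with XPath, where the equivalent would be `len(lxml.html.fromstring(html).xpath("//table"))`; this sketch swaps in the standard library's `html.parser` so it runs without extra packages. The names `TagCounter` and `count_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name while parsing HTML."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == self.tag:
            self.count += 1


def count_tags(html: str, tag: str = "table") -> int:
    """Return how many <tag> elements appear in the static HTML (no JavaScript is run)."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# If a browser shows the table but this returns 0 for the text fetched with
# requests.get, the table is being built by JavaScript after the page loads.
```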

There are a number of general solutions to that. For example:
- Use Selenium or a similar tool to drive an actual browser that downloads the page and runs its JavaScript.
- Manually work out what the JavaScript code does and do equivalent work in Python.
- Run a headless JavaScript interpreter against a DOM that you've built up.
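
A minimal sketch of the first option, assuming Selenium 4 and a local Chrome installation. The function name `render_with_browser`, the default `//table` XPath, and the 10-second timeout are illustrative assumptions. It waits until the JavaScript has inserted a matching element and then returns `driver.page_source`, which can be parsed with `lxml.html` and XPath just like the text previously fetched with `requests.get`. Selenium is imported inside the function so the sketch can be loaded without it installed.

```python
def render_with_browser(url: str, xpath: str = "//table", timeout: float = 10.0) -> str:
    """Load url in headless Chrome via Selenium and return the rendered HTML."""
    # Lazy imports: Selenium is only needed when the function is actually called.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until client-side JavaScript has added an element matching xpath.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return driver.page_source  # DOM as it exists after the scripts ran
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Waiting on a specific element, rather than sleeping for a fixed time, returns as soon as the table appears and raises `TimeoutException` if it never does.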