Using Python requests.get to parse HTML code that does not load at once

爱一瞬间的悲伤 2020-12-21 05:48

I am trying to write a Python script that will periodically check a website to see if an item is available. I have used requests.get, lxml.html, and xpath successfully in

2 Answers
  •  暖寄归人
    2020-12-21 06:15

    You are not correct in your assessment of the problem.

    You can check the results and see that there's a </html> right near the end. That means you've got the whole page.

    And the .text attribute of the response that requests.get returns always holds the whole page; if you want to stream it a bit at a time, you have to ask for that explicitly (for example with stream=True).

    Your problem is that the table doesn't actually exist in the HTML; it's built dynamically by client-side JavaScript. You can see that by actually reading the HTML that's returned. So, unless you run that JavaScript, you don't have the information.
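
One way to confirm this without reading the whole response by eye is to count the tags in it. The following is only a minimal sketch using the standard library's html.parser; SAMPLE_HTML is a made-up stand-in for what requests.get(url).text might return on such a page, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count <table> and <script> start tags in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "table":
            self.tables += 1
        elif tag == "script":
            self.scripts += 1


def count_tables_and_scripts(html):
    """Return (table_count, script_count) for the given HTML text."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.tables, counter.scripts


# Hypothetical stand-in for requests.get(url).text: the document is
# complete, but the table is left for client-side JavaScript to build.
SAMPLE_HTML = """
<html>
  <body>
    <div id="stock"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""

tables, scripts = count_tables_and_scripts(SAMPLE_HTML)
print(f"tables={tables}, scripts={scripts}")
```

If the real response shows no tables but one or more scripts, that is consistent with the table being generated in the browser rather than sent by the server.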

    There are a number of general solutions to that. For example:

    • Use selenium or similar to drive an actual browser to download the page.
    • Manually work out what the JavaScript code does and do equivalent work in Python.
    • Run a headless JavaScript interpreter against a DOM that you've built up.
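
As a rough sketch of the second option: in many cases the page's JavaScript just fetches JSON from some endpoint and renders it as a table, so the Python script can request that JSON directly. Everything specific here is an assumption made for illustration: the JSON layout ("items", "name", "in_stock"), the helper names, and whatever URL you would pass to fetch_payload. The real endpoint and field names have to be found by watching the browser's network tab for the site in question.

```python
import json


def find_item(payload, name):
    """Return the entry called `name` from the decoded JSON, or None."""
    for item in payload.get("items", []):
        if item.get("name") == name:
            return item
    return None


def is_available(payload, name):
    """True only if the item exists and is flagged as in stock."""
    item = find_item(payload, name)
    return bool(item and item.get("in_stock"))


def fetch_payload(url):
    """Download and decode the JSON that the page's script would request."""
    # Third-party dependency, imported lazily so the helpers above
    # can be tried out without it.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# Hypothetical response body; a real site will use its own layout.
SAMPLE_BODY = (
    '{"items": ['
    '{"name": "widget", "in_stock": true}, '
    '{"name": "gadget", "in_stock": false}]}'
)

payload = json.loads(SAMPLE_BODY)
print(is_available(payload, "widget"))
print(is_available(payload, "gadget"))
```

For the periodic check, the script would call fetch_payload with the endpoint it discovered and pass the result to is_available. If the data turns out to be computed in the browser rather than fetched, driving a real browser with selenium is probably the simpler route.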
