Get rendered JavaScript lines from a website in Python

Submitted by 不羁岁月 on 2020-01-25 09:38:05

Question


I'm using python 3.6.6 for this.

I'm trying to get the current version number of PyCharm from the PyCharm download page (https://www.jetbrains.com/pycharm/download/#section=windows). The version number is displayed prominently, but I can't extract it because I don't know how to handle the JavaScript that fills it in.

I tried parsing it out with requests_html from this element:

<li>Version: <span data-code="PCP" data-release-version=""></span></li>

After the JavaScript has run, this part should look like this:

<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>
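To see why the unrendered page yields an empty element, the sketch below pulls the text out of the two snippets above using Python's standard-library `html.parser`. It matches the span by its `data-code` attribute, which is the same test as the CSS selector `span[data-code="PCP"]`. Note that requests_html's `find()` expects a CSS selector like that, not an HTML fragment. The `SpanTextParser` class and the `span_text` helper are hypothetical names used only for illustration. The parser also assumes the span has no nested `<span>` children.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text inside <span> elements whose data-code attribute matches.

    Assumes the matching span contains no nested <span> elements.
    """

    def __init__(self, code):
        super().__init__()
        self.code = code
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "span" and dict(attrs).get("data-code") == self.code:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching span
        if self.inside:
            self.texts[-1] += data


def span_text(markup, code="PCP"):
    """Return the text of every matching span in markup (hypothetical helper)."""
    parser = SpanTextParser(code)
    parser.feed(markup)
    parser.close()
    return parser.texts


raw = '<li>Version: <span data-code="PCP" data-release-version=""></span></li>'
rendered = '<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>'

print(span_text(raw))       # [''] because the span is empty before JavaScript runs
print(span_text(rendered))  # ['2018.1.4']
```

Only the rendered markup contains the version, so any parsing approach needs the page's JavaScript to have run first.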

Here is my non-working script:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')


r.html.render()
item = r.html.find('<span data-code="PCP" data-release-version=""></span>')


print(item)

I don't mind if some extra markup is left over, since I could filter that out with a regex. Still, the only thing I get from this is:

[<Element 'span' data-code='PCP' data-release-version=''>]
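If the version does show up in the returned text, a regex can pick it out of any leftover markup. Below is a minimal sketch. It assumes versions look like a four-digit year followed by one or two numeric parts, as in `2018.1.4`. The names `VERSION_RE` and `extract_version` are hypothetical. As the second call shows, a regex cannot help while the span is still empty.

```python
import re

# Assumed version format: a four-digit year, then one or two numeric parts
# (e.g. 2018.1.4 or 2018.2).
VERSION_RE = re.compile(r"\b(\d{4}\.\d+(?:\.\d+)?)\b")


def extract_version(text):
    """Return the first version-like string in text, or None if there is none."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


print(extract_version('<span data-code="PCP" data-release-version="">2018.1.4</span>'))  # 2018.1.4
print(extract_version("[<Element 'span' data-code='PCP' data-release-version=''>]"))      # None
```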

Answer 1:


Update:

I found a solution myself. It seems render() needs a short sleep before the version number appears. I also used XPath instead of find().
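The `sleep` argument to `render()` pauses after the initial render so the page's scripts have time to fill in the span. A fixed pause can still be too short on a slow connection. One alternative is to retry until the text appears. The sketch below is a hypothetical helper and is not part of requests_html. Here, `fetch_text` stands for any zero-argument function you supply, for example one that renders the page and returns the span's text. The values of `attempts` and `delay` are assumed defaults.

```python
import time


def wait_for_text(fetch_text, attempts=5, delay=0.5):
    """Call fetch_text() until it returns a non-empty string.

    fetch_text is a hypothetical, caller-supplied zero-argument callable.
    Returns the text, or None if it is still empty after all attempts.
    """
    for attempt in range(attempts):
        text = fetch_text()
        if text:
            return text
        if attempt < attempts - 1:
            time.sleep(delay)  # give the page's JavaScript more time before retrying
    return None


# Usage with a fake fetch that returns the version on the third call
responses = iter(["", "", "2018.1.4"])
print(wait_for_text(lambda: next(responses), attempts=5, delay=0))  # 2018.1.4
```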

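from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')

# Run the page's JavaScript in headless Chromium; sleep=0.1 pauses briefly
# after the initial render so the scripts can fill in the version number.
r.html.render(sleep=0.1)
# Absolute XPath to the version <span>; text() returns its text as a string.
item = r.html.xpath('/html/body/div[1]/div[2]/div/div[2]/div[1]/div[2]/ul[1]/li[1]/span/text()')

print('------------------------------------------------')
print(item)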

My result:

['2018.1.4']
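The absolute XPath above depends on the exact nesting of `div`s on the JetBrains page and will break if the layout changes. A relative path keyed on the span's `data-code` attribute is less fragile. The sketch below demonstrates the idea with the standard library's `xml.etree.ElementTree` on the rendered fragment from the question. It assumes well-formed markup, and ElementTree supports only a subset of XPath (no `text()`). With requests_html, the analogous call would presumably be `r.html.xpath('//span[@data-code="PCP"]/text()')`, though that has not been checked against the live page. `pcp_version` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The rendered fragment from the question; it happens to be well-formed XML.
fragment = '<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>'


def pcp_version(markup):
    """Return the text of the first <span data-code="PCP">, or None (hypothetical helper)."""
    root = ET.fromstring(markup)
    # Relative path with an attribute predicate instead of a long absolute path
    span = root.find(".//span[@data-code='PCP']")
    return span.text if span is not None else None


print(pcp_version(fragment))  # 2018.1.4
```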


Source: https://stackoverflow.com/questions/51403755/get-renderd-javascript-lines-from-website-in-python
