Scraping websites with JavaScript enabled?

北海茫月 2020-12-08 05:28

I'm trying to scrape and submit information to websites that rely heavily on JavaScript for most of their actions. The website won't even work when I disable JavaScript in

6 answers
  •  情话喂你
    2020-12-08 06:23

    You should look into using Ghost, a Python library that wraps the PyQt4 + WebKit hack.

    This makes g the WebKit client:

    import ghost
    g = ghost.Ghost()
    

    You can grab a page with g.open(url) and then g.content will evaluate to the document in its current state.
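    The open-then-read flow described above can be sketched as a small helper. This is only an illustration under stated assumptions: it relies on nothing beyond the `open(url)` method and `content` attribute mentioned in this answer, and `fetch_rendered` and `FakeClient` are hypothetical names (the fake stands in for the real client so the sketch runs without Ghost or PyQt4 installed).

```python
def fetch_rendered(client, url):
    """Open url in the given client and return the document as it currently stands."""
    client.open(url)        # load the page; with the real client, scripts run inside WebKit
    return client.content   # the document in its current state


class FakeClient:
    """Hypothetical stand-in for ghost.Ghost(), exposing only open() and content."""

    def __init__(self):
        self.content = ""

    def open(self, url):
        # A real client would fetch and render the page; the fake just records the URL.
        self.content = "<html><body><p>rendered: %s</p></body></html>" % url


html = fetch_rendered(FakeClient(), "http://example.com/")
print(html)
```

    With Ghost installed, the same helper would presumably be called with the `g` object created above in place of `FakeClient()`.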

    Ghost has other cool features, like injecting JS and some form filling methods, and you can pass the resulting document to BeautifulSoup and so on: soup = bs4.BeautifulSoup(g.content).
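    Once the rendered document is available as a string, post-processing it is ordinary HTML parsing. The sketch below lists the named inputs of a form, which is the kind of information needed before filling it in. It swaps BeautifulSoup for the standard library's `html.parser` purely so it runs without extra packages; `FormFieldCollector`, `form_fields`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:  # unnamed inputs (e.g. a bare submit button) are skipped
                self.fields[name] = attrs.get("value") or ""


def form_fields(html):
    """Return the named input fields found in an HTML string."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Hypothetical sample standing in for the rendered document.
SAMPLE = """
<form action="/login" method="post">
  <input type="text" name="user" value="">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Go">
</form>
"""

fields = form_fields(SAMPLE)
print(fields)
```

    In practice the string passed to `form_fields` would be the rendered document rather than a literal, and BeautifulSoup offers a more convenient interface for the same job.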

    So far, Ghost is the only thing I've found that makes this kind of thing easy in Python. The only limitation I've come across is that you can't easily create more than one instance of the client object, ghost.Ghost, but you could work around that.
