Web scraping dynamic content with Python

暖寄归人 2020-11-27 16:44

I'd like to use Python to scrape the contents of the "Were you looking for these authors:" box on web pages like this one: http://academic.research.microsoft.com/Search?q

3 answers
  •  没有蜡笔的小新
    2020-11-27 17:30

    Instead of trying to reverse engineer it, you can use ghost.py to directly interact with JavaScript on the page.

    If you run the following query in the Chrome console, you'll see that it returns everything you want.

    document.getElementsByClassName('inline-text-org');
    

    This returns:

    [University of Manchester, University of California ..., etc...]

    You can run JavaScript from Python against a real DOM using ghost.py.
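
    One practical detail: the class-name query yields DOM elements rather than strings, so it can be convenient to have the JavaScript itself return plain text before the result crosses into Python. Below is a minimal sketch of building such an expression. The helper name `text_query` is hypothetical and not part of ghost.py, and whether a given ghost.py version converts the resulting array into a Python list is an assumption to verify.

```python
import json


def text_query(class_name):
    """Return a JavaScript expression that evaluates to an array holding
    the trimmed text of every element with the given class."""
    # json.dumps produces a correctly quoted and escaped string literal.
    js_class = json.dumps(class_name)
    return (
        "Array.prototype.map.call("
        "document.getElementsByClassName(" + js_class + "), "
        "function (el) { return el.textContent.trim(); });"
    )


# Show the expression that would be evaluated in the page.
print(text_query("inline-text-org"))
```

    The resulting string could be passed to `ghost.evaluate` in place of the bare `getElementsByClassName` call.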

    This is really cool:

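    from ghost import Ghost

    # Start a headless WebKit browser (API of early ghost.py releases;
    # later releases open pages through a session from ghost.start()).
    ghost = Ghost()
    # Load the page so that its JavaScript runs.
    page, resources = ghost.open('http://academic.research.microsoft.com/Search?query=lander')
    # Evaluate the same expression used in the Chrome console.
    result, resources = ghost.evaluate(
        "document.getElementsByClassName('inline-text-org');")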
    
