Scrape web page data generated by JavaScript

Submitted by 好久不见 on 2019-11-26 11:55:28

You need to look at PhantomJS.

From their site:

PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

Using the API, you can script the "browser" to interact with the page and scrape the data you need. You can then do whatever you need with it, including passing it to a PHP script if necessary.


That being said, if at all possible, try not to "scrape" the data. If the page fetches its data with an AJAX call, there may be an API you can call directly instead. If not, maybe you can convince the site's owners to provide one. That would, of course, be much easier and more maintainable than screen scraping.
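If the page loads its data through an AJAX request (the Network tab of the browser's developer tools lists these), you can often request that endpoint yourself and decode the JSON it returns, with no browser involved. A minimal sketch, assuming the endpoint returns a JSON object with an `items` array whose entries carry a `name` field; the helper `extractNames()`, the field names, and the URL in the comment are hypothetical:

```php
<?php
// Hypothetical helper: collect the "name" field of every entry in the
// "items" array of a JSON document, such as an AJAX endpoint might return.
function extractNames(string $json): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) turns malformed JSON into a JsonException
    // instead of silently returning null.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $names = [];
    foreach ($data['items'] ?? [] as $item) {
        // Skip entries that have no "name" field.
        if (isset($item['name'])) {
            $names[] = $item['name'];
        }
    }
    return $names;
}

// In real use you would fetch the endpoint the page itself calls, e.g.:
//   $json = file_get_contents('https://example.com/api/items'); // hypothetical URL
// Here a literal string stands in for that response.
$json = '{"items": [{"name": "first"}, {"name": "second"}, {"id": 3}]}';
print_r(extractNames($json));
```

Because this reads the structured data the page itself consumes, it keeps working when the page's markup changes, as long as the endpoint stays the same.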

First, you need to install PhantomJS.

Second, you need the PHP PhantomJS library (php-phantomjs):

  1. Install Composer (if it is not already installed on your server).
  2. Install the php-phantomjs package; see the project page and its installation guide:

https://github.com/jonnnnyw/php-phantomjs
http://jonnnnyw.github.io/php-phantomjs/4.0/2-installation/

Third, load the package in your script: require 'vendor/autoload.php';

Finally, instead of fetching the page with file_get_contents(), load it through PhantomJS so that the page's JavaScript runs before you read its content:

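    <?php
    require 'vendor/autoload.php';

    use JonnyW\PhantomJs\Client;

    // Client is a singleton, so one getInstance() call is enough.
    $client = Client::getInstance();

    // Tell php-phantomjs where the PhantomJS binary lives on this server.
    $client->getEngine()->setPath('/usr/local/bin/phantomjs');

    $request  = $client->getMessageFactory()->createRequest();
    $response = $client->getMessageFactory()->createResponse();

    $request->setMethod('GET');
    // Placeholder: the page whose content is filled in by AJAX/JavaScript.
    $request->setUrl('https://www.your_page_embeded_ajax_request');

    // PhantomJS loads the page (executing its JavaScript) and fills $response.
    $client->send($request, $response);

    if ($response->getStatus() === 200) {
        // The rendered HTML; do something with it here.
        echo $response->getContent();
    }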
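In php-phantomjs, `$response->getContent()` returns the rendered HTML. Once you have that string, PHP's built-in DOM extension can pull out the values you need. A minimal sketch; the helper `extractTexts()`, the XPath expression, and the sample markup are hypothetical stand-ins for your target page:

```php
<?php
// Hypothetical helper: return the trimmed text of every node matching an
// XPath query in an HTML string (for example, the rendered HTML from PhantomJS).
function extractTexts(string $html, string $xpathQuery): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; buffer libxml warnings instead of
    // printing them, then discard them and restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($xpathQuery);
    if ($nodes === false) {
        // query() returns false for a malformed XPath expression.
        return [];
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup standing in for content that JavaScript inserted into the page.
$html = '<html><body><ul id="results">'
      . '<li class="item"> Alpha </li><li class="item">Beta</li>'
      . '</ul></body></html>';

print_r(extractTexts($html, '//ul[@id="results"]/li[@class="item"]'));
```

Note that loadHTML() treats input without a charset declaration as ISO-8859-1, so for UTF-8 pages you may need to make sure the HTML carries a `<meta charset="utf-8">` declaration before parsing.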