PHP file_get_contents - AFTER JavaScript executes

Submitted by 耗尽温柔 on 2020-01-02 11:59:21

Question


Basically, I am trying to scrape web pages with PHP, but I want to do so after the initial JavaScript on a page has executed. I want access to the DOM after the initial Ajax requests, etc. Is there any way to do this?


Answer 1:


Short answer: no.

Scraping a site gives you whatever the server returns in response to the HTTP request that you make. If that content is HTML, the "initial" state of the DOM tree is derived from it. The scrape cannot take into account the "current" state of the DOM after JavaScript has modified it.
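To make this concrete, here is a minimal sketch. The markup in `$rawHtml` is an invented stand-in for what `file_get_contents($url)` might return, and `textOfElement` is a hypothetical helper, not anything from the question. PHP parses the markup with `DOMDocument` but never runs the `<script>`, so the element that the script would fill stays empty.

```php
<?php
// Invented stand-in for the body that file_get_contents($url) would return:
// the markup exactly as the server sent it, before any script has run.
$rawHtml = <<<'HTML'
<!DOCTYPE html>
<html>
  <body>
    <div id="app"></div>
    <script>
      document.getElementById('app').textContent = 'Loaded by JavaScript';
    </script>
  </body>
</html>
HTML;

// Hypothetical helper: return the trimmed text of the element with the given
// id, or null if no such element exists. The id is interpolated into the
// XPath expression as-is, so it must not contain double quotes.
function textOfElement(string $html, string $id): ?string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// The div exists, but it is empty: nothing executed the script.
var_dump(textOfElement($rawHtml, 'app'));
```

A browser loading the same markup would show "Loaded by JavaScript" inside the div; the PHP side only ever sees the empty element.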




Answer 2:


I'm revising this answer because there are now several projects that do a really good job of this:

  • PhantomJS, which is a headless WebKit browser; there are some helpful wrappers for it, such as CasperJS.

  • Zombie.js, which is a wrapper over jsdom written in JavaScript (Node.js).

You need to write JavaScript code to interact with both of these projects. I like Zombie.js better so far, since it is easier to set up, and you can use any Node.js/npm modules in your code.
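If the rest of the scraper has to stay in PHP, one possible arrangement is to let PHP shell out to the headless browser and read the rendered markup from its standard output. The sketch below makes several assumptions: a `phantomjs` binary is on the PATH, and `render.js` is a hypothetical PhantomJS script (not shown here) that opens the URL passed as its first argument and prints the resulting HTML. The function names are likewise made up for illustration.

```php
<?php
// Build the shell command that runs a PhantomJS script against a URL.
// Both arguments are escaped so that a URL cannot inject shell syntax.
function buildRenderCommand(string $script, string $url): string
{
    return sprintf(
        'phantomjs %s %s',
        escapeshellarg($script),
        escapeshellarg($url)
    );
}

// Run the command and return whatever the script printed, which is assumed
// to be the page's HTML after its JavaScript has executed.
function fetchRenderedHtml(string $script, string $url): string
{
    $output = shell_exec(buildRenderCommand($script, $url));
    if (!is_string($output)) {
        throw new RuntimeException("Rendering failed for $url");
    }
    return $output;
}

// Show the command that would be run; 'render.js' is an assumed file name.
echo buildRenderCommand('render.js', 'https://example.com/'), PHP_EOL;

// With PhantomJS and the script in place, usage would look like:
// $html = fetchRenderedHtml('render.js', 'https://example.com/');
```

The JavaScript still has to be written separately; PHP only starts the process and collects the result.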


Old answer:

No, there's no way to do that. You'd have to emulate a full browser environment inside PHP. I don't know of anyone who is doing this kind of scraping except Google, and even their support is far from comprehensive.

Instead, you should use Firebug or another web debugging tool to find the request (or sequence of requests) that generates the data you're actually interested in. Then, use PHP to perform only the needed request(s).
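As a rough sketch of that approach, suppose the debugging tool shows that the page loads its data from a JSON endpoint. PHP can then request that endpoint directly. The function name `fetchJson`, the example URL, and the `X-Requested-With` header are assumptions for illustration; the real endpoint and any headers it expects have to be read off the network panel for the site in question.

```php
<?php
// Hypothetical helper: GET a URL and decode the response body as JSON.
// Requires PHP 7.3+ for JSON_THROW_ON_ERROR.
function fetchJson(string $url): array
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            // Some endpoints only answer requests that look like Ajax calls;
            // whether this header is needed depends on the site.
            'header'  => "Accept: application/json\r\n"
                       . "X-Requested-With: XMLHttpRequest\r\n",
            'timeout' => 10, // seconds
        ],
    ]);

    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }

    // Decode objects as associative arrays; invalid JSON throws JsonException.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object or array');
    }
    return $data;
}

// Usage with an assumed endpoint (replace with the request found in the
// network panel):
// $items = fetchJson('https://example.com/api/items?page=1');
```

This skips the HTML page entirely, so there is no JavaScript left to execute.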



Source: https://stackoverflow.com/questions/11214122/php-file-get-contents-after-javascript-executes
