I'm doing some screen scraping on a site that is very JS-heavy. It uses a client-side templating engine that renders all of the content. I tried using jQuery, which worked in the browser console but, obviously, not on the server (Node.js).
I looked at a few libraries for Python and Java, and they seem able to handle what I want, but I would prefer a JS solution that works with a Node server.
Is there any way to get the complete source of a page after it's rendered, using Node?
I used jsdom for screen scraping; the code is below:
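var jsdom = require( 'jsdom' );

// Note: jsdom.env() is the callback API of older jsdom releases (before version 10).
jsdom.env( {
    url: '<url_of_the_page_you_want_to_scrape>',  // replace with the real URL
    scripts: [ 'http://code.jquery.com/jquery.js' ],  // inject jQuery into the page
    done: function( error, window ) {
        if ( error ) {
            console.error( error );
            return;
        }
        var $ = window.$;
        // The requested page is now loaded and can be queried through $.
        // Write any JavaScript or jQuery code here to extract what you need.
    }
} );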
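One caveat with client-side templating: the content is usually filled in some time after the initial load, so the elements you want may not exist yet when the callback runs. Here is a minimal sketch of one way to cope with that, assuming simple polling is acceptable. `waitForSelector` is a hypothetical helper name (it is not part of jsdom or jQuery), and the default timeout and interval are arbitrary choices.

```javascript
// Hypothetical helper (not part of jsdom or jQuery): polls a document until an
// element matching `selector` exists, or rejects once `timeoutMs` has elapsed.
function waitForSelector(document, selector, options) {
  var opts = options || {};
  // Assumed defaults; tune them for the site being scraped.
  var timeoutMs = opts.timeoutMs === undefined ? 5000 : opts.timeoutMs;
  var intervalMs = opts.intervalMs === undefined ? 50 : opts.intervalMs;
  var deadline = Date.now() + timeoutMs;

  return new Promise(function (resolve, reject) {
    function check() {
      var element = document.querySelector(selector);
      if (element) {
        // The templating engine has rendered the element.
        resolve(element);
      } else if (Date.now() >= deadline) {
        reject(new Error('Timed out waiting for ' + selector));
      } else {
        // Not rendered yet; try again after a short delay.
        setTimeout(check, intervalMs);
      }
    }
    check();
  });
}
```

Inside the `done` callback you could then call something like `waitForSelector( window.document, '#content' ).then( function( el ) { console.log( el.innerHTML ); } )`, where `#content` is only a placeholder selector. This only helps if the page's own scripts are actually executed in the environment you load it into.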
If you want to use a Node.js module, then you might be interested in this:
https://github.com/sgentle/phantomjs-node
or this:
Source: https://stackoverflow.com/questions/24109469/getting-source-of-a-page-after-its-rendered-in-a-templating-engine