I am trying to get the original source for a particular web page.
The page executes some scripts that modify the DOM as soon as it loads. I would like to get the sou
As Fanch pointed out, this doesn't seem to be possible in a single request. If you are able to make two requests, it gets easy: do one request with JavaScript enabled and one without, scrape the page source each time, and compare the two.
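// Assumes casper has already been created and started, and url is defined.
casper
    .then(function(){
        // Disable JavaScript so the next load returns the unmodified markup.
        this.options.pageSettings.javascriptEnabled = false;
    })
    .thenOpen(url, function(){
        this.echo("before JavaScript");
        this.echo(this.getHTML());
    })
    .then(function(){
        // Re-enable JavaScript for the second load.
        this.options.pageSettings.javascriptEnabled = true;
    })
    .thenOpen(url, function(){
        this.echo("after JavaScript");
        this.echo(this.getHTML());
    });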
You can change the order according to your needs. If you're already on a page whose original markup you want, you can use casper.getCurrentUrl() to get its URL and open it again with JavaScript disabled:
casper
    .then(function(){
        // submit or whatever
    })
    .thenOpen(url, function(){
        this.echo("after JavaScript");
        this.echo(this.getHTML());
        // Disable JavaScript, then reload the same URL to get the original markup.
        this.options.pageSettings.javascriptEnabled = false;
        this.thenOpen(this.getCurrentUrl(), function(){
            this.echo("before JavaScript");
            this.echo(this.getHTML());
        });
    });
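For the comparison step, here is a minimal sketch in plain JavaScript. It assumes you have stored the two this.getHTML() results in strings instead of echoing them. The helper name diffLines is hypothetical and not part of CasperJS, and the comparison is a naive line-by-line one rather than a real diff algorithm, so it only reports which lines exist in one snapshot but not the other.

```javascript
// Hypothetical helper (not a CasperJS API): naive line-level comparison
// of two HTML snapshots, e.g. the markup before and after JavaScript ran.
function diffLines(before, after) {
    // Return the lines of `lines` that have no matching line in `otherLines`.
    // Occurrences are counted so repeated lines are matched one-to-one.
    function onlyIn(lines, otherLines) {
        var remaining = Object.create(null);
        otherLines.forEach(function (line) {
            remaining[line] = (remaining[line] || 0) + 1;
        });
        return lines.filter(function (line) {
            if (remaining[line] > 0) {
                remaining[line] -= 1;
                return false;
            }
            return true;
        });
    }

    var beforeLines = before.split(/\r?\n/);
    var afterLines = after.split(/\r?\n/);

    return {
        removed: onlyIn(beforeLines, afterLines), // only in the original source
        added: onlyIn(afterLines, beforeLines)    // only in the modified DOM
    };
}
```

In a script like the ones above, you could assign this.getHTML() to a variable in each thenOpen callback and pass both variables to diffLines in a final then step. Keep in mind that getHTML() returns the serialized DOM, so whitespace and attribute formatting may differ from the raw response even when no script touched an element.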