Question
What
My web app is made dynamic through Google's AngularJS.
I want to generate static versions of my pages.
Why
Web scrapers like Google's do execute and render the JavaScript, but they don't treat the resulting content the same way as its static equivalent.
References:
- Does heavy JavaScript use adversely impact Googleability? (Programmers StackExchange)
- Making AJAX Applications Crawlable (Google Documentation for webmasters)
How
I'm not sure exactly how (which is why I'm asking), but I want to access the same source that your browser's 'inspect element' presents, rather than the source that Ctrl+U (View page source) shows.
Once I have a script which renders the page, 'spitting' out the HTML+CSS, I will place those generated files on my web server. A cron job will then be scheduled to regenerate the files at regular intervals.
These static files will subsequently be served instead of the dynamic ones when JavaScript is disabled and/or when a scraper visits the site.
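The "serve static files to scrapers and no-JS visitors" step could be sketched as a small dispatch function. This is a minimal illustration, not part of the question's code; the bot user-agent list is an assumption and far from exhaustive:

```python
import re

# Illustrative crawler user-agent substrings (an assumption, not an exhaustive list).
BOT_PATTERN = re.compile(r"googlebot|bingbot|baiduspider|yandex", re.IGNORECASE)

def choose_source(user_agent, javascript_enabled=True):
    """Decide whether to serve the pre-rendered static snapshot
    or the normal dynamic (AngularJS) page."""
    if not javascript_enabled:
        return "static"
    if user_agent and BOT_PATTERN.search(user_agent):
        return "static"
    return "dynamic"
```

In a real deployment this decision would typically live in the web server config (e.g. an nginx rewrite rule on the user agent) rather than in application code.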
Answer 1:
Here is one solution; however, I very much doubt I'll be able to find a public PaaS cloud that can run it:
import spynner

if __name__ == '__main__':
    url = "http://angular.github.com/angular-phonecat/step-10/app/#/phones"
    browser = spynner.Browser()
    browser.create_webview(True)
    # Load the page and let its JavaScript run (up to 60 seconds)
    browser.load(url, load_timeout=60)
    print(browser._get_html())
    # ^ Can pipe this to a file, POST it to my server or return it as a string
    browser.close()
Package: Spynner (on GitHub)
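The cron-driven regeneration step described in the question could be wrapped around a renderer like the one above. A minimal sketch, assuming `render` is any callable that returns the rendered HTML for a URL (the spynner routine would fit); the atomic-rename detail is my addition so a visitor never reads a half-written snapshot:

```python
import os
import tempfile

def save_snapshot(render, url, path):
    """Render `url` to static HTML and write it to `path` atomically.

    `render` is a callable taking a URL and returning an HTML string
    (e.g. a function wrapping the spynner code above).
    """
    html = render(url)
    # Write to a temp file in the same directory, then rename into place,
    # so the served file is always complete.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp, path)
    return path
```

A cron entry would then just invoke a script calling `save_snapshot` once per page at the chosen interval.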
Source: https://stackoverflow.com/questions/14079416/render-javascript-to-html-in-python