So, it seem that the main concern is being DRY
- If you're using pushState have your server send the same exact code for all urls (that don't contain a file extension to serve images, etc.) "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind url rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
- (test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
- When the request is not from a google bot, you just process normally.
- If the request is from a google bot, you use phantom.js -- headless webkit browser ("A headless browser is simply a full-featured web browser with no visual interface.") to render html and javascript on the server and send the google bot the resulting html. As the bot parses the html it can hit your other "pushState" links /somepage on the server
mylink, the server rewrites url to your application file, loads it in phantom.js and the resulting html is sent to the bot, and so on...
- For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
- To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
- To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse
tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
- If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
- There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering