Basically, browsers are designed to prevent exactly this: the same-origin policy stops a page from reading the contents of a frame loaded from another domain…
The solution everyone thinks about first:
jQuery/JavaScript: accessing contents of an iframe
But it will not work cross-domain in any reasonably "recent" browser (less than 10 years old).
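To see why the iframe approach fails, it helps to know what the browser actually compares: two URLs share an origin only when scheme, host, and port all match, and `$('iframe').contents()` only works in the matching case. A minimal sketch of that check (the `sameOrigin` helper is my own illustration, not a browser API):

```javascript
// Sketch of the same-origin policy's comparison: scheme, host, and port
// must all match before a page may script a frame's document.
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}

// Same origin: jQuery's $('iframe').contents() would work here.
console.log(sameOrigin('https://example.com/page', 'https://example.com/frame')); // true

// Cross origin: instead of returning the DOM, the browser blocks access
// (typically with a SecurityError).
console.log(sameOrigin('https://example.com/page', 'https://other.com/frame')); // false
```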
Alternatives are:
- Using the server's official APIs (if any)
- Checking whether the server provides a JSONP service (good luck)
- Getting onto the same domain, e.g. via cross-site scripting (if possible; not very ethical)
- Using a trusted relay or proxy (but the requests will still come from your own IP)
- Pretending you are a Google web crawler (why not, but not very reliable, and no guarantees)
- Using a hack to set up the relay/proxy on the client itself; I can think of a Java applet or possibly Flash (will not work on most mobile devices, is slow, and Flash has its own cross-site limitations too)
- Asking Google or another search engine for the content (you may then have a problem with the search engine if you abuse it…)
- Doing the job yourself server-side and caching the answer, in order to take load off their server and reduce the risk of being banned
- Indexing the site yourself (with your own web crawler), then querying your own index (feasibility depends on how often the source changes)
http://www.quora.com/How-can-I-build-a-web-crawler-from-scratch
[EDIT]
One more solution I can think of is going through a YQL service; in this manner it is a bit like using a search engine / a public proxy as a bridge to retrieve the information for you.
Here is a simple example of doing so. In short, you get cross-domain GET requests.
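As a sketch of the idea: YQL exposed a public endpoint that would fetch a remote page for you and hand it back as JSON, so the browser only ever made a GET to Yahoo's servers. (Yahoo has since retired YQL, so treat this as the shape of the request rather than something that will still run; the `yqlUrl` helper is my own illustration.)

```javascript
// Build the URL for YQL's (now retired) public endpoint: the YQL query
// "select * from html where url=..." asked Yahoo to fetch the page for you.
function yqlUrl(pageUrl) {
  const query = `select * from html where url="${pageUrl}"`;
  return 'https://query.yahooapis.com/v1/public/yql' +
         '?q=' + encodeURIComponent(query) +
         '&format=json';
}

console.log(yqlUrl('http://example.com/'));

// In the browser you would then issue a plain cross-domain-friendly GET, e.g.:
// $.getJSON(yqlUrl('http://example.com/'), function (data) {
//   console.log(data.query.results); // the scraped HTML, as JSON
// });
```

The same bridge pattern works with any service that fetches on your behalf and answers with JSON(P); YQL was just the best-known free one at the time.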