Question
I need to parse a remote HTML page (e.g. www.mywesite.com/home). How can I get the page's HTML source, and how can I parse it?
The HTML looks like this:
<html>
<body>
<div class="my-class1">
<a href="home/link?id=1">hello</a>
</div>
<div class="my-class1">
<a href="home/link?id=2">hey</a>
</div>
<div class="my-class1">
<a href="home/link?id=3">bye</a>
</div>
</body>
</html>
I want the output to be:
hello
hey
bye
I'm not using any server-side technology (such as Java or .NET); I want to achieve this using JavaScript only.
Is it possible to parse a remote HTML page using pure JavaScript or a jQuery plugin?
Thanks in advance.
Answer 1:
Because of the same-origin policy, ordinary browser JavaScript cannot read the contents of remote pages from any server except its own.
You can:
Have a cooperating script on your own server fetch the remote content.
With the cooperation of the remote server, access the content through an appropriate CORS ( http://en.wikipedia.org/wiki/Cross-origin_resource_sharing ) arrangement.
Again with the cooperation of the remote server, if it makes its content available as JavaScript, load it by dynamically inserting script elements into your page. "JSONP" is an example of this approach.
Write a browser plugin or add-on (for browsers that permit these to be written in JavaScript), which is not bound by the browser security model in the same way.
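The CORS option can be sketched roughly as follows, assuming the remote server sends an Access-Control-Allow-Origin header that permits your page's origin. The names fetchHtml and fetchImpl are made up for illustration (fetchImpl exists only so the function can be exercised without a network), and the URL is the question's placeholder with an assumed https scheme.

```javascript
// Sketch only: this works in a browser only if the remote server opts in via CORS.
// fetchHtml and fetchImpl are hypothetical names, not part of any library.
async function fetchHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { mode: 'cors' });
  if (!response.ok) {
    throw new Error('Request failed with HTTP status ' + response.status);
  }
  return response.text(); // resolves to the page source as a string
}

// Usage in a browser (placeholder URL from the question):
// fetchHtml('https://www.mywesite.com/home')
//   .then(function (html) { console.log(html); })
//   .catch(function (err) { console.error(err); });
```

If the remote server does not send that header, the browser blocks the response and the promise rejects, which is the restriction described above.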
Answer 2:
Assuming the cross-origin issues are fixed and the page's HTML is already in the string txt, here is the approach I use:
// get the body part of the html
txt = txt.substr( txt.indexOf('<body>') + 6 );
txt = txt.substr( 0, txt.indexOf('</body>') );
// stick the body into a div
var div = document.createElement('div');
div.innerHTML = txt;
// extract textContent from each element (or something more interesting)
Array.prototype.slice.call( div.querySelectorAll('*') ).forEach( function(el) {
  if( el.textContent ) console.log( el.textContent );
});
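As an alternative to slicing out the body by hand, here is a minimal sketch that uses the browser's DOMParser and selects only the links inside the my-class1 divs from the question. The helper names parseHtml and linkTexts are invented for this example, and the sample markup is a condensed copy of the question's HTML. It is not the method the answer above uses.

```javascript
// Parse an HTML string into a detached Document.
// DOMParser is a browser Web API, so this helper is browser-only.
function parseHtml(html) {
  return new DOMParser().parseFromString(html, 'text/html');
}

// Collect the trimmed text of every <a> inside <div class="my-class1">.
// Accepts any object with a querySelectorAll method (normally a Document).
function linkTexts(doc) {
  return Array.from(doc.querySelectorAll('div.my-class1 a'), function (a) {
    return a.textContent.trim();
  });
}

// Demo: runs only where DOMParser is available (i.e. in a browser).
if (typeof DOMParser !== 'undefined') {
  var sample =
    '<html><body>' +
    '<div class="my-class1"><a href="home/link?id=1">hello</a></div>' +
    '<div class="my-class1"><a href="home/link?id=2">hey</a></div>' +
    '<div class="my-class1"><a href="home/link?id=3">bye</a></div>' +
    '</body></html>';
  linkTexts(parseHtml(sample)).forEach(function (text) {
    console.log(text);
  });
}
```

Scripts in a document produced by parseFromString are not executed, and selecting by class avoids logging the same text once per ancestor element, which a '*' selector does for nested markup.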
Source: https://stackoverflow.com/questions/15812057/how-can-i-parse-remote-html-page-using-pure-java-script