How can I parse a remote HTML page using pure JavaScript?

Submitted by ぃ、小莉子 on 2019-12-13 14:27:01

Question


I need to parse a remote HTML page (e.g. www.mywesite.com/home). How can I get the page's HTML source, and how can I parse it?

The HTML looks like this:

 <html>
     <body>
        <div class="my-class1">
             <a href="home/link?id=1">hello</a>
        </div>

        <div class="my-class1">
             <a href="home/link?id=2">hey</a>
        </div>

        <div class="my-class1">
             <a href="home/link?id=3">bye</a>
        </div>
     </body>
 </html>

I want the output to be:

 hello
 hey
 bye 

I'm not using any server-side technology (such as Java or .NET); I want to achieve this using JavaScript only.

Is it possible to parse a remote HTML page using pure JavaScript or a jQuery plugin?

Thanks in advance.


Answer 1:


Because of the same-origin policy, ordinary browser JavaScript cannot read the contents of pages from any origin other than its own.

You can:

  1. Have a cooperating script on your own server fetch the remote content and pass it on to your page.

  2. With the cooperation of the remote server, you may be able to access the content through an appropriate CORS (http://en.wikipedia.org/wiki/Cross-origin_resource_sharing) arrangement.

  3. Again with the cooperation of the remote server, if it makes its content available as JavaScript, you can load it by creating script elements that point at it. "JSONP" is an example of this approach.

  4. If you write a browser plugin or add-on (for browsers that permit such things to be written in JavaScript), you are not bound by the browser security model in the same way.
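
As an illustration of option 2, here is a minimal sketch of how the page could be fetched and parsed in a browser once the remote server sends suitable CORS headers. The function names fetchLinkTexts and extractLinkTexts are hypothetical, the selector is taken from the sample HTML in the question, and the URL in the usage comment is a placeholder. Without CORS headers on the remote side, the fetch call is rejected by the browser.

```javascript
// Hypothetical helper: collect the text of every link inside div.my-class1.
// "doc" is any parsed Document (or an object offering querySelectorAll).
function extractLinkTexts(doc) {
  return Array.from(doc.querySelectorAll('div.my-class1 a'), function (a) {
    return a.textContent.trim();
  });
}

// Hypothetical helper: fetch a remote page and parse it (browser only).
// Assumption: the remote server allows your origin via CORS headers.
async function fetchLinkTexts(url) {
  const response = await fetch(url, { mode: 'cors' });
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  const html = await response.text();
  // DOMParser builds a detached document from the HTML string.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return extractLinkTexts(doc);
}

// Usage in a browser (placeholder URL):
// fetchLinkTexts('https://www.example.com/home').then(function (texts) {
//   texts.forEach(function (t) { console.log(t); });
// });
```

With the sample page from the question, the logged values would be the three link texts. Options 1 and 3 can reuse extractLinkTexts once the HTML has reached the page by another route.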




Answer 2:


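Assuming the cross-origin issue is already solved and the page source is held in the string txt, here is the approach I use:

// txt is assumed to already hold the page's HTML source as a string

// get body part of html
txt = txt.slice( txt.indexOf('<body>') + 6 );
txt = txt.slice( 0, txt.indexOf('</body>') );

// stick body into a detached div so the browser parses it
var div = document.createElement('div');
div.innerHTML = txt;

// extract textContent from each element (or something more interesting)
// slice.call turns the NodeList into a real array so forEach is available
Array.prototype.slice.call( div.querySelectorAll('*') ).forEach( function(el) {
   if( el.textContent ) console.log( el.textContent );
});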
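
One caveat with the snippet above: looking up the literal string '<body>' only works when the tag has no attributes and is lowercase. A slightly more tolerant way to cut out the body could look like the sketch below. extractBodyHtml is a hypothetical helper name, and the regular expressions are an assumption about what typical pages contain; they would still be confused by a '>' inside a quoted attribute value.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>.
// Falls back to the whole input when no <body> tag is found.
function extractBodyHtml(html) {
  // Match the opening tag, with or without attributes, in any letter case.
  var open = /<body\b[^>]*>/i.exec(html);
  if (!open) {
    return html;
  }
  var rest = html.slice(open.index + open[0].length);
  // Stop at the closing tag if present; otherwise keep everything that follows.
  var close = /<\/body\s*>/i.exec(rest);
  return close ? rest.slice(0, close.index) : rest;
}
```

The returned string can then be assigned to div.innerHTML exactly as in the answer.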


Source: https://stackoverflow.com/questions/15812057/how-can-i-parse-remote-html-page-using-pure-java-script
