Open webpage and parse it using JavaScript

拜拜、爱过 submitted on 2019-11-28 04:09:17

You can use an XMLHttpRequest object to do this. Here's a simple example:

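var req = new XMLHttpRequest();
req.open('GET', 'http://www.mydomain.com/', true); // async; synchronous requests are deprecated
req.onload = function () {
  if (req.status === 200)
    console.log(req.responseText); // dump() is Firefox-specific
};
req.send(null);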

Once loaded, you can do your parsing/scraping by running JavaScript regular expressions over the req.responseText property.
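A minimal sketch of regex-based scraping, assuming the markup is simple and known in advance. The helper names `extractTitle` and `extractLinks` and the inline `sample` string are hypothetical; in the browser you would pass `req.responseText` instead. Regexes like these are brittle against arbitrary HTML, so treat this as a quick scraping aid rather than a general parser.

```javascript
// Hypothetical helpers for scraping an HTML string such as req.responseText.

// Return the trimmed text of the first <title>...</title>, or null if absent.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Return the value of every quoted href attribute on an <a> tag.
// Requires whitespace before "href", so data-href or <abbr> are not matched.
function extractLinks(html) {
  var links = [];
  var re = /<a\s(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1/gi;
  var match;
  while ((match = re.exec(html)) !== null) {
    links.push(match[2]); // group 2 is the text between the matching quotes
  }
  return links;
}

// Usage with an inline sample instead of a live response:
var sample = '<html><head><title> Example Domain </title></head>' +
  '<body><a href="/about">About</a> <a class="x" href=\'https://example.com/\'>Home</a></body></html>';
console.log(extractTitle(sample)); // Example Domain
console.log(extractLinks(sample)); // [ '/about', 'https://example.com/' ]
```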

More detail...

To support old versions of Internet Explorer, which lack a native XMLHttpRequest object before IE 7, you need to do a little more to create the request object in a cross-browser manner, e.g.:

var req;
if (window.XMLHttpRequest) {
  req = new XMLHttpRequest();                      // modern browsers and IE 7+
} else {
  try {
    req = new ActiveXObject("Msxml2.XMLHTTP");     // IE 6
  } catch (e) {
    req = new ActiveXObject("Microsoft.XMLHTTP");  // IE 5.x
  }
}

Or use a library...

Alternatively, you can save yourself all the bother and just use a library like jQuery or Prototype to take care of this for you.

Same-origin policy may bite you though...

Note that due to the same-origin policy, the page you request must normally come from the same origin (same scheme, host, and port) as the page making the request, unless the remote server opts in with CORS headers. If you want to request a remote page that does not, you will have to proxy it via a server-side script.
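To make "same origin" concrete, here is a small sketch that compares scheme, host, and port using the standard `URL` class (available in browsers and Node.js). The function name `isSameOrigin` and the example URLs are illustrative assumptions; `URL` normalizes default ports, so `:80` on an `http` URL is treated the same as no port.

```javascript
// Hypothetical helper: two URLs share an origin only if scheme, host and port all match.
function isSameOrigin(a, b) {
  var ua = new URL(a);
  var ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port; // "" when the URL uses the scheme's default port
}

console.log(isSameOrigin('http://www.mydomain.com/a', 'http://www.mydomain.com/b'));   // true
console.log(isSameOrigin('http://www.mydomain.com/', 'https://www.mydomain.com/'));    // false: scheme differs
console.log(isSameOrigin('http://www.mydomain.com/', 'http://mydomain.com/'));         // false: host differs
console.log(isSameOrigin('http://www.mydomain.com/', 'http://www.mydomain.com:8080/')); // false: port differs
console.log(isSameOrigin('http://www.mydomain.com:80/', 'http://www.mydomain.com/'));  // true: 80 is the default
```

In a page, comparing a target URL against `window.location.href` this way tells you whether a plain XMLHttpRequest would stay within the page's own origin.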

Another possible workaround is to use Flash to make the request, which does allow cross-domain requests if the target site grants permission with a suitably configured crossdomain.xml file.

Here's a nice article on the subject of the same-origin policy:

Whatever Origin (http://www.whateverorigin.org/) is an open source service that lets you scrape using pure JavaScript. It also works around the same-origin problem by returning the page contents as JSONP.

$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('http://google.com') + '&callback=?', function(data){
    alert(data.contents);
});
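The `&callback=?` in the URL above tells jQuery to treat the request as JSONP: jQuery substitutes a generated function name for `?`, loads the URL in a `<script>` tag, and the server replies with JavaScript of the form `name({...})`. The sketch below only illustrates that exchange; the callback name `handlePage` and the helpers `buildJsonpUrl` and `unwrapJsonp` are hypothetical, and the `contents` field is taken from the example above. A browser would simply execute the response script rather than parse it.

```javascript
// Hypothetical: build the request URL with an explicit callback name
// (jQuery would generate one itself when it sees "callback=?").
function buildJsonpUrl(target, callbackName) {
  return 'http://whateverorigin.org/get?url=' + encodeURIComponent(target) +
         '&callback=' + encodeURIComponent(callbackName);
}

// Illustration only: extract the JSON payload from a "name({...});" body.
function unwrapJsonp(body, callbackName) {
  var prefix = callbackName + '(';
  var start = body.indexOf(prefix);
  var end = body.lastIndexOf(')'); // the outermost closing parenthesis
  if (start === -1 || end < start + prefix.length) {
    throw new Error('Not a JSONP response for ' + callbackName);
  }
  return JSON.parse(body.slice(start + prefix.length, end));
}

var url = buildJsonpUrl('http://google.com', 'handlePage');
console.log(url); // http://whateverorigin.org/get?url=http%3A%2F%2Fgoogle.com&callback=handlePage
var data = unwrapJsonp('handlePage({"contents":"<html>...</html>"});', 'handlePage');
console.log(data.contents); // <html>...</html>
```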

You would use AJAX: it makes a GET request to the URL in question and returns the response HTML. jQuery makes this very easy, e.g.:

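// The success callback receives the response HTML as a string
$.get("test.php", function (html) { console.log(html); });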

http://docs.jquery.com/Ajax

Andrew

You could load the page in an iframe:

http://www.w3schools.com/TAGS/tag_iframe.asp

Note, though, that JavaScript access to the iframe's content is blocked if the page you open comes from a different origin. This is to prevent cross-site scripting attacks:

http://en.wikipedia.org/wiki/Cross-site_scripting
