Open webpage and parse it using JavaScript

Submitted by 时光毁灭记忆、已成空白 on 2019-11-27 05:17:15

Question


I know JavaScript can open a link in a new window but is it possible to open a webpage without opening it in a window or displaying it to the user? What I want to do is parse that webpage for some text and use it as variables.

Is this possible without any help from server-side languages? If so, please point me in the right direction.

Thanks all


Answer 1:


You can use an XMLHttpRequest object to do this. Here's a simple example:

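var req = new XMLHttpRequest();
// Passing false as the third argument makes the request synchronous.
req.open('GET', 'http://www.mydomain.com/', false);
req.send(null);
if (req.status === 200) {
  console.log(req.responseText); // the page's HTML as a string
}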

Once loaded, you can perform your parsing/scraping by using JavaScript regular expressions on the req.responseText member.
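
As a rough sketch of that parsing step, the function below pulls the contents of the first <title> element out of an HTML string with a regular expression. The name extractTitle and the sample markup are made up for illustration; a regular expression like this is only dependable for simple, predictable markup.

```javascript
// Hypothetical helper: extract the text of the first <title> element
// from an HTML string such as req.responseText.
function extractTitle(html) {
  // [\s\S] matches any character including newlines; *? keeps the match lazy.
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Sample markup standing in for a real response body.
var sampleHtml = '<html><head><title> Example Page </title></head><body></body></html>';
var pageTitle = extractTitle(sampleHtml);
console.log(pageTitle);
```

The value returned can then be used like any other variable in the script.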

More detail...

In practice, you need to do a little more to get the XMLHttpRequest object in a cross-browser manner (old versions of Internet Explorer expose it through ActiveX), e.g.:

var req;
var ua = navigator.userAgent.toLowerCase();
if (!window.ActiveXObject)
  req = new XMLHttpRequest();                 // standards-based browsers
else if (ua.indexOf('msie 5') == -1)
  req = new ActiveXObject("Msxml2.XMLHTTP");  // Internet Explorer 6 and later
else
  req = new ActiveXObject("Microsoft.XMLHTTP"); // Internet Explorer 5

Or use a library...

Alternatively, you can save yourself all the bother and just use a library like jQuery or Prototype to take care of this for you.

Same-origin policy may bite you though...

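Note that due to the same-origin policy, the page you request must come from the same origin (scheme, host, and port) as the page making the request, unless the target server explicitly allows cross-origin requests through CORS headers. If you want to request a remote page that does not, you will have to proxy the request through a server-side script.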
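
To make the rule concrete, two URLs share an origin only when their scheme, host, and port all match. The sketch below checks this with the standard URL constructor; the function name isSameOrigin and the example URLs are assumptions for illustration only.

```javascript
// Hypothetical helper: true when both URLs have the same scheme, host, and port.
function isSameOrigin(urlA, urlB) {
  // URL.prototype.origin serializes scheme + host + port (default ports omitted).
  return new URL(urlA).origin === new URL(urlB).origin;
}

var pageUrl = 'http://www.mydomain.com/index.html';

console.log(isSameOrigin(pageUrl, 'http://www.mydomain.com/other/page.html'));
console.log(isSameOrigin(pageUrl, 'https://www.mydomain.com/'));
console.log(isSameOrigin(pageUrl, 'http://www.mydomain.com:8080/'));
console.log(isSameOrigin(pageUrl, 'http://sub.mydomain.com/'));
```

In a browser, the first argument would typically be window.location.href, so the check tells you in advance whether a plain XMLHttpRequest to the second URL is subject to the restriction.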

Another possible workaround is to use Flash to make the request, which does allow cross-domain requests if the target site grants permission with a suitably configured crossdomain.xml file.

Here's a nice article on the subject of the same-origin policy:

  • Same-Origin Policy Part 1: Why we’re stuck with things like XSS and XSRF/CSRF



Answer 2:


Whatever Origin is an open source library that lets you do scraping purely in JavaScript. It also works around the same-origin problem. http://www.whateverorigin.org/

$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('http://google.com') + '&callback=?', function(data){
    alert(data.contents);
});



Answer 3:


You could load the page in an iframe instead of a new window:

http://www.w3schools.com/TAGS/tag_iframe.asp

Note, however, that JavaScript access to the iframe's content is limited if the page you load comes from a different origin. This restriction exists to prevent cross-site scripting attacks:

http://en.wikipedia.org/wiki/Cross-site_scripting
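
That limitation can be seen in code. This is a minimal sketch, assuming an iframe element already exists in the page; the helper name readFrameText and the element id in the usage comment are hypothetical. For a frame showing a page from another origin, contentDocument is null, so the helper returns null rather than the page text.

```javascript
// Hypothetical helper: return the text of a loaded iframe's document,
// or null when the document is not accessible (e.g. a cross-origin frame).
function readFrameText(iframe) {
  var doc = iframe.contentDocument; // null for cross-origin frames
  if (!doc || !doc.body) {
    return null;
  }
  return doc.body.textContent;
}

// Browser usage (assumes <iframe id="hidden-frame" style="display:none">):
// var frame = document.getElementById('hidden-frame');
// frame.onload = function () { console.log(readFrameText(frame)); };
```

Hiding the frame with CSS keeps the page from being displayed to the user, which is what the question asks for.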




Answer 4:


You would use AJAX. This makes a GET request to the URL in question and returns the response HTML. jQuery makes this very easy, e.g.:

$.get("test.php", function (html) { console.log(html); }); // html holds the response body

http://docs.jquery.com/Ajax

Andrew




Answer 5:


You can try using fetch and its promise callbacks:

fetch('https://api.codetabs.com/v1/proxy?quest=google.com').then((response) => response.text()).then((text) => console.log(text));
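
A slightly fuller sketch of the same idea, with error handling added. The function name fetchPageText is hypothetical, and the optional second parameter exists only so that a stand-in for fetch can be supplied when testing outside a browser. Because fetch resolves even for HTTP error statuses, response.ok is checked explicitly.

```javascript
// Hypothetical helper: fetch a URL and resolve with its body as text.
// fetchImpl is an optional replacement for the global fetch (for testing).
function fetchPageText(url, fetchImpl) {
  var request = fetchImpl ? fetchImpl(url) : fetch(url);
  return request.then(function (response) {
    // fetch only rejects on network failure, so check the status ourselves.
    if (!response.ok) {
      throw new Error('Request failed with status ' + response.status);
    }
    return response.text();
  });
}

// Usage, with the proxy URL from the answer above:
// fetchPageText('https://api.codetabs.com/v1/proxy?quest=google.com')
//   .then(function (text) { console.log(text); })
//   .catch(function (err) { console.error(err.message); });
```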


Source: https://stackoverflow.com/questions/597907/open-webpage-and-parse-it-using-javascript
