Can't read in HTML content from valid URL

Submitted by 馋奶兔 on 2020-01-15 09:39:07

Question


I am trying out a simple program that reads the HTML content from a given URL. The URL I am using does not require any cookie, username, or password, but I still get a java.io.IOException: Server returned HTTP response code: 403 error. Can anyone tell me what I am doing wrong here? (I know there are similar questions on SO, but they didn't help.)

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLConnection;

    public class urlcont {
        public static void main(String[] args) {
            try {
                URL u = new URL("http://www.amnesty.org/");
                URLConnection uc = u.openConnection();
                // Send a browser-like User-Agent instead of the default Java one.
                uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
                uc.connect();
                File f = new File("C:\\Users\\kausta\\Desktop\\urlcont.txt");
                // try-with-resources closes both streams even if the copy fails.
                try (InputStream in = uc.getInputStream();
                     OutputStream s = new FileOutputStream(f)) {
                    int b;
                    // Copy the response body to the file one byte at a time.
                    while ((b = in.read()) != -1) {
                        s.write(b);
                    }
                }
            } catch (MalformedURLException e) {
                System.err.println(e);
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }

Answer 1:


If you can fetch the URL in a browser but not from Java, that suggests to me that the server is blocking programmatic access to the page through user-agent filtering. Try setting the User-Agent header on your connection so that your code appears to the web server to be a web browser.

See this thread for help on that: What is the proper way of setting headers in a URLConnection?
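To illustrate the kind of filtering described above, here is a minimal sketch that runs entirely on the local machine. The server in it is hypothetical: it returns 403 unless the User-Agent starts with "Mozilla/", which is only an assumption about how such a filter might work, not a description of any real site. The class name `UserAgentDemo` and the helpers `startFilteringServer` and `statusFor` are made up for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class UserAgentDemo {
    // Hypothetical local server that mimics user-agent filtering:
    // 200 if the User-Agent starts with "Mozilla/", otherwise 403.
    static HttpServer startFilteringServer() throws IOException {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            int status = (ua != null && ua.startsWith("Mozilla/")) ? 200 : 403;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Returns the HTTP status for the URL; a null userAgent keeps Java's default header.
    static int statusFor(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() reports 403 as a value instead of throwing.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startFilteringServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println(statusFor(url, null)); // default "Java/..." agent: 403
            System.out.println(statusFor(url, "Mozilla/5.0 (compatible; ExampleClient)")); // 200
        } finally {
            server.stop(0);
        }
    }
}
```

Calling `getResponseCode()` rather than `getInputStream()` makes the status visible without the IOException from the question.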




Answer 2:


This is a permission problem:

A web server may return a 403 Forbidden HTTP status code in response to a request from a client for a web page or resource, to indicate that the server refuses to allow the requested action.
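As a rough sketch of how a client could act on that status code, the helper below maps a few codes to a short explanation using the constants defined on `java.net.HttpURLConnection`. The class name `StatusCheck`, the method `describeStatus`, and the wording of the messages are assumptions made for illustration.

```java
import java.net.HttpURLConnection;

public class StatusCheck {
    // Maps an HTTP status code to a short, human-readable explanation.
    static String describeStatus(int code) {
        switch (code) {
            case HttpURLConnection.HTTP_OK:           // 200
                return "OK: the body can be read";
            case HttpURLConnection.HTTP_UNAUTHORIZED: // 401
                return "Unauthorized: the server asks for credentials";
            case HttpURLConnection.HTTP_FORBIDDEN:    // 403
                return "Forbidden: the server refuses the request";
            default:
                return "Other status: " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeStatus(403)); // prints "Forbidden: the server refuses the request"
    }
}
```

In a real program the code would come from `HttpURLConnection.getResponseCode()`, which returns the status as an `int` where `getInputStream()` would throw.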




Answer 3:


You are not doing anything "wrong". The server you are trying to access is blocking your request because you are not allowed to access the file.

HTTP error 403 means Forbidden: the remote server blocks the request.

Check whether you need to authenticate to access the document you want, and if so, send the credentials with the request.
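If the server did turn out to expect HTTP Basic authentication (an assumption; the question says the URL needs no username or password), the credentials would travel in an `Authorization` header. The sketch below only builds that header value. The class name `BasicAuthHeader`, the method `basicAuthHeader`, and the credentials "user" and "pass" are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of an HTTP Basic "Authorization" header:
    // the word "Basic", a space, then Base64 of "user:password".
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        System.out.println(basicAuthHeader("user", "pass")); // prints "Basic dXNlcjpwYXNz"
    }
}
```

The value would then be attached before connecting, for example with `uc.setRequestProperty("Authorization", basicAuthHeader(user, password))`. Basic credentials are only encoded, not encrypted, so they should be sent over HTTPS.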



Source: https://stackoverflow.com/questions/14280464/cant-read-in-html-content-from-valid-url
