Question
I am trying out a simple program that reads the HTML content from a given URL. The URL in this case doesn't require any cookie, username, or password, but I still get a java.io.IOException: Server returned HTTP response code: 403 error. Can anyone tell me what I am doing wrong here? (I know there are similar questions on SO, but they didn't help):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class urlcont {
    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.amnesty.org/");
            URLConnection uc = u.openConnection();
            // Present a browser-like User-Agent to the server.
            uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
            uc.connect();
            // try-with-resources closes both streams, even if an exception is thrown.
            try (InputStream in = uc.getInputStream();
                 OutputStream s = new FileOutputStream("C:\\Users\\kausta\\Desktop\\urlcont.txt")) {
                int b;
                // Copy the response body to the file one byte at a time.
                while ((b = in.read()) != -1) {
                    s.write(b);
                }
            }
        } catch (MalformedURLException e) {
            System.err.println(e);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}
Answer 1:
If you can fetch the URL in a browser but not from Java, that suggests the server is blocking programmatic access through user-agent filtering. Try setting the User-Agent header on your connection so that your code appears to the web server to be a web browser.
See this thread for help on that: What is the proper way of setting headers in a URLConnection?
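As a rough sketch of that suggestion, the header can be set with setRequestProperty before the connection is opened. The class name UserAgentExample, the helper openAsBrowser, and the particular User-Agent string are made up for illustration; whether any given string gets past a particular server's filter is not something this code can guarantee.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class UserAgentExample {
    // Hypothetical browser-like value; any realistic browser string could be used instead.
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Creates a connection object with the User-Agent header set.
    // openConnection() does not contact the server yet, so the header is still editable here.
    static HttpURLConnection openAsBrowser(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_UA);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openAsBrowser("http://www.amnesty.org/");
        System.out.println("User-Agent to be sent: " + conn.getRequestProperty("User-Agent"));
        // getResponseCode() sends the request and returns the HTTP status.
        System.out.println("Response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Headers have to be set before the request is sent; calling setRequestProperty after connect() or getResponseCode() throws an IllegalStateException.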
Answer 2:
This is a permission problem:
A web server may return a 403 Forbidden HTTP status code in response to a request from a client for a web page or resource to indicate that the server refuses to allow the requested action.
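To see what the server actually says when it refuses, one option is to read the status code and the error body instead of letting getInputStream() throw. This is only a sketch: the class name StatusCheckExample and the helper describe are invented here, the body is assumed to be UTF-8, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class StatusCheckExample {
    // Hypothetical helper: turns a status code into a short human-readable note.
    static String describe(int status) {
        if (status == HttpURLConnection.HTTP_FORBIDDEN) {
            return "403 Forbidden: the server refused the request";
        }
        if (status == HttpURLConnection.HTTP_UNAUTHORIZED) {
            return "401 Unauthorized: credentials are required";
        }
        if (status >= 200 && status < 300) {
            return status + " success";
        }
        return status + " other response";
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://www.amnesty.org/").openConnection();
        int status = conn.getResponseCode();
        System.out.println(describe(status));
        if (status >= 400) {
            // For error responses the body is on the error stream; it may be null.
            try (InputStream err = conn.getErrorStream()) {
                if (err != null) {
                    // Assumes the error page is UTF-8 encoded.
                    System.out.println(new String(err.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        conn.disconnect();
    }
}
```

The error body sometimes explains why the request was refused, which helps tell user-agent filtering apart from a genuine access restriction.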
Answer 3:
You are not doing anything "wrong". The server you are trying to access is blocking your request because you are not allowed to access the file.
HTTP error 403 means Forbidden: the remote server blocks the request.
Check whether you need to authenticate to access the document you want, and if so, provide the credentials with the request ;)
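If the server does turn out to require credentials, one common scheme is HTTP Basic authentication, where the request carries an Authorization header. The sketch below assumes that scheme; the class name BasicAuthExample, the helper basicAuthHeader, and the user/pass values are placeholders, and nothing in the question indicates that this particular site uses Basic authentication.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthExample {
    // Builds the value of an HTTP Basic Authorization header: "Basic " + base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://www.amnesty.org/").openConnection();
        // Placeholder credentials, for illustration only.
        conn.setRequestProperty("Authorization", basicAuthHeader("user", "pass"));
        System.out.println("Response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Basic authentication only encodes the credentials and does not encrypt them, so it should be sent over HTTPS.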
Source: https://stackoverflow.com/questions/14280464/cant-read-in-html-content-from-valid-url