Question
I want to get the HTML source of https://www2.cslb.ca.gov/OnlineServices/CheckLicenseII/LicenseDetail.aspx?LicNum=872423
For that I am using the method below, but it does not return the HTML source.
public static String getHTML(URL url) {
    HttpURLConnection conn; // The actual connection to the web page
    BufferedReader rd; // Used to read results from the web page
    String line; // An individual line of the web page HTML
    String result = ""; // A long string containing all the HTML
    try {
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        while ((line = rd.readLine()) != null) {
            result += line;
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
Answer 1:
The server filters out Java's default User-Agent. This works:
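import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils; // Apache Commons IO

public static String getHTML(URL url) {
    try {
        final URLConnection urlConnection = url.openConnection();
        // Send any User-Agent other than Java's default
        urlConnection.addRequestProperty("User-Agent", "Foo?");
        final InputStream inputStream = urlConnection.getInputStream();
        // Read the whole response body into a single String
        final String html = IOUtils.toString(inputStream);
        inputStream.close();
        return html;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}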
It looks like the server blacklists certain User-Agent values. By default, my JDK sends:
User-Agent: Java/1.6.0_26
Note that I'm using the IOUtils class (from Apache Commons IO) to simplify the example, but the key line is:
urlConnection.addRequestProperty("User-Agent", "Foo?");
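If you would rather not pull in Commons IO, the same idea can probably be applied to the original method with only the standard library. The following is a minimal sketch, not a definitive fix: the class name HtmlFetcher and the userAgent parameter are made up for illustration, and it assumes the page is encoded as UTF-8, which I have not checked for this site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class HtmlFetcher {

    // Fetches the body of the page at url, sending the given User-Agent
    // instead of the JDK default (e.g. "Java/1.6.0_26").
    static String getHTML(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // The key line: replace Java's default User-Agent header.
        conn.setRequestProperty("User-Agent", userAgent);

        StringBuilder result = new StringBuilder();
        // Assumption: the response is UTF-8 encoded.
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                // Keep the line breaks that readLine() strips.
                result.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return result.toString();
    }
}
```

You would call it as HtmlFetcher.getHTML(url, "Foo?"). Unlike the version in the question, it lets the IOException propagate instead of printing the stack trace and returning an empty string, so a rejected request is easier to notice.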
Source: https://stackoverflow.com/questions/8142039/java-not-getting-html-code-from-a-url