Question
I'd like to fetch a webpage: just the raw data returned by an HTTP request, with no parsing or rendering.
I'm trying to do this with the high-level Socket class from the Java runtime library.
I'm not sure whether this is possible. I'm not comfortable with the layer underneath this point-to-point communication, and I can't tell whether the trouble comes from my own system.
Here's what my code is doing:
1) setting the socket.
this.socket = new Socket( "www.example.com", 80 );
2) setting the appropriate streams used for this communication.
this.out = new PrintWriter( socket.getOutputStream(), true);
this.in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );
3) requesting the page (this is the step I'm not sure I'm doing correctly).
String query = "";
query += "GET / HTTP/1.1\r\n";
query += "Host: www.example.com\r\n";
...
query += "\r\n";
this.out.print(query);
4) reading the result (nothing in my case).
System.out.print( this.in.readLine() );
5) closing socket and streams.
Answer 1:
If you're on a *nix system, look into curl, which lets you retrieve data from the internet on the command line. It's more lightweight than a Java socket connection.
If you want to use Java and are just retrieving data from a webpage, check out the java.net.URL class. Some sample Java code:
URL ur = new URL("http://www.google.com"); // the protocol is required, otherwise MalformedURLException is thrown
URLConnection conn = ur.openConnection();
InputStream is = conn.getInputStream();
String foo = new Scanner(is).useDelimiter("\\A").next();
System.out.println(foo);
That'll fetch the specified URL, read the data (HTML in this case), and print it to the console. You might have to tweak the delimiter a bit, but this will work with most network endpoints that send data.
Answer 2:
Your code looks pretty close. Your GET request is probably malformed in some way. Try this: open a telnet client, connect to a web server, and paste in the GET request as you believe it should work. See if that returns anything. If it doesn't, there is a problem with the GET request. The easiest thing to do at that point would be to write a program that listens on a socket (more or less the inverse of what you're doing), point a web browser at localhost:[correct port], and see what the browser sends you. Use that as your template for the GET request.
Alternatively you could try and piece it together from the HTTP specification.
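For reference, here is a minimal sketch of what a well-formed request over a raw socket might look like. The class and method names (RawHttpGet, buildRequest, fetch) are made up for illustration, and the Connection: close header is my own addition so that the server closes the connection and the read loop terminates. One thing worth checking in the question's code: a PrintWriter created with autoflush only flushes on println, printf, or format, not on print, so the request may never leave the buffer. The sketch writes the bytes and flushes explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    // Builds a minimal HTTP/1.1 GET request. Every line ends in CRLF and
    // the header block is terminated by an empty line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n" // assumption: ask the server to close when done
             + "\r\n";
    }

    // Sends the request and returns everything the server sends back
    // (status line, headers, and body) as text.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush(); // make sure the request actually leaves the buffer

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            // readLine() returns null once the server closes the connection
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(fetch("www.example.com", 80, "/"));
    }
}
```

Note that an HTTP/1.1 server may reply with chunked transfer encoding, which this sketch prints as-is rather than decoding.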
Answer 3:
I had to put the full URL in the GET request line to make it work, although I see you can also specify a Host header if you want.
Socket socket = new Socket("youtube.com",80);
PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(socket.getOutputStream())));
out.println("GET http://www.youtube.com/yts/img/favicon_48-vflVjB_Qk.png HTTP/1.0");
out.println();
out.flush();
Answer 4:
Yes, it is possible. You just need to figure out the protocol. You are close.
I would create a simple server socket that prints out whatever it receives. You can then use your browser to connect to the socket using a URL like: http://localhost:8080. Then use your client socket to mimic the HTTP protocol from the browser.
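A rough sketch of such a server, assuming port 8080 as in the URL above. The names RequestDumper and readHeaders are hypothetical, and the empty 200 reply at the end is my own addition so the browser does not hang waiting for a response. It accepts a single connection, prints the request line and headers, and exits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RequestDumper {
    // Reads the request line and headers, stopping at the blank line
    // that terminates the header block (the blank line is not returned).
    static List<String> readHeaders(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: port 8080 is free on this machine.
        try (ServerSocket server = new ServerSocket(8080);
             Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
            for (String line : readHeaders(in)) {
                System.out.println(line);
            }
            // Send an empty reply so the browser finishes the exchange.
            OutputStream out = client.getOutputStream();
            out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }
}
```

Run it, open http://localhost:8080 in a browser, and the console shows the request the browser sent, which can serve as a template for the client.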
Answer 5:
Not sure why you're going lower down than URLConnection. It's designed to do what you want to do: http://download.oracle.com/javase/tutorial/networking/urls/readingWriting.html.
The Java Tutorial on Sockets even says: "URLs and URLConnections provide a relatively high-level mechanism for accessing resources on the Internet. Sometimes your programs require lower-level network communication, for example, when you want to write a client-server application." Since you're not going lower than HTTP, I'm not sure what the point is of using a Socket.
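As a self-contained illustration of the URLConnection route, here is a minimal sketch. The class name UrlFetch and the helper readAll are made up for this example, the target URL is a placeholder, and decoding the body as UTF-8 is an assumption (a real client would look at the Content-Type header).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class UrlFetch {
    // Drains the stream into a String. Assumption: the body is UTF-8 text.
    static String readAll(InputStream is) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = is.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // URI.create(...).toURL() builds the same URL object as new URL(...)
        URLConnection conn = URI.create("http://www.example.com/").toURL().openConnection();
        try (InputStream is = conn.getInputStream()) {
            System.out.print(readAll(is));
        }
    }
}
```

Unlike the raw socket approach, this returns only the response body; the request formatting and header handling are done by the library.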
Source: https://stackoverflow.com/questions/7500342/using-sockets-to-fetch-a-webpage-with-java