How do I retrieve a URL from a web site using Java?


Question


I want to use HTTP GET and POST commands to retrieve URLs from a website and parse the HTML. How do I do this?


Answer 1:


You can use HttpURLConnection in combination with URL:

import java.io.*;
import java.net.*;

URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

// Read the response body with an InputStreamReader
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
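The question also asks about POST. A sketch along the same lines, using the same HttpURLConnection API; the endpoint and form field here are placeholders, not anything from the question:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class PostExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and form field, for illustration only
        URL url = new URL("http://example.com/form");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // enables writing a request body
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");

        // URL-encode the form data before sending it
        String body = "q=" + URLEncoder.encode("java http get", "UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }

        System.out.println(connection.getResponseCode());
    }
}
```

The key differences from the GET case are setDoOutput(true), which switches the connection into request-body mode, and writing the URL-encoded form data to the connection's output stream before reading the response.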



Answer 2:


The accepted answer here is from robhruska, thank you. It shows the most basic way to do it and is easy to follow once you understand what a plain URL connection requires. In the longer term, though, the better strategy is to use an HTTP client library, which offers more advanced and feature-rich ways to complete this task.

Thank you everyone; here is the quick answer again:

URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader
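These answers predate it, but for reference, modern Java (11 and later) ships a built-in java.net.http.HttpClient that covers the "fuller HTTP client" role mentioned above without a third-party dependency. A minimal GET sketch; the URL is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HttpClientGetExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Build a GET request; the URL is a placeholder host
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://example.com"))
                .GET()
                .build();

        // Send the request and read the body as a String
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Unlike HttpURLConnection, this API follows redirects when configured via HttpClient.newBuilder().followRedirects(...), and the body handler takes care of reading and decoding the stream for you.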



Answer 3:


The easiest way to do a GET is to use the built-in java.net.URL. However, as mentioned, HttpClient is the proper way to go for anything non-trivial, as it will, among other things, handle redirects for you.

For parsing the HTML, you can use an HTML parser.
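To give a sense of what "parsing" involves, here is a deliberately naive sketch that pulls href attributes out of fetched HTML with a regular expression. It breaks on single-quoted or unquoted attributes and on malformed markup, which is exactly why a real parser such as JTidy is recommended instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Naive pattern for double-quoted href attributes;
    // real-world HTML needs a proper parser
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // the captured URL inside the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com\">home</a> "
                + "<a href=\"/about\">about</a>";
        System.out.println(extractLinks(html));
        // prints [http://example.com, /about]
    }
}
```

A real parser builds a document tree and copes with unclosed tags and entity references; this regex only recognizes one narrow attribute shape.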




Answer 4:


Use Apache HttpClient: http://hc.apache.org/httpclient-3.x/




Answer 5:


I have used JTidy in a project and it worked quite well. A list of other parsers is here, but apart from JTidy I don't know any of them.



Source: https://stackoverflow.com/questions/359439/how-do-i-retrieve-a-url-from-a-web-site-using-java
