Steps:
1. Set the URL: HttpPost httpPost = new HttpPost(url);
// When the parameters are already part of the URL, use HttpGet instead: HttpGet httpGet = new HttpGet(url);
2. Set the parameters (not needed when using HttpGet):
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair(name, value)); // name and value are both Strings
params.add......
httpPost.setEntity(new UrlEncodedFormEntity(params,"GB2312"));
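To make step 2 more concrete, here is a minimal sketch of roughly what a form-encoded request body looks like. It uses only the JDK's `URLEncoder` rather than HttpClient's `UrlEncodedFormEntity`, so the exact escaping may differ slightly from what the library sends. The class name `FormBodySketch`, the method `encode`, and the sample parameters are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    // Joins name=value pairs with '&', percent-encoding each part in the given charset.
    static String encode(Map<String, String> params, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(p.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(p.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            // Assumption: an unknown charset name is treated as a caller error.
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("user", "tom");
        params.put("q", "a b");
        System.out.println(encode(params, "UTF-8")); // prints user=tom&q=a+b
    }
}
```

The charset argument matters for non-ASCII values: the same Chinese text is escaped to different bytes under GB2312 and UTF-8, so it should match what the target site expects. The same encoded string can also be appended after `?` when the parameters go into the URL of an HttpGet.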
3. Execute the request:
HttpClient httpClient = new DefaultHttpClient();
HttpResponse rps0 = httpClient.execute(httpPost);
// Check the status code and carry out the next steps inside the if block only when the request succeeded
int resStatu = rps0.getStatusLine().getStatusCode(); // status code
if (resStatu == HttpStatus.SC_OK) {
}
4. Get the HTML:
HttpEntity entity0 = rps0.getEntity();
String html = EntityUtils.toString(entity0); // uses the charset from the Content-Type header; falls back to ISO-8859-1 when none is declared
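Whether the HTML in step 4 comes out readable depends on which charset is used to decode the response bytes. As a rough illustration, the sketch below pulls the charset out of a Content-Type value such as `text/html; charset=GBK` with a regular expression. The class `CharsetSniffSketch`, the method `charsetOf`, and the fallback argument are hypothetical names; this is not how HttpClient parses the header internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffSketch {
    // Matches charset=NAME, with optional quotes around NAME, ignoring case.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or the fallback when none is declared.
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                return m.group(1);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=GBK", "ISO-8859-1")); // prints GBK
        System.out.println(charsetOf("text/html", "ISO-8859-1"));              // prints ISO-8859-1
    }
}
```

The same helper would also work on the `content` attribute of a `<meta http-equiv="Content-Type">` tag, since that attribute uses the same `type; charset=NAME` layout.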
5. Close the connection:
httpClient.getConnectionManager().shutdown();
6. Parse the HTML:
Document doc = Jsoup.parse(html);
7. Other
If the HTML you get back is garbled, re-decode it with the page's real charset:
// doc is the Document parsed in step 6
Element e = doc.getElementsByTag("meta").first();
String charset = "GBK"; // default to GBK when the page's encoding cannot be determined
if (e != null) {
    String content = e.attr("content"); // attr() returns "" when the attribute is missing
    int idx = content.indexOf("charset=");
    if (idx >= 0) {
        charset = content.substring(idx + "charset=".length()).trim();
    } else if (!e.attr("charset").isEmpty()) {
        charset = e.attr("charset");
    }
    System.out.println(charset);
}
// This round trip only works when html was decoded as ISO-8859-1 in step 4
String text = new String(html.getBytes("ISO-8859-1"), charset);
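The re-decoding trick in step 7 works because ISO-8859-1 maps every byte to exactly one character, so encoding the garbled string back to ISO-8859-1 recovers the original bytes. The following is a minimal sketch of that round trip using only the JDK. The class `MojibakeRepairSketch` and the method `repair` are hypothetical names, and UTF-8 stands in for the page's real charset so that the example runs on any JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeRepairSketch {
    // Recovers the raw bytes from an ISO-8859-1 mis-decoded string and decodes them again.
    static String repair(String garbled, Charset actual) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters of "Chinese (language)"
        // Simulate a server sending UTF-8 bytes that the client decodes as ISO-8859-1.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        String fixed = repair(garbled, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(original)); // prints true
    }
}
```

For a GBK page, the second argument would be `Charset.forName("GBK")`. The repair is only lossless when the first decode really was ISO-8859-1; if the string was decoded with some other charset, bytes may already have been replaced and cannot be recovered this way.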
Source: https://www.cnblogs.com/linchuxin/archive/2012/03/24/2415510.html