java httpurlconnection cutting off html

余生颓废 提交于 2019-12-07 03:05:40

问题


Hey, I'm trying to get the html from a twitter profile page, but httpurlconnection is only returning a small snippet of the html. My code

for(int i = 0; i < urls.size(); i++)
{
URL url = new URL(urls.get(i));
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
System.out.println(connection.getResponseCode());
String line;
StringBuilder builder = new StringBuilder();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
while((line = reader.readLine()) != null)
{
    builder.append(line);
}
String html = builder.toString();
}

I always get 200 as the response code for each call. However about 1/3 of the time the entire html document is returned, and the other half only the first few hundred lines. The amount returned when the html is cutoff is not always the same.

Any ideas? Thanks for any help!

Additional Info: After viewing the headers it seems I'm getting duplicate content-length headers. The first is the full length, the other is much shorter (and probably representative of the length I'm getting some of the time) How can I handle duplicate headers?


回答1:


This worked fine for me, I added a newline after builder.append(line); to make it more readable in the console, but other than that it returned all the HTML for this page:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RetrieveHTML {

    public static void main(String[] args) throws IOException {
        List<String> urls = new ArrayList<String>();
        urls.add("http://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html");

        for (int i = 0; i < urls.size(); i++) {
            URL url = new URL(urls.get(i));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
            System.out.println(connection.getResponseCode());
            String line;
            StringBuilder builder = new StringBuilder();
            BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            while ((line = reader.readLine()) != null) {
                builder.append(line);
                builder.append("\n"); 
            }
            String html = builder.toString();
            System.out.println("HTML " + html);
        }

    }
}



回答2:


Check out my HTTP class

https://stackoverflow.com/questions/9349378/java-net-httpurlconnection-returning-your-browsers-cookie-functionality-has-be

based on this API. Feel free to change some stuff.



来源:https://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!