Android gets HTTPS page truncated

Posted by て烟熏妆下的殇ゞ on 2019-12-24 05:51:53

Question


I am fetching a web page on Android using HTTPS (ignoring the certificate as it is both self-signed and outdated, as seen here - don't ask, it's not my server :)).

I've defined my own HTTP client:

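public class MyHttpClient extends DefaultHttpClient {

    // Assumed value in milliseconds; the original post does not show this constant
    private static final int REGISTRATION_TIMEOUT = 30 * 1000;
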
    public MyHttpClient() {
        super();
        final HttpParams params = getParams();
        HttpConnectionParams.setConnectionTimeout(params,
                REGISTRATION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(params, REGISTRATION_TIMEOUT);
        ConnManagerParams.setTimeout(params, REGISTRATION_TIMEOUT);
    }

    @Override
    protected ClientConnectionManager createClientConnectionManager() {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory
                .getSocketFactory(), 80));
        registry.register(new Scheme("https", new UnsecureSSLSocketFactory(), 443));
        return new SingleClientConnManager(getParams(), registry);
    }
}

where the UnsecureSSLSocketFactory mentioned is based on the suggestion given on the aforementioned topic.

I'm then using this class to fetch a page:

public class HTTPHelper {

    private final static String TAG = "HTTPHelper";
    private final static String CHARSET = "ISO-8859-1";

    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)";
    public static final String ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    public static final String ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";


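    /**
     * Sends an HTTP GET request and returns the response body.
     * @param url the address of the page to fetch
     * @param post currently unused, since only a GET request is issued
     * @return the page source, or null if the response has no body
     */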
    public String sendRequest(String url, String post) throws ConnectionException {

        MyHttpClient httpclient = new MyHttpClient();

        HttpGet httpget = new HttpGet(url);
        httpget.addHeader("User-Agent", USER_AGENT);
        httpget.addHeader("Accept", ACCEPT);
        httpget.addHeader("Accept-Charset", ACCEPT_CHARSET);

        String pageSource = null;
        try {
            HttpResponse response = httpclient.execute(httpget);
            HttpEntity entity = response.getEntity();
            // getEntity() returns null when the response has no body
            if (entity != null) {
                pageSource = convertStreamToString(entity.getContent());
                entity.consumeContent();
            }
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        } finally {
            // Always release the connection, even when the request fails
            httpclient.getConnectionManager().shutdown();
        }
        return pageSource;

    }

    /**
     * Converts a stream to a string
     * @param is
     * @return
     */
    private static String convertStreamToString(InputStream is) 
    {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, CHARSET));
            StringBuilder stringBuilder = new StringBuilder();
            String line = null;
            try {
                while ((line = reader.readLine()) != null) {
                    stringBuilder.append(line + "\n");
                }
            } catch (IOException e) {
                Log.d(TAG, "Exception in convertStreamToString", e);
            } finally {
                try {
                    is.close();
                } catch (IOException e) {}
            }
            return stringBuilder.toString();
        } catch (Exception e) {
            throw new Error("Unsupported charset");
        }
    }

}

The page I get is truncated after about a hundred lines. It's truncated at a precise point, where a '_' (underscore) char is followed by an 'r' char. It's not the first underscore in the page.

I thought it might be an encoding issue, so I tried both UTF-8 and ISO-8859-1, but it's still truncated. If I open the page with Firefox, it reports the encoding as ISO-8859-1.

In case you are wondering, the webpage is https://ricarichiamoci.dsu.pisa.it/ and it gets truncated at line 169,

function ChangeOffset(NewOffset) {
  document.mainForm.last

where it should instead be

function ChangeOffset(NewOffset) {
  document.mainForm.last_record.value = NewOffset;

Does anyone have an idea of why the page is truncated?


Answer 1:


I figured out that the downloaded page is not truncated; the function I was using to print it (Log.d) is what truncates the string.

So the method that downloads the page source works fine, but Log.d() is probably not meant to print that much text.
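
One way to see the whole page in logcat anyway is to split the string into smaller pieces and log each piece separately. The sketch below is only an illustration: LogChunker and split are hypothetical names that do not appear in the original post, and the chunk size is an assumption (logcat entries are limited to roughly 4 KB, and the exact limit may vary between Android versions). The helper is plain Java so it can be tried outside Android.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the original post. */
final class LogChunker {

    private LogChunker() {}

    /** Splits text into consecutive pieces of at most maxChars characters. */
    static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            int end = Math.min(text.length(), start + maxChars);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }
}
```

On Android this could be used as for (String chunk : LogChunker.split(pageSource, 3000)) Log.d(TAG, chunk); where 3000 is an assumed size chosen to stay under the limit. Logging pageSource.length() is also a quick way to confirm that the download itself is complete.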



Source: https://stackoverflow.com/questions/4259884/android-gets-https-page-truncated
