Google App Engine (Java): URL Fetch response too large problems

Submitted by 青春壹個敷衍的年華 on 2019-12-06 10:44:39

Question


I'm trying to build a web service on Google App Engine.

The problem is that I need to get data from a website (HTML scraping).

The request looks like this:

URL url = new URL(p_url);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
InputStreamReader in = new InputStreamReader(con.getInputStream());
BufferedReader reader = new BufferedReader(in);

String result = "";
String line = "";
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
    result += line + "\n"; // collect the page so the returned value is not empty
}
return result;

App Engine throws the following exception at the third line:

com.google.appengine.api.urlfetch.ResponseTooLargeException

This is because the maximum response size is 1 MB and the total HTML of the page is about 1.5 MB.

My question: I only need the first 20 lines of the HTML to scrape. Is there a way to fetch only part of the HTML so that the ResponseTooLargeException is not thrown?

Thanks in advance!


Answer 1:


I solved the problem by using the low-level URL Fetch API and enabling the allowTruncate fetch option, which truncates an oversized response instead of throwing an exception.

http://code.google.com/intl/nl-NL/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/FetchOptions.html

Basically, it works like this:

HTTPRequest request = new HTTPRequest(_url, HTTPMethod.POST, FetchOptions.Builder.allowTruncate());
URLFetchService service = URLFetchServiceFactory.getURLFetchService();
HTTPResponse response = service.fetch(request);
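
Since the question only needs the first 20 lines, the truncated response body still has to be cut down to those lines. The following is a minimal sketch of that step in plain Java. The class name FirstLines and the method firstLines are hypothetical helpers, not part of the App Engine API, and the sketch assumes the page is encoded as UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an App Engine class): extracts the leading
// lines from a raw response body.
final class FirstLines {
    private FirstLines() {}

    // Returns at most maxLines lines from content, decoded as UTF-8
    // (the encoding is an assumption; use the page's real charset if known).
    static List<String> firstLines(byte[] content, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8))) {
            String line;
            // Stop as soon as enough lines are collected or the body ends.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Reading from an in-memory array should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With the response from the snippet above, this could be called as FirstLines.firstLines(response.getContent(), 20). Truncation may cut the last line of the body in the middle, but that should not matter here as long as the first 20 lines fit within the size limit.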


Source: https://stackoverflow.com/questions/3996170/google-app-engine-java-url-fetch-response-too-large-problems
