Question
I'm reading large documents of which I need only the first 5%. Can I do the following with HttpClient 4?
- Request the page (GET or POST)
- Read the response as a stream
- Feed it into a SAX-based HTML parser "on the fly"
- When a certain HTML tag is detected, terminate the stream
Please note that HttpClient 4 is required; I cannot use version 3.
Answer 1:
Thanks to Ken from the HttpClient mailing list, here's the answer:
Use the HttpEntity#getContent() method, which returns a java.io.InputStream, and pass that to your SAX-based HTML parser.
http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e122
When you see the tag you need, terminate the request by invoking the HttpUriRequest#abort() method.
http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e285
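For illustration, here is a minimal sketch of the parsing side, using only the JDK so it runs without extra dependencies. It is not from the mailing-list answer. The class name EarlyStopParser, the StopParsing exception, and the parseUntil helper are all hypothetical. The JDK's XML SAX parser stands in for a real SAX-based HTML parser, so the input must be well-formed up to the target tag; an HTML parser that accepts the same handler type should be usable in its place. With HttpClient 4, the stream would presumably come from response.getEntity().getContent() and the abort callback would be request::abort.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopParser {

    /** Hypothetical marker exception: thrown by the handler to unwind the parser. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("target tag reached");
        }
    }

    /**
     * Parses the stream until an element named stopTag starts, then runs the
     * abort callback. Returns true if the tag was seen, false if the document
     * ended without it.
     *
     * Assumption: with HttpClient 4, "in" would be entity.getContent() and
     * "abort" would be request::abort on the HttpUriRequest.
     */
    public static boolean parseUntil(InputStream in, String stopTag, Runnable abort)
            throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                // Throwing from a SAX callback is the usual way to stop a parse early.
                if (stopTag.equalsIgnoreCase(qName)) {
                    throw new StopParsing();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return false; // whole document read, tag never appeared
        } catch (StopParsing stop) {
            abort.run();  // drop the connection instead of reading the rest
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a response body; everything after <body> is never parsed.
        String page = "<html><head><title>t</title></head><body><p>rest of a long page";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        boolean found = parseUntil(in, "body", () -> System.out.println("abort() called"));
        System.out.println("found=" + found);
    }
}
```

Aborting the request rather than closing the stream matters here: closing would normally try to read the remaining 95% so the connection can be reused, whereas abort simply discards the connection.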
Source: https://stackoverflow.com/questions/1289629/reading-and-terminating-stream-in-httpclient-4