Reading and terminating stream in HttpClient 4

Submitted by 你。 on 2019-12-08 09:06:59

Question


I'm reading large documents of which I only need the top 5%. Can I do the following with HttpClient 4?

  1. Request the page (GET or POST)
  2. Read the response as a stream
  3. Feed it into a SAX-based HTML parser "on the fly"
  4. When a certain HTML tag is detected, terminate the stream

Please note that HttpClient 4 is required; I cannot use version 3.


Answer 1:


Thanks to Ken from the HttpClient mailing list, here's the answer:

Use the HttpEntity#getContent() method, which returns a
java.io.InputStream, and pass that stream to your SAX-based HTML parser.

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e122

When you see the tag you need, terminate the request by invoking the HttpUriRequest#abort() method.

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e285
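
The parser side of this can be sketched with the JDK alone. The example below is only an illustration under some assumptions: it uses the JDK's built-in SAX parser, which accepts well-formed XML/XHTML only (real-world HTML would need a tolerant SAX parser instead), and it reads from an in-memory stream standing in for the one returned by HttpEntity#getContent(). The names EarlyStopSaxDemo, StopParsing, elementsBefore, and onStop are hypothetical. The idea is that the handler throws a marker exception when the target tag appears, and the caller reacts to it in onStop, which is where a call to HttpUriRequest#abort() would go.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EarlyStopSaxDemo {

    /** Hypothetical marker exception, used only to unwind out of the parser. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop tag reached");
        }
    }

    /**
     * Parses the stream and collects element names until stopTag starts.
     * When stopTag is seen, parsing ends and onStop runs; with HttpClient 4
     * that callback is where the request would be aborted (assumption).
     */
    static List<String> elementsBefore(InputStream in, String stopTag, Runnable onStop)
            throws IOException, SAXException, ParserConfigurationException {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (qName.equals(stopTag)) {
                    throw new StopParsing(); // leave the parser immediately
                }
                seen.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing stop) {
            onStop.run(); // the rest of the stream is never read
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the stream obtained from HttpEntity#getContent().
        String doc = "<html><head><title>t</title></head><body><p>unneeded</p></body></html>";
        InputStream in = new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8));
        List<String> seen = elementsBefore(in, "body",
                () -> System.out.println("stop tag seen; abort the request here"));
        System.out.println(seen);
    }
}
```

Throwing from the handler is a common way to stop a SAX parse, since the SAX API has no explicit "stop" call. Aborting the request rather than draining the stream is what avoids downloading the remaining 95% of the document.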



Source: https://stackoverflow.com/questions/1289629/reading-and-terminating-stream-in-httpclient-4
