Using Java to extract a single value from an HTML page

前提是你 submitted on 2019-12-13 04:16:21

Question


I am continuing work on a project that I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that pulls in some data from an unknown source. The data is in the iframe in a tag something like this:

<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>

There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.


Answer 1:


jsoup is probably what you want; it excels at extracting data from an HTML document.

There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax

The process will be in two steps:

  • parse the page and find the URL of the iframe
  • parse the content of the iframe and extract the information you need

The code would look like this:

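 import java.net.URL;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;
 import org.jsoup.select.Elements;

 // let's find the iframe
 // (inputstream is the outer page's content, url is its address, used to resolve relative links)
 Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
 Elements elements = document.select("iframe");
 Element iframe = elements.first(); // null if the page has no iframe

 // now load the iframe (absUrl resolves a relative src against the base URL; 15000 ms timeout)
 URL iframeUrl = new URL(iframe.absUrl("src"));
 document = Jsoup.parse(iframeUrl, 15000);

 // extract the div
 Element div = document.getElementById("number_forecast");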
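
To go from the div to the value itself, something like the following might work. It is only a sketch, run against the snippet from the question rather than the real iframe: the class name ForecastExtractor and the method extractForecast are made up for illustration, and it assumes a jsoup version that provides selectFirst.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ForecastExtractor {

    // Returns the text of the label inside the forecast div, or null if it is absent.
    static String extractForecast(String html) {
        Document document = Jsoup.parse(html);
        // CSS selector: the element with id "lblDay" nested inside the element with id "number_forecast"
        Element label = document.selectFirst("#number_forecast #lblDay");
        return label == null ? null : label.text();
    }

    public static void main(String[] args) {
        // The markup from the question, used here as a stand-in for the iframe content
        String html = "<DIV id=\"number_forecast\"><LABEL id=\"lblDay\">9,000</LABEL></DIV>";
        String text = extractForecast(html);
        System.out.println(text);

        // Strip the thousands separator before converting the text to a number
        int value = Integer.parseInt(text.replace(",", ""));
        System.out.println(value);
    }
}
```

With the real page, the same selector can be applied to the Document loaded from the iframe URL instead of a string literal.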



Answer 2:


In the page that contains the iframe, change the iframe's source to your own URL. That URL will be handled by your own controller, which will read the content, parse it, extract everything you need, and write it to the response. If the references in your iframe are absolute, this should work.
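
A rough sketch of such a controller, using only the HTTP server and client that ship with the JDK (Java 11 or later), could look like this. Everything named here is an assumption made for illustration: the class IframeProxy, the /forecast path, and the extract function that stands in for whatever parsing you do (jsoup, for instance).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class IframeProxy {

    // Fetches the body of the given URL as a string.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Starts a server whose (hypothetical) /forecast path fetches the original
    // iframe source, applies `extract` to its HTML, and writes the result back.
    static HttpServer start(int port, URI upstream, UnaryOperator<String> extract)
            throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/forecast", exchange -> {
            int status = 200;
            byte[] body;
            try {
                body = extract.apply(fetch(upstream)).getBytes(StandardCharsets.UTF_8);
            } catch (Exception e) {
                // Report a gateway error if the original source cannot be read or parsed
                status = 502;
                body = "upstream fetch failed".getBytes(StandardCharsets.UTF_8);
            }
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The iframe in your page would then point at the /forecast path on your own server instead of the original source. Passing the extraction step in as a function keeps the fetching and the parsing separate, so the parsing can be swapped or tested on its own.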



Source: https://stackoverflow.com/questions/10817882/using-java-to-extract-a-single-value-from-an-html-page
