Question
I'm facing what is, for me, a very big problem. I'm parsing this page, http://multiplayer.it/articoli/, which lists some articles. As you can see, there is some information I can parse: title, date of the article, comments, and a short preview of the article.
THE GOAL:
My goal is to click on an article I have parsed (this part already works; I have the list with the information I described above) and, in onClick, open the article itself to see its content. Example: if I click on the first article right now, it takes me to this URL: http://multiplayer.it/notizie/127771-peter-moore-getta-acqua-sul-fuoco-e-descrive-nintendo-come-un-grande-partner-per-ea.html with all the content I need to view. The application has to do the same.
THE PROBLEM: I don't know how to do it. However, by parsing the URL of each post I can get the absolute path of the post. I can parse it this way:
try {
    Document doc = Jsoup.connect(BLOG_URL).get();
    Elements links = doc.select("div.col-1-1 h2 a[href]");
    for (Element sezione : links) {
        Log.d("Links", sezione.attr("abs:href"));
    }
} catch (Exception e) {
    Log.e("ERROR", "Parsing Error");
}
This returns each href.
QUESTION
Knowing the href, is it possible to parse each page's content (the 'p' tags)? Thanks.
The onClick method:
lista.setOnItemClickListener(new OnItemClickListener() {
    @Override
    public void onItemClick(AdapterView<?> parent, View view,
            int position, long id) {
        // What here?
    }
});
Answer 1:
jsoup does not execute JavaScript or simulate clicks, so it won't handle dynamic actions on a web page. You would need an API that can handle this kind of dynamic execution, HtmlUnit being one example.
Let's say you have all the links stored in a Java Collection instance such as an ArrayList. The method below handles the first URL; it can be looped over to get the contents of every URL on your page at runtime:
Using HtmlUnit
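import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.net.URL;
import java.util.List;

public static void main(String... args)
        throws FailingHttpStatusCodeException, IOException {
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
    WebRequest request = new WebRequest(
            new URL("http://multiplayer.it/articoli/"));
    // Keep going on script errors and give JavaScript up to 10 seconds
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.setJavaScriptTimeout(10000);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setTimeout(10000);
    HtmlPage page = webClient.getPage(request);
    webClient.waitForBackgroundJavaScript(10000);
    System.out.println("Current page: Articoli videogiochi - Multiplayer.it");
    // Current page:
    // Title=Articoli videogiochi - Multiplayer.it
    // URL=http://multiplayer.it/articoli/
    // Find the anchor whose text matches the article we want to open
    List<HtmlAnchor> anchors1 = page.getAnchors();
    HtmlAnchor link2 = null;
    for(HtmlAnchor anchor: anchors1)
    {
        if(anchor.asText().indexOf("Dead Rising 3: Operation Broken Eagle") > -1 )
        {
            link2 = anchor;
            break;
        }
    }
    // Without this check, click() would throw a NullPointerException
    if (link2 == null) {
        System.out.println("No matching anchor found");
        return;
    }
    page = link2.click();
    System.out.println("Current page: Dead Rising 3: Operation Broken Eagle - Recensione - Xbox On...");
    // Current page:
    // Title=Dead Rising 3: Operation Broken Eagle - Recensione - Xbox On...
    // URL=http://multiplayer.it/recensioni/127745-dead-rising-3-operation-broken-eagle-una-delle-storie-di-los-perdidos.html
    webClient.waitForBackgroundJavaScript(10000);
    // Print the text of every <p> element on the article page
    DomNodeList<DomElement> paras = page.getElementsByTagName("p");
    for (DomElement el : paras) {
        System.out.println(el.asText());
    }
}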
The code above prints the text of every <p> element on the landing page. [Screenshot of the output]
It loops over all the anchor tags on the web page, and I pick one specific anchor to get the resulting content:
List<HtmlAnchor> anchors1 = page.getAnchors();
HtmlAnchor link2 = null;
for(HtmlAnchor anchor: anchors1)
{
    if(anchor.asText().indexOf("Dead Rising 3: Operation Broken Eagle") > -1 )
    {
        link2 = anchor;
        break;
    }
}
You might want to write appropriate logic to parse all the dynamic links on your page and display their contents.
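As a rough illustration of that logic, here is a minimal sketch, using only the Java standard library, of looping over a collection of article URLs and collecting the paragraphs of each one. The names ArticleLoop, readAll and fetcher are hypothetical and not part of HtmlUnit or jsoup. The fetcher stands in for whatever actually loads a page and returns its <p> texts (for instance the HtmlUnit steps shown earlier), and the URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class; not part of HtmlUnit or jsoup.
public class ArticleLoop {

    // Calls the fetcher once per URL and keeps the results in input order.
    // A URL whose fetch throws a RuntimeException is mapped to an empty
    // list, so one bad page does not abort the whole loop.
    public static Map<String, List<String>> readAll(
            List<String> urls, Function<String, List<String>> fetcher) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String url : urls) {
            try {
                result.put(url, fetcher.apply(url));
            } catch (RuntimeException e) {
                result.put(url, new ArrayList<>());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        urls.add("http://example.com/a.html"); // placeholder URLs
        urls.add("http://example.com/b.html");

        // Stub fetcher (assumption): a real one would load the page and
        // return the text of its <p> elements.
        Function<String, List<String>> stub = url -> {
            List<String> paragraphs = new ArrayList<>();
            paragraphs.add("paragraph from " + url);
            return paragraphs;
        };

        for (Map.Entry<String, List<String>> e : readAll(urls, stub).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Passing the fetcher in as a parameter keeps the loop independent of the scraping library, so a stub like the one in main can be used while testing.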
EDIT:
You can also try generating these dynamic scripts with the htmlunitscripter Firefox plugin and then customize them to your needs.
Source: https://stackoverflow.com/questions/21330020/jsoup-parsing-page-knowing-url