Parse the inner HTML tags using Jsoup

Submitted by 馋奶兔 on 2020-01-17 11:35:15

Question


I want to find the important links on a site using the Jsoup library. Suppose we have the following markup:

<h1><a href="http://example.com">This is important </a></h1>

While parsing, how can we determine that the a tag is inside the h1 tag?


Answer 1:


You can do it this way:

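import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the file (throws IOException); the base URI is used to resolve relative links.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// Find every h1, then every a element nested anywhere inside it.
Elements headlinesCat1 = doc.getElementsByTag("h1");
for (Element headline : headlinesCat1) {
    Elements importantLinks = headline.getElementsByTag("a");
    for (Element link : importantLinks) {
        String linkHref = link.attr("href");
        String linkText = link.text();
        System.out.println(linkText + " -> " + linkHref);
    }
}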

Based on the jsoup Cookbook.
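
The question can also be approached from the link's side: start at each a element and walk up its ancestors to see whether one of them is an h1. The following is a minimal sketch, assuming jsoup is on the classpath. The class name HeadingLinkCheck, the helper isInsideH1, and the second link in the sample markup are illustrative and not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingLinkCheck {
    // Hypothetical helper: returns true if the element has an h1 among its ancestors.
    static boolean isInsideH1(Element element) {
        for (Element ancestor : element.parents()) {
            if (ancestor.tagName().equals("h1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample markup: the first link is from the question, the second is made up for contrast.
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<p><a href=\"http://example.com/other\">Other</a></p>";
        Document doc = Jsoup.parse(html);
        for (Element link : doc.getElementsByTag("a")) {
            System.out.println(link.attr("href") + " inside h1: " + isInsideH1(link));
        }
    }
}
```

This form is convenient when the links are already being iterated for another reason and only a yes/no answer per link is needed.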




Answer 2:


Use a CSS selector, which here matches a elements that are direct children of an h1:

Elements elements = doc.select("h1 > a");
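
The child combinator in "h1 > a" and the descendant combinator in "h1 a" give different results when a link is wrapped in another element inside the heading. The sketch below, assuming jsoup is on the classpath, compares the two. The class name SelectorDemo, the helper hrefs, and the nested span markup are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Hypothetical helper: collects the href of every element matched by the CSS query.
    static List<String> hrefs(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element link : doc.select(cssQuery)) {
            result.add(link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // The second heading wraps its link in a span, so the link is not a direct child of h1.
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<h1><span><a href=\"http://example.com/nested\">Nested</a></span></h1>";
        Document doc = Jsoup.parse(html);
        System.out.println(hrefs(doc, "h1 > a")); // direct children only
        System.out.println(hrefs(doc, "h1 a"));   // links at any depth inside an h1
    }
}
```

If headings on the target site may wrap links in extra elements, the descendant form "h1 a" is the safer choice.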


Source: https://stackoverflow.com/questions/30754778/parse-the-inner-html-tags-using-jsoup
