Jsoup selector on RSS <link> tag returns empty string with .text() method

孤街浪徒 2020-12-21 06:20

I'm using jsoup to parse an RSS feed in Java. When I select the first <link> element in the document and call .text() on it, I get an empty string instead of the URL.

2 answers
  • 2020-12-21 06:37

    Jsoup has an XML parser for this. Pass Parser.xmlParser() to Jsoup.parse:

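    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.parser.Parser;

    try {
        // Sample feed with <title> and <link> nested inside <rss><channel>
        String xml = "<rss><channel><link>http://www.the.blog/category</link><title>The Blog Title</title></channel></rss>";
        // The third argument switches jsoup from its default HTML parser to the XML parser
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
    
        Element title = doc.select("title").first();
        System.out.println(title.text()); // prints The Blog Title
    
        Element link = doc.select("link").first();
        System.out.println(link.text()); // prints http://www.the.blog/category
    } catch (Exception e) {
        e.printStackTrace();
    }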
    
  • 2020-12-21 06:57

    Your RSS feed is XML, not HTML. For this to work, you must tell Jsoup to use its XML parser. This will work:

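    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.parser.Parser;

    String rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      +"<rss><channel>"
      +  "<title>The Blog Title</title>"
      +  "<link>http://www.the.blog/category</link>"
      +"</channel></rss>";
    
    // Parse as XML so that <link> is treated as an ordinary element
    Document doc = Jsoup.parse(rss, "", Parser.xmlParser());
    
    Element link = doc.select("rss channel link").first();
    System.out.println(link.text()); // prints http://www.the.blog/category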
    

    Explanation:

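    In HTML, <link> is a void element: it carries its data in attributes and has no content or closing tag. Jsoup's default HTML parser treats the <link> in your RSS the same way, so the URL ends up outside the element as a separate text node and .text() returns an empty string. With the XML parser, <link> is an ordinary element and its text is the URL.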
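
    As a cross-check that the URL really is the text content of <link> once the feed is read as XML, the same string can be run through the JDK's built-in DOM parser instead of Jsoup. This is only a sketch of a swapped-in technique, not part of the Jsoup fix; the class name RssLinkCheck and the helper firstLinkText are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class: uses the JDK DOM parser, not Jsoup.
class RssLinkCheck {

    // Returns the text content of the first <link> element,
    // or null if the document contains no <link> at all.
    public static String firstLinkText(String rss) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rss)));
            Node link = doc.getElementsByTagName("link").item(0);
            return link == null ? null : link.getTextContent();
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here
            throw new IllegalStateException("Could not parse feed as XML", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss><channel>"
                + "<title>The Blog Title</title>"
                + "<link>http://www.the.blog/category</link>"
                + "</channel></rss>";
        System.out.println(firstLinkText(rss)); // prints http://www.the.blog/category
    }
}
```

    A strict XML parser has no notion of HTML's void elements, so it returns the URL directly. If this check prints the URL but Jsoup prints an empty string, the parser mode passed to Jsoup.parse is the likely culprit.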
