Convert xPath to JSoup query

南旧 2020-12-09 10:29

Does anyone know of an XPath to Jsoup converter? I get the following XPath from Chrome:

 //*[@id="docs"]/div[1]/h4/a

and would like to c

6 answers
  • 2020-12-09 10:48

    You don't necessarily need to convert XPath to Jsoup-specific selectors.

    Instead, you can use Xsoup, which is built on Jsoup and supports XPath.

    https://github.com/code4craft/xsoup

    Here is an example from the Xsoup docs:

    @Test
    public void testSelect() {
    
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";
    
        Document document = Jsoup.parse(html);
    
        String result = Xsoup.compile("//a/@href").evaluate(document).get();
        Assert.assertEquals("https://github.com", result);
    
        List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        Assert.assertEquals("a", list.get(0));
        Assert.assertEquals("b", list.get(1));
    }
    
  • 2020-12-09 10:58

    I tested the following XPath expressions and their Jsoup equivalents, and they work.

    example 1:

    [XPath]

    //*[@id="docs"]/div[1]/h4/a
    

    [JSoup]

    document.select("#docs > div > h4 > a").attr("href");
    

    example 2:

    [XPath]

    //*[@id="action-bar-container"]/div/div[2]/a[2]
    

    [JSoup]

    document.select("#action-bar-container > div > div:eq(1) > a:eq(1)").attr("href"); 
    
  • 2020-12-09 11:00

    I am using Google Chrome version 47.0.2526.73 m (64-bit), and I can now directly copy the selector path, which is compatible with Jsoup.

    The copied selector for the span.com element in the screenshot is:
    #question > table > tbody > tr:nth-child(1) > td.postcell > div > div.post-text > pre > code > span.com

  • 2020-12-09 11:07

    It depends on what you want.

    Document doc = Jsoup.connect(googleURL).get(); // fetch and parse the page at googleURL
    doc.select("cite");      // all <cite> elements in the page

    doc.select("li > cite"); // only <cite> elements that are direct children of an <li>

    doc.select("li.g cite"); // only <cite> elements anywhere under <li class="g">
    
    
    public static void main(String[] args) throws IOException {
        String html = getHTML();
        Document doc = Jsoup.parse(html);
        Elements elems = doc.select("li.g > cite");
        for(Element elem: elems){
            System.out.println(elem.toString());
        }
    }
    
  • 2020-12-09 11:10

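    This is easy to convert manually.

    Something like this (not tested). Note that XPath positions are 1-based and count only siblings with the same tag, so div[1] maps to div:nth-of-type(1); Jsoup's :eq(n) is 0-based and counts all element siblings.

    document.select("#docs > div:nth-of-type(1) > h4 > a").attr("href");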
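
    For simple Chrome-style XPaths, the manual conversion can be mechanized. The following is a minimal sketch, not part of Jsoup: the class name XPathToCssSketch and the method toCss are hypothetical, and it only handles paths that start with //, use / between steps, and carry at most one [n] or [@attr="value"] predicate per step. Attribute values containing / and anything else (axes, text(), functions) are rejected rather than guessed at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Jsoup API): converts simple XPaths to CSS selectors. */
final class XPathToCssSketch {

    // One step: a tag name or *, optionally followed by [n] or [@attr="value"].
    private static final Pattern STEP = Pattern.compile(
            "([A-Za-z][\\w-]*|\\*)(?:\\[(\\d+)\\]|\\[@([\\w-]+)=[\"']([^\"']*)[\"']\\])?");

    static String toCss(String xpath) {
        if (!xpath.startsWith("//")) {
            throw new IllegalArgumentException("Only XPaths starting with // are handled: " + xpath);
        }
        List<String> parts = new ArrayList<>();
        for (String step : xpath.substring(2).split("/")) {
            Matcher m = STEP.matcher(step);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported step: " + step);
            }
            String tag = m.group(1);
            if (m.group(2) != null) {
                // tag[n] counts same-tag siblings (1-based); *[n] counts all element siblings.
                String pseudo = tag.equals("*") ? ":nth-child(" : ":nth-of-type(";
                parts.add(tag + pseudo + m.group(2) + ")");
            } else if (m.group(3) != null) {
                String prefix = tag.equals("*") ? "" : tag;
                String attr = m.group(3);
                String value = m.group(4);
                if (attr.equals("id") && value.matches("[A-Za-z][\\w-]*")) {
                    parts.add(prefix + "#" + value);           // @id="docs" becomes #docs
                } else {
                    parts.add(prefix + "[" + attr + "=\"" + value + "\"]");
                }
            } else {
                parts.add(tag);
            }
        }
        // XPath "/" is the child axis, which corresponds to the CSS ">" combinator.
        return String.join(" > ", parts);
    }

    public static void main(String[] args) {
        // prints: #docs > div:nth-of-type(1) > h4 > a
        System.out.println(toCss("//*[@id=\"docs\"]/div[1]/h4/a"));
    }
}
```

    The resulting string can be passed to document.select(...) as in the line above. The sketch emits :nth-of-type rather than :eq because XPath's div[1] counts only div siblings, which is what :nth-of-type does.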
    

    Documentation:

    http://jsoup.org/cookbook/extracting-data/selector-syntax


    Related question from comment

    Trying to get the href for the first result here: cbssports.com/info/search#q=fantasy%20tom%20brady

    Code

    Elements select = Jsoup.connect("http://solr.cbssports.com/solr/select/?q=fantasy%20tom%20brady")
            .get()
            .select("response > result > doc > str[name=url]");
    
    for (Element element : select) {
        System.out.println(element.html());
    }
    

    Result

    http://fantasynews.cbssports.com/fantasyfootball/players/playerpage/187741/tom-brady
    http://www.cbssports.com/nfl/players/playerpage/187741/tom-brady
    http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1825265/brady-lisoski
    http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1766777/blake-brady
    http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1851211/brady-foltz
    http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1860955/brady-earnhardt
    http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1673397/brady-amack
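
    For comparison, the selector response > result > doc > str[name=url] corresponds to the XPath /response/result/doc/str[@name='url']. Since the Solr response is well-formed XML, that XPath can also be evaluated with the JDK's own javax.xml.xpath package, without Jsoup. This is only a sketch: the class name SolrXPathSketch and the inline XML are illustrative assumptions (the title field is made up), not the real response from the server.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example class: evaluates an XPath over a Solr-like XML string. */
final class SolrXPathSketch {

    static List<String> urls(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // XPath equivalent of the selector "response > result > doc > str[name=url]".
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/response/result/doc/str[@name='url']",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data shaped like the response; only the URL comes from the result list.
        String xml = "<response><result><doc>"
                + "<str name=\"url\">http://www.cbssports.com/nfl/players/playerpage/187741/tom-brady</str>"
                + "<str name=\"title\">hypothetical title</str>"
                + "</doc></result></response>";
        System.out.println(urls(xml));
    }
}
```

    This only works for well-formed XML; for ordinary HTML pages, Jsoup selectors or Xsoup remain the more forgiving option.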
    

    Screenshot from Developer Console - grabbing urls


  • 2020-12-09 11:11

    Here is a working standalone snippet using Xsoup with Jsoup:

    import java.util.List;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    
    import us.codecraft.xsoup.Xsoup;
    
    public class TestXsoup {
        public static void main(String[] args) {

            String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                    "<table><tr><td>a</td><td>b</td></tr></table></html>";

            Document document = Jsoup.parse(html);

            List<String> filasFiltradas = Xsoup.compile("//tr/td/text()").evaluate(document).list();
            System.out.println(filasFiltradas);

        }
    }
    

    Output:

    [a, b]
    

    Libraries used:

    xsoup-0.3.1.jar, jsoup-1.10.3.jar
