Can jsoup handle meta refresh redirects?

一生所求 2020-12-09 21:50

I have a problem using jsoup. I am trying to fetch a document from a URL that redirects to another URL through a meta refresh tag, but this is not working, to ex

2 answers
  •  庸人自扰
    2020-12-09 22:47

    Update (case insensitive and pretty fault tolerant)

    • The content attribute is parsed (almost) according to spec
    • The first meta refresh whose content parses successfully is used

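    import java.net.URI;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class MetaRefreshExample {

        public static void main(String[] args) throws Exception {

            URI uri = URI.create("http://www.amerisourcebergendrug.com");
            Document d = Jsoup.connect(uri.toString()).get();

            // Either "<delay>; url=<target>" or a bare "<delay>" (reload only)
            Pattern refreshContent = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+");

            for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
                Matcher m = refreshContent.matcher(refresh.attr("content"));

                // find the first one that is valid
                if (m.matches()) {
                    // group(1) is null for a bare delay, which names no target
                    if (m.group(1) != null)
                        d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
                    break;
                }
            }

            // Print the URI of the document that was finally loaded
            System.out.println(d.baseUri());
        }
    }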
    

    Outputs correctly:

    http://www.amerisourcebergendrug.com/abcdrug/
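
The pattern above captures everything after `url=`, so a quoted target such as `0; url='/next/'` would keep its quote characters. Below is a minimal sketch of a slightly more tolerant parser that needs only the JDK. The class name `MetaRefresh`, its methods `target` and `resolve`, and the `example.com` URLs are hypothetical names chosen for illustration; this is not part of jsoup, and it is a simplification of the parsing rules in the HTML spec rather than a full implementation.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: parses a meta refresh "content" value.
final class MetaRefresh {

    // <delay>, optionally followed by ";" or "," and "url=<target>".
    // Case-insensitive; DOTALL so stray line breaks do not stop the match.
    private static final Pattern CONTENT = Pattern.compile(
            "\\s*\\d+\\s*(?:[;,]\\s*url\\s*=\\s*(.+?))?\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the raw redirect target, or empty if the value is malformed
    // or only contains a delay (a plain reload of the same page).
    static Optional<String> target(String content) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches() || m.group(1) == null) {
            return Optional.empty();
        }
        String t = m.group(1);
        // Strip one pair of matching single or double quotes, if present.
        if (t.length() >= 2) {
            char first = t.charAt(0);
            if ((first == '\'' || first == '"') && t.charAt(t.length() - 1) == first) {
                t = t.substring(1, t.length() - 1);
            }
        }
        return t.isEmpty() ? Optional.empty() : Optional.of(t);
    }

    // Resolves the target against the URI the page was loaded from.
    static Optional<URI> resolve(URI base, String content) {
        return target(content).map(t -> base.resolve(t));
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/dir/page.html"); // placeholder URL
        System.out.println(resolve(base, "0; URL='/next/'").orElse(null));
        System.out.println(resolve(base, "5").orElse(null));
    }
}
```

In the loop above, the value of `refresh.attr("content")` could be passed to `MetaRefresh.resolve` together with `URI.create(d.baseUri())`, and the returned URI, if present, handed to `Jsoup.connect`.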
    

    Old answer:

    Are you sure that it isn't working? For me:

    System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());
    

    ... outputs http://www.ibm.com/us/en/ correctly.
