How to resolve relative url with Jsoup?

て烟熏妆下的殇ゞ submitted on 2020-01-10 03:46:10

Question


Hi, I have a problem with Jsoup.

I scrape a page and get a lot of URLs. Some of them are relative URLs such as "../index.php", "../admin", and "../details.php".

I use attr("abs:href") to get the absolute URL, but these links come out like www.domain.com/../admin.php

Is this a bug?

Is there a way to get the real absolute path with jsoup? How can I solve this?

I have also tried absUrl("href"), but it does not work either.


Answer 1:


Another good option is to use the abs:href or abs:src attributes:

String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"

This is also described here: http://jsoup.org/cookbook/extracting-data/working-with-urls




Answer 2:


If an element contains a relative link, you can get the absolute link like this: element.absUrl("href").

But you have to set the base URI for your relative links first (for example, call setBaseUri("http://www.myexample.com") on your Document or Element).

Make sure your base URI is long enough!

Good:

element.setBaseUri("http://www.example.com/abc/");
element.attr("href", "../b/here");

element.absUrl("href") returns: http://www.example.com/b/here

Bad:

element.setBaseUri("http://www.example.com/abc/");
element.attr("href", "../../b/here");

element.absUrl("href") returns: http://www.example.com/../b/here

--> Your relative link has more "../" segments than your base URI has path levels!
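
If you cannot control how many "../" segments the scraped links contain, one possible workaround is to clean up the result yourself. The following is only a minimal sketch and is not part of jsoup: it uses java.net.URI from the Java standard library instead of absUrl, and the class name RelativeUrlSketch and the helper resolveClamped are hypothetical names made up for this example. It resolves the link against the base URI and then drops any leading ".." segments that would climb above the site root, which is roughly what browsers do.

```java
import java.net.URI;

final class RelativeUrlSketch {

    // Hypothetical helper (not a jsoup API): resolves a relative link against a
    // base URI and drops leading ".." segments that climb above the root.
    static String resolveClamped(String baseUri, String relativeLink) {
        // java.net.URI resolves the link and collapses "x/.." pairs where it can.
        URI resolved = URI.create(baseUri).resolve(relativeLink).normalize();

        String path = resolved.getRawPath();
        if (path == null) {
            // Opaque URI such as mailto:, there is no path to fix.
            return resolved.toString();
        }
        // A leftover "/../" prefix means the link had more ".." than the base had levels.
        while (path.startsWith("/../")) {
            path = path.substring(3);
        }
        if (path.equals("/..")) {
            path = "/";
        }

        // Rebuild the URL from its raw parts so existing escapes are left untouched.
        StringBuilder out = new StringBuilder();
        if (resolved.getScheme() != null) {
            out.append(resolved.getScheme()).append(':');
        }
        if (resolved.getRawAuthority() != null) {
            out.append("//").append(resolved.getRawAuthority());
        }
        out.append(path);
        if (resolved.getRawQuery() != null) {
            out.append('?').append(resolved.getRawQuery());
        }
        if (resolved.getRawFragment() != null) {
            out.append('#').append(resolved.getRawFragment());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/abc/";
        System.out.println(resolveClamped(base, "../b/here"));    // http://www.example.com/b/here
        System.out.println(resolveClamped(base, "../../b/here")); // http://www.example.com/b/here
    }
}
```

With jsoup you would presumably pass element.attr("href") as the relative link and element.baseUri() as the base. Note that URI.create throws an IllegalArgumentException for malformed input, so real scraped links may need a try/catch around the call.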



Source: https://stackoverflow.com/questions/12041676/how-to-resolve-relative-url-with-jsoup
