Jsoup getting background image path from css

僤鯓⒐⒋嵵緔 posted on 2019-12-31 00:44:12

Question


I am looking for all of the images on a given website.

To do this, I need to find the ones that are referenced in the CSS, for example:

    .gk-crop {
        background-image: url("../images/style1/g_rss-2.png");
    }

How can I get all of these URLs with Jsoup?

So far I've tried the following:

    Document doc = Jsoup.connect(url).get();
    Elements imagePath = doc.select("[src]");
    imagePath.select("*[style*='background-image']");

But so far no luck.

Does anyone know how I can achieve this?


Answer 1:


Jsoup doesn't parse CSS files.

Have a look at this to know what Jsoup is responsible for.

You need a separate CSS parser to extract URLs from CSS files. Have a look at this




Answer 2:


Just like Niranjan mentioned, Jsoup is for parsing HTML (and XML), not CSS. If you really need to extract images from CSS, you will need to use a third-party library for that purpose, or write a simple regex that grabs the URLs from the CSS file. It's still plain text, isn't it? This is not a flexible solution to your problem, but it would be the fastest one :)
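The regex approach suggested above could look roughly like the sketch below. It assumes the CSS text has already been loaded into a String (downloading the stylesheet is left out), and the class and method names are made up for illustration. The pattern is deliberately rough: it does not handle escaped characters and it will also match url(...) inside CSS comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup.
class CssUrlExtractor {

    // Matches url(...) with optional single or double quotes around the path.
    // Group 1 is the opening quote (possibly empty), group 2 is the path itself.
    private static final Pattern URL_PATTERN =
            Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns every path found in a url(...) token, in order of appearance.
    static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(css);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = ".gk-crop {\n"
                + "    background-image: url(\"../images/style1/g_rss-2.png\");\n"
                + "}\n";
        // Prints: ../images/style1/g_rss-2.png
        for (String url : extractUrls(css)) {
            System.out.println(url);
        }
    }
}
```

The paths come back exactly as written in the stylesheet, so relative ones such as ../images/style1/g_rss-2.png still have to be resolved before they can be downloaded.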




Answer 3:


If you want the URLs of all the images on a website, you can select all the image tags and then get their absolute URLs.

Example:

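// Fetch the page (Jsoup.connect(...).get() throws IOException).
String url = "http://www.bbc.co.uk";
Document doc = Jsoup.connect(url).get();

// Select every <img> element in the document.
Elements images = doc.select("img");

for (Element img : images) {
    // absUrl resolves the src attribute against the page's base URI.
    System.out.println(img.absUrl("src"));
}

which will grab all the <img> elements and print their absolute URLs, such as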

http://sa.bbc.co.uk/bbc/bbc/s?name=SET-COUNTER&pal_route=index&ml_name=barlesque&app_type=web&language=en-GB&ml_version=0.16.1&pal_webapp=wwhp&blq_s=3.5&blq_r=3.5&blq_v=default-worldwide
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-blocks_grey_alpha.png
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-search_grey_alpha.png
http://news.bbcimg.co.uk/media/images/69139000/jpg/_69139104_69139103.jpg
http://news.bbcimg.co.uk/media/images/69134000/jpg/_69134575_waynerooney1.jpg

If you only want the .jpg files, restrict the selector to src attributes ending in .jpg:

Elements images = doc.select("img[src$=.jpg]");

which results in only the .jpg URLs being selected.
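Note that absUrl only helps for attributes in the HTML itself. A path taken from a stylesheet, like the ../images/style1/g_rss-2.png in the question, is relative to the URL of the stylesheet, not of the page. A minimal sketch of resolving it with java.net.URI follows. The stylesheet URL is a made-up placeholder and the class name is hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of Jsoup.
class CssUrlResolver {

    // Resolves a path found inside a stylesheet against the stylesheet's own URL.
    // Paths that are already absolute are returned unchanged.
    static String resolve(String stylesheetUrl, String pathFromCss) {
        return URI.create(stylesheetUrl).resolve(pathFromCss).toString();
    }

    public static void main(String[] args) {
        // Placeholder stylesheet location, used only for illustration.
        String cssUrl = "http://example.com/templates/css/style1.css";
        // Prints: http://example.com/templates/images/style1/g_rss-2.png
        System.out.println(resolve(cssUrl, "../images/style1/g_rss-2.png"));
    }
}
```

URI.create throws IllegalArgumentException for strings that are not valid URIs (for example, paths containing unescaped spaces), so real-world input may need cleaning first.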



Source: https://stackoverflow.com/questions/18075085/jsoup-getting-background-image-path-from-css
