Using JSoup to save the contents of this url: http://www.aw20.co.uk/images/logo.png to a file

Posted by 二次信任 on 2020-01-01 18:25:51

Question


I am trying to use JSoup to get the contents of this URL, http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get a Document. I then found the absolute URL of the image I am looking for, but I am not sure how to use it to get the actual image. Could someone point me in the right direction? Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class JGet2 {

public static void main(String[] args) {

    try {
        Document doc = Jsoup.connect("http://www.aw20.co.uk").get();

        Elements img = doc.getElementsByTag("img");

        for (Element element : img) {
            String src = element.absUrl("src");

            System.out.println("Image Found!");
            System.out.println("src attribute is: " + src);
            if (src.contains("logo.png") == true) {
                System.out.println("Success");     
            }
            getImages(src);
        }
    } 

    catch (IOException e) {
        e.printStackTrace();
    }
}

private static void getImages(String src) throws IOException {

    // Drop a trailing slash, if any, so the last path segment is the file name.
    if (src.endsWith("/")) {
        src = src.substring(0, src.length() - 1);
    }

    // Take everything from the last "/" onwards, e.g. "/logo.png"
    // (the leading slash is kept).
    int indexName = src.lastIndexOf("/");
    String name = src.substring(indexName);

    System.out.println(name);
}
}

Answer 1:


You can use Jsoup to fetch any URL and get the data as bytes, if you don't want to parse it as HTML. E.g.:

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();

ignoreContentType(true) is set because otherwise Jsoup throws an exception saying the content cannot be parsed as HTML. That is fine in this case because we use bodyAsBytes() to get the raw response body rather than parsing it.

Check the Jsoup Connection API for more details.
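
To finish the job, the byte array still has to be written to disk. The following is a minimal sketch using only the standard library; the class and method names (ImageBytesSaver, fileNameFromUrl, save) are made up for illustration, and the bytes are assumed to come from the bodyAsBytes() call shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Jsoup.
final class ImageBytesSaver {

    /** Returns the last path segment of a URL, ignoring a query string or trailing slashes. */
    static String fileNameFromUrl(String url) {
        String path = url;
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Writes the bytes unchanged to targetDir/fileName, creating the directory if needed. */
    static Path save(byte[] bytes, Path targetDir, String fileName) throws IOException {
        Files.createDirectories(targetDir);
        return Files.write(targetDir.resolve(fileName), bytes);
    }
}
```

With the bytes from the call above, usage would look roughly like ImageBytesSaver.save(bytes, Paths.get("images"), ImageBytesSaver.fileNameFromUrl(imgUrl)), where "images" is an arbitrary example directory.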




Answer 2:


Jsoup isn't designed for downloading the content of the url.

Since you are able to use a third-party library, you can try Apache Commons IO, which downloads the content of a given URL to a file with:

FileUtils.copyURLToFile(URL source, File destination);

It is only one line.
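
If adding Commons IO is not an option, a similar helper can be sketched with the standard library alone. This is an approximation under that assumption, not the Commons IO implementation; the class and method names (UrlToFile, copyUrlToFile) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for a URL-to-file copy using only the JDK.
final class UrlToFile {

    /** Streams the content at source into destination, overwriting an existing file. */
    static long copyUrlToFile(URL source, Path destination) throws IOException {
        // Create missing parent directories first.
        Path parent = destination.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // try-with-resources closes the stream even if the copy fails.
        try (InputStream in = source.openStream()) {
            return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The return value is the number of bytes copied, which can serve as a quick sanity check after a download.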




Answer 3:


This method does not work well. Please be careful when using it.

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();



Answer 4:


You can use these methods, or parts of them, to solve your problem. NOTE: IMAGE_HOME is the absolute path of the root directory for saved images, e.g. /home/yourname/foldername

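import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;
import org.jsoup.Jsoup;

public static String storeImageIntoFS(String imageUrl, String fileName, String relativePath) {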
    String imagePath = null;
    try {
        byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        String rootTargetDirectory = IMAGE_HOME + "/"+relativePath;
        imagePath = rootTargetDirectory + "/"+fileName;
        saveByteBufferImage(buffer, rootTargetDirectory, fileName);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return imagePath;
}

public static void saveByteBufferImage(ByteBuffer imageDataBytes, String rootTargetDirectory, String savedFileName) {
   String uploadInputFile = rootTargetDirectory + "/"+savedFileName;

   File rootTargetDir = new File(rootTargetDirectory);
   if (!rootTargetDir.exists()) {
       boolean created = rootTargetDir.mkdirs();
       if (!created) {
           System.out.println("Error while creating directory for location- "+rootTargetDirectory);
       }
   }
   String[] fileNameParts = savedFileName.split("\\.");
   String format = fileNameParts[fileNameParts.length-1];

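   File file = new File(uploadInputFile);

   InputStream in = new ByteArrayInputStream(imageDataBytes.array());
   try {
       // ImageIO.read returns null when no registered reader can decode the data.
       BufferedImage bufferedImage = ImageIO.read(in);
       if (bufferedImage == null) {
           System.out.println("Unsupported or corrupt image data for " + uploadInputFile);
           return;
       }
       ImageIO.write(bufferedImage, format, file);
   } catch (IOException e) {
       e.printStackTrace();
   }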

}
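
Note that reading and writing through ImageIO decodes and re-encodes the image, so the saved file is generally not byte-for-byte identical to the download. If an exact copy is wanted, one option is to use ImageIO only as a validity check and write the original bytes. A minimal sketch under that assumption follows; the names RawImageWriter, isDecodableImage and saveIfImage are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Hypothetical helper: validates with ImageIO, then stores the bytes untouched.
final class RawImageWriter {

    /** True if ImageIO has a registered reader that can decode the bytes as an image. */
    static boolean isDecodableImage(byte[] bytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
        return image != null;
    }

    /** Writes the bytes exactly as downloaded, but only if they decode as an image. */
    static boolean saveIfImage(byte[] bytes, Path target) throws IOException {
        if (!isDecodableImage(bytes)) {
            return false;
        }
        Files.write(target, bytes);
        return true;
    }
}
```

The check only covers formats ImageIO can read in the running JVM, so a valid image in an unsupported format would be rejected by this sketch.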




Answer 5:


Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?

No, not with get(): it parses the response as an HTML Document, and by default JSoup throws an exception for content types it cannot parse, such as images. Instead, take the image URL that you've obtained through JSoup and use standard Java I/O to download the file.

I've used NIO to do the downloading, e.g.:

     String imgPath = // ... url path to image
     String imgFilePath = // ... file path String

     // Needs java.io.FileOutputStream, java.io.IOException, java.net.URL,
     // java.nio.channels.Channels and java.nio.channels.ReadableByteChannel.
     // try-with-resources closes both the channel and the stream, even on failure.
     try (ReadableByteChannel rbc = Channels.newChannel(new URL(imgPath).openStream());
          FileOutputStream fos = new FileOutputStream(imgFilePath)) {
        // setState(EXTRACTING + imgFilePath);
        // Long.MAX_VALUE copies until end of stream; a fixed count such as
        // 1 << 24 would cap the file at 16 MiB.
        fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
     } catch (IOException e) {
        // Also covers MalformedURLException and FileNotFoundException.
        e.printStackTrace();
     }


Source: https://stackoverflow.com/questions/12657592/using-jsoup-to-save-the-contents-of-this-url-http-www-aw20-co-uk-images-logo
