I am trying to extract image from the pdf using pdfbox. I have taken help from this post . It worked for some of the pdfs but for others/most it did not. For example, I am not able to extract the figures in this file
After doing some research I found that PDResources.getImages is deprecated. So, I am using PDResources.getXObjects(). With this, I am not able to extract any image from the PDF and instead get this message at the console:
org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm cannot be cast to org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage
Now I am stuck and unable to find the solution. Please assist if anyone can.
//////UPDATE AS REPLY ON COMMENTS///
I am using pdfbox-1.8.10
Here is the code:
public void getimg ()throws Exception { try { String sourceDir = "C:/Users/admin/Desktop/pdfbox/mypdfbox/pdfbox/inputs/Yavaa.pdf"; String destinationDir = "C:/Users/admin/Desktop/pdfbox/mypdfbox/pdfbox/outputs/"; File oldFile = new File(sourceDir); if (oldFile.exists()){ PDDocument document = PDDocument.load(sourceDir); List<PDPage> list = document.getDocumentCatalog().getAllPages(); String fileName = oldFile.getName().replace(".pdf", "_cover"); int totalImages = 1; for (PDPage page : list) { PDResources pdResources = page.getResources(); Map pageImages = pdResources.getXObjects(); if (pageImages != null){ Iterator imageIter = pageImages.keySet().iterator(); while (imageIter.hasNext()){ String key = (String) imageIter.next(); Object obj = pageImages.get(key); if(obj instanceof PDXObjectImage) { PDXObjectImage pdxObjectImage = (PDXObjectImage) obj; pdxObjectImage.write2file(destinationDir + fileName+ "_" + totalImages); totalImages++; } } } } } else { System.err.println("File not exist"); } } catch (Exception e){ System.err.println(e.getMessage()); } }
//// PARTIAL SOLUTION/////
I have solved the problem of the error message. I have updated the correct code in the post as well. However, the problem remains the same. I am still not able to extract the images from few of the files. Like the one, I have mentioned in this post. Any solution in that regards.