pdfbox | 易学教程

PDFBOX: Convert a pdf to text or html, including images from the pdf

Question: I am developing a mobile application that converts PDF to HTML. I found PDFBox, which works very well: I can obtain the PDF text or HTML on one side and the images on the other. But I want to go a little further: I need the generated HTML to contain the images from the PDF. Can this be done with PDFBox, and how? If you know of another free library that can do this, please tell me. Thanks in advance. Answer 1: Take a look at ExtractImages.java, which will guide you on how to extract images from a PDF file. Next investigate
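One way to make the generated HTML carry its images is to inline each extracted image as a base64 data URI. The sketch below covers only that embedding step and assumes the image has already been obtained as a BufferedImage (for example via the extraction approach the answer points to). The class and method names are hypothetical, and it assumes a Java SE runtime where javax.imageio is available, which may not hold on every mobile platform.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper: not part of PDFBox.
class InlineImageHtml {

    // Encodes an already-extracted image as PNG and wraps it in an <img> tag
    // whose src is a base64 data URI, so the HTML needs no external files.
    static String toImgTag(BufferedImage image) {
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", png)) {
                throw new IOException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String base64 = Base64.getEncoder().encodeToString(png.toByteArray());
        return "<img src=\"data:image/png;base64," + base64 + "\" alt=\"\"/>";
    }

    public static void main(String[] args) {
        // A blank 2x2 image stands in for an image extracted from the PDF.
        BufferedImage placeholder = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        System.out.println(toImgTag(placeholder));
    }
}
```

The returned tag can be inserted wherever the image belongs in the generated HTML. Data URIs increase the size of the HTML, so writing the images to separate files and referencing them by path may be preferable for large documents.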

PDFBox 2 unusual memory consumption

Question: We are trying to render images from different PDF files using PDFRenderer's renderImageWithDPI method. On one particular PDF, the renderer behaves differently for some pages: rendering takes far longer than for other similar pages, and memory consumption reaches unusually large values. The memory used by the process grows by about 50 MB every 1 to 2 seconds, until the application process is consuming around 5 GB of RAM while in renderImageWithDPI
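To judge how unusual such figures are, it can help to estimate the memory that the rendered image itself should need. The following is a rough sketch, not PDFBox code: it assumes 72 PDF units per inch, 4 bytes per pixel (as in an int-backed RGB BufferedImage), and simple rounding of the pixel dimensions. The class name and the sample page size are illustrative.

```java
// Hypothetical helper for a back-of-the-envelope estimate; not part of PDFBox.
class RenderMemoryEstimate {

    // PDF user space uses 72 units per inch by default.
    private static final double POINTS_PER_INCH = 72.0;

    // Assumption: 4 bytes per pixel, as in an int-backed RGB BufferedImage.
    private static final int BYTES_PER_PIXEL = 4;

    // Estimates the size in bytes of the pixel buffer for one rendered page.
    static long estimateBytes(double widthPt, double heightPt, double dpi) {
        long widthPx = Math.round(widthPt * dpi / POINTS_PER_INCH);
        long heightPx = Math.round(heightPt * dpi / POINTS_PER_INCH);
        return widthPx * heightPx * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        // A4 page (595 x 842 pt) at 300 DPI; both values are illustrative.
        long bytes = estimateBytes(595, 842, 300);
        System.out.printf("%.1f MB%n", bytes / (1024.0 * 1024.0));
    }
}
```

Under these assumptions a single A4 page at 300 DPI needs a few tens of megabytes for its output image, so growth into the gigabyte range would have to come from something other than the final image buffer.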

How to get resource names for optional content group in a pdf?

Question: I am trying to implement functionality that allows a user to add markups to existing layers in a PDF. Here is the code that I am using to draw lines onto a layer in a PDF:
    PDResources resources = page.findResources();
    PDPropertyList props = resources.getProperties();
    COSName resourceName = getLayerResourceName("Superimposed3", resources, props);
    PDPageContentStream cs1 = new PDPageContentStream(document, page, true, false);
    cs1.beginMarkedContentSequence(COSName.OC, resourceName);
    cs1

PDFBox draw black image from BufferedImage

Question: I am trying to draw an image from a BufferedImage into a PDF using PDFBox, but it fails: I get black images, and Acrobat Reader warns with errors like "Out of memory" (although the PDF is displayed). I use a BufferedImage because I need to draw a JavaFX Image object into the PDF; it comes from a call to Funciones.crearImagenDesdeTexto(), a function that converts text into an Image. The rest of the images work well without using a BufferedImage.
    PDPixelMap img = null;
    BufferedImage bi;
    try {
        //If item has id, I try

How to get page content height using pdfbox

Question: Is it possible to get the height of the page content using PDFBox? I think I have tried everything, but each PDRectangle returns the full height of the page: 842. At first I thought this was because of the page number placed at the bottom of the page, but when I opened the PDF in Illustrator, the whole content was inside a compound element that does not extend to the full page height. So if Illustrator can see it as a separate element and calculate its height, I guess this should also be possible in PDFBox.

PDFBox on Mac critical error when silent printing

Question: I have been experimenting with bumping my application's dependency on PDFBox to the 2.0.0 snapshot, and I'm having some major issues with it. My code receives a PDF as a Base64 string; I decode it and load the resulting byte array into a PDDocument. Before I bumped the version number, calling .silentPrint() on the PDDocument worked like a charm. The implementation of silent printing changed in 2.0.0, and I now do it this way: private Status doPdfPrint(Document document, PrintService
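The decoding step described in the question can be sketched with the JDK alone. This is a minimal illustration, assuming Java 8 or later for java.util.Base64; the class and method names are hypothetical, and the header check is an extra sanity test that the question does not mention. It does not touch the printing problem itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decodes a Base64 payload and sanity-checks it before
// the bytes are handed to a PDF library.
class Base64Pdf {

    // The MIME decoder ignores line breaks, which Base64 payloads often contain.
    static byte[] decode(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // Returns true if the data starts with the "%PDF-" file header.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in payload: only a PDF header line, not a complete document.
        String payload = Base64.getEncoder()
                .encodeToString("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        byte[] bytes = decode(payload);
        System.out.println("Looks like a PDF: " + looksLikePdf(bytes));
    }
}
```

Checking the header before loading helps separate a corrupted or wrongly decoded payload from a genuine problem in the printing path.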

Get text layer of a PDF as is and pass it to another PDF

Question: Good afternoon. I have a problem in my project, which performs PDF compression. The process is as follows: extract the images from a PDF; run OCR; compress; merge the OCR result with the image and convert it to a PDF per page; combine all the generated PDFs with OCR into one output as the final product. The size of my original file is 11 MB, and 4.2 MB compressed. The whole process works perfectly, but the problem I have is the speed of the OCR step. I was checking on the web, and I saw a way to circumvent

Extracting text from an area with PDFbox

Question: Is it possible to extract text from an area with PDFBox using just the binaries, instead of having to write my own code? Answer 1: Compile and pack this simple program into a jar:
    import java.awt.geom.Rectangle2D;
    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.text.PDFTextStripperByArea;
    public class ExtractText {
        // Usage: xxx.jar filepath page x y width height
        public static void main

Convert Tiff to Pdf in java using itext

Question: I am using the code below to convert TIFF to PDF. It works fine for TIFF images of dimensions 850*1100, but when I give it an input TIFF image of other dimensions (e.g. 1574*732, 684*353 or other 850*1100), I get the error below. Please help me convert TIFF images of different dimensions to PDF. The error that occurs for the code below is: "Compression JPEG is only supported with a single strip. This image has 45 strips."
    RandomAccessFileOrArray myTifFile = null;
    com.itextpdf.text

PDF manipulation with placeholders

Question: I am looking for a Java tool that can manipulate an existing PDF containing placeholders like ${foo}. I want to generate mail-merge documents from it. I found a lot of solutions based on forms, but these do not seem suitable for me. Currently I generate the PDF with iText, but converting existing Word files or similar this way is a really annoying task. I haven't found another solution with iText so far. I also used JODReports in conjunction with JODConverter, but that requires running OpenOffice as a
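For reference, the substitution half of such a mail merge is simple once the text is available as a string. The sketch below handles only that step with the JDK's regex classes; it does not read or write PDF content, and the class name, method name, and the choice to leave unknown placeholders untouched are all assumptions for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills ${name} placeholders in plain text.
class PlaceholderFill {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Assumption: placeholders without a value are left as they are.
            String replacement = value != null ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Collections.singletonMap("foo", "Ann");
        System.out.println(fill("Dear ${foo}, your order ${id} has shipped.", values));
    }
}
```

Applying this to an existing PDF is the hard part the question asks about, because text in a PDF content stream is not guaranteed to be stored as one contiguous string per placeholder.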