pdfbox | 易学教程

Attachment damages signature


I have a PDF document. 1) Adobe Reader reads the document fine. 2) I sign the document (using pdfbox) and everything is fine. 3) I try to attach a file to the original PDF (the code is from the cookbook on the pdfbox web page). 4) Adobe Reader reads the document with the attachment fine. 5) Now I have a document with an attachment. 6) I try to sign that document (the one with the attachment), and I have two problems. First: when I open the document, Adobe Reader tells me that the signature byte range is invalid. Second: when I try to close the document (that is, to close Adobe Reader), Adobe Reader tells me that: Do
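Adobe Reader's "byte range is invalid" message refers to the signature's /ByteRange array, which lists the (offset, length) pairs of the bytes that were signed. For a signature meant to cover the entire file, the first segment starts at byte 0, the last segment ends at the end of the file, and the gap between them holds the /Contents value. The following is a minimal sketch of that sanity check in plain Java; the class and method names are hypothetical, it assumes the four-element form of the array, and it does not use or reproduce any pdfbox API.

```java
public final class ByteRangeCheck {

    private ByteRangeCheck() {
    }

    /**
     * Hypothetical helper: returns true when the range {o1, l1, o2, l2}
     * starts at byte 0, leaves a gap between its two segments (where the
     * /Contents value sits), and ends exactly at fileLength.
     */
    public static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false; // assumption: only the common two-segment form is handled
        }
        long o1 = byteRange[0];
        long l1 = byteRange[1];
        long o2 = byteRange[2];
        long l2 = byteRange[3];
        if (o1 != 0 || l1 <= 0 || l2 < 0) {
            return false;
        }
        if (o2 <= o1 + l1) {
            return false; // no room left for the signature contents
        }
        return o2 + l2 == fileLength;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not taken from a real file.
        System.out.println(coversWholeFile(new long[] {0, 100, 200, 50}, 250));
        System.out.println(coversWholeFile(new long[] {0, 100, 200, 50}, 300));
    }
}
```

If a check like this fails on the signed output, one thing worth looking at is whether anything wrote to the file after the byte range was computed.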

PDF Box generating blank images due to JBIG2 Images in it


Question: Let me give you an overview of my project first. I have a PDF which I need to convert into images (one image per page) using the PDFBox API, and then write all those images into a new PDF, again using the PDFBox API. Basically, this converts a PDF into a PDF, which we refer to as PDF transcoding. For certain PDFs that contain JBIG2 images, the PDFBox implementation of the convertToImage() method fails silently, without any exceptions or errors, and finally produces a PDF, but this time, just with blank

Getting java.lang.NoClassDefFoundError: org/pdfbox/pdfparser/


Question: Below is the code that I am using. I've provided one PDF file and one text file as input on the command line.

    import org.pdfbox.cos.COSDocument;
    import org.pdfbox.pdfparser.PDFParser;
    import org.pdfbox.pdmodel.PDDocument;
    import org.pdfbox.pdmodel.PDDocumentInformation;
    import org.pdfbox.util.PDFTextStripper;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.PrintWriter;

    public class PDFTextParser {
        PDFParser parser;
        String parsedText;
        PDFTextStripper pdfStripper;
        PDDocument
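The package in the error, org/pdfbox/pdfparser, is the naming used by the old pre-Apache PDFBox releases; Apache PDFBox releases use org.apache.pdfbox instead. A NoClassDefFoundError means the class was not found on the runtime classpath, so a quick probe can show which naming, if either, is actually available. This is only a sketch: ClasspathProbe and isPresent are hypothetical names, and the two class names probed in main are the one from the question's imports and its assumed Apache-era counterpart.

```java
public final class ClasspathProbe {

    private ClasspathProbe() {
    }

    /** Returns true if the named class can be located without initializing it. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.pdfbox.pdfparser.PDFParser",        // package used in the question
            "org.apache.pdfbox.pdfparser.PDFParser"  // assumed Apache-era equivalent
        };
        for (String name : candidates) {
            System.out.println(name + " present: " + isPresent(name));
        }
    }
}
```

Run it with the same classpath as the failing program; if neither line reports true, the jar is missing at runtime rather than misnamed in the imports.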

PDFBox delete comment maintain strikethrough


Question: I have a PDF which has a comment on a paragraph. This paragraph is struck through. My requirement is to delete the comment from a specific page. The following code should delete a specific comment from my PDF, but it does not.

    PDDocument document = PDDocument.load(...File...);
    List<PDAnnotation> annotations = new ArrayList<>();
    PDPageTree allPages = document.getDocumentCatalog().getPages();
    for (int i = 0; i < allPages.getCount(); i++) {
        PDPage page = allPages.get(i);
        annotations = page

What does this java.io.IOException: Error: Expected a long type, actual='930[299' mean?


Question: I created a program to read and extract text from PDF files, but it throws this exception during execution:

    java.io.IOException: Error: Expected a long type, actual='930[299'
        at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1669)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:632)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)

Get all text operators whose color is black, pdfBox


Question: While parsing an existing PDF, I am using if(op.getOperation().equals( "TJ")) to get text operators. What I want to do is target only the ones whose color is black (or some other specifiable color). I am unable to find a method for this in the pdfBox docs. Edit: Basically, what I want to do is keep only black text in the PDF and remove any other text operator that doesn't match the criterion. Can anyone share a solution? Thanks! Answer 1: Text showing operators While
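The general idea behind filtering text by color can be sketched without any PDF library: walk the content stream operators in order, track the current fill (non-stroking) color, and note which text showing operators occur while it is black. The sketch below is an assumption-laden illustration, not the answer's code. Op is a hypothetical stand-in for a parsed operator, only the g, rg and k color operators plus q/Q are tracked (cs/sc/scn and pattern or ICC color spaces are ignored), and text is assumed to be filled rather than stroked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public final class BlackTextFilter {

    /** Hypothetical stand-in for one content stream operator and its numeric operands. */
    public static final class Op {
        final String name;
        final float[] operands;

        public Op(String name, float... operands) {
            this.name = name;
            this.operands = operands;
        }
    }

    private static final List<String> TEXT_SHOWING = Arrays.asList("Tj", "TJ", "'", "\"");

    private BlackTextFilter() {
    }

    /** Returns the indices of text showing operators executed while the fill color is black. */
    public static List<Integer> blackTextIndices(List<Op> ops) {
        boolean black = true; // the initial fill color is DeviceGray 0, i.e. black
        Deque<Boolean> saved = new ArrayDeque<>();
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < ops.size(); i++) {
            Op op = ops.get(i);
            float[] v = op.operands;
            switch (op.name) {
                case "q": // save graphics state
                    saved.push(black);
                    break;
                case "Q": // restore graphics state
                    if (!saved.isEmpty()) {
                        black = saved.pop();
                    }
                    break;
                case "g": // gray fill
                    black = v.length == 1 && v[0] == 0f;
                    break;
                case "rg": // RGB fill
                    black = v.length == 3 && v[0] == 0f && v[1] == 0f && v[2] == 0f;
                    break;
                case "k": // CMYK fill; only pure K is treated as black here
                    black = v.length == 4 && v[0] == 0f && v[1] == 0f && v[2] == 0f && v[3] == 1f;
                    break;
                default:
                    if (black && TEXT_SHOWING.contains(op.name)) {
                        result.add(i);
                    }
            }
        }
        return result;
    }
}
```

The complement of the returned indices gives the text showing operators that would be candidates for removal.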

Extracting and Printing text positions


Question: I've been doing some experiments with pdfbox and I'm currently stuck on an issue which I suspect has something to do with the coordinate system. I'm extending PDFTextStripper to get the X and Y of each character in a PDF page. Originally I was creating an image with ImageIO, printing the text at the position I received and putting a little mark (rectangles with different colors) at the bottom of each reference I wanted, and everything seemed fine. But now to avoid losing the style from the pdf I
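A common source of this kind of mismatch is that PDF user space puts the origin at the lower-left corner with y growing upward, while raster images put it at the top-left with y growing downward. The sketch below shows the conversion in plain Java, assuming an unrotated page whose lower-left corner is at (0, 0); the class and method names are hypothetical, and which convention a given pdfbox getter uses should be checked against its Javadoc.

```java
public final class PageCoordinates {

    private PageCoordinates() {
    }

    /**
     * Flips a y coordinate between the bottom-left and top-left conventions.
     * The operation is its own inverse.
     */
    public static float flipY(float y, float pageHeight) {
        return pageHeight - y;
    }

    /**
     * Converts a bottom-left-origin y in PDF units (1/72 inch) to a
     * top-left-origin pixel row for an image rendered at the given DPI.
     */
    public static float toImageY(float yPdf, float pageHeight, float dpi) {
        return flipY(yPdf, pageHeight) * dpi / 72f;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // US Letter height in points, used for illustration
        System.out.println(flipY(100f, pageHeight));
        System.out.println(toImageY(100f, pageHeight, 144f));
    }
}
```

Drawing the same mark with and without the flip is a quick way to tell which convention the extracted positions follow.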

Unable to find location of ColorSpace objects in PDF document


Question: I want to identify the ColorSpace objects in a PDF and fetch their location (coordinates, width and height of the colorspace) on the page. I tried traversing the BaseDataObject in Contents.ContentContext.Resources.ColorSpaces. I can identify the Pantone colorspaces in the file (as shown in the screenshot), but I am unable to find information about the location (x, y, w and h) of the object. Where can I find the exact location of the visible objects (visible on opening a document) like ColorSpaces and

compare two pdf files (approach) using java [closed]


Question: I need to write a Java class that compares two PDF files and points out the differences (differences in text/position/font) using some sort of highlighting. My initial approach was to use pdfbox to parse the file and store the extracted text in some data structure

PDFBox 2.0 and TTC Fonts


Question: I am trying to use PDFBox 2.0 (snapshot of 20151009) because of its TTC support, but I haven't found any documentation on how to use this feature. I found a ticket here https://issues.apache.org/jira/browse/PDFBOX-2752 and I found how to load a TTC file:

    InputStream is = MyClass.class.getResourceAsStream("font.ttc");
    TrueTypeCollection coll = new TrueTypeCollection(is);

but I don't know how to embed a TrueTypeFont into my PDDocument. In PDFBox 1.8 I was using something similar to the