pdfbox

Attachment damages signature

╄→尐↘猪︶ㄣ submitted on 2019-12-08 18:16:25
I have a PDF document. 1) Adobe Reader opens the document fine. 2) I sign the document (using pdfbox) and everything works. 3) I attach a file to the original PDF (using the code from the cookbook on the pdfbox web page). 4) Adobe Reader opens the document with the attachment fine. 5) Now I have a document with an attachment. 6) I try to sign that document (the one with the attachment), and I get two problems. First: when I open the document, Adobe Reader tells me that the signature byte range is invalid. Second: when I try to close the document (that is, close Adobe Reader), Adobe Reader tells me that: Do
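
For readers unfamiliar with the "byte range" mentioned above: a PDF signature dictionary carries a /ByteRange array of four integers (offset1, length1, offset2, length2) describing the two signed byte spans, with the gap between them holding the signature's /Contents value. The sketch below is only a sanity check under the assumption that the signature in question is the most recent one and is expected to cover the whole file; the class and method names are hypothetical and it does not use PDFBox or reproduce Adobe Reader's own validation.

```java
// Hypothetical helper, not part of PDFBox: checks whether four /ByteRange
// values are plausible for a signature that should cover the entire file.
final class ByteRangeCheck {

    static boolean looksConsistent(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        long start1 = byteRange[0];
        long len1 = byteRange[1];
        long start2 = byteRange[2];
        long len2 = byteRange[3];

        // The first span is expected to begin at the start of the file.
        if (start1 != 0 || len1 <= 0 || len2 <= 0) {
            return false;
        }
        // There must be a gap between the spans for the /Contents value.
        if (start2 <= start1 + len1) {
            return false;
        }
        // Assumption: the signature covers everything up to the end of the file.
        return start2 + len2 == fileLength;
    }

    public static void main(String[] args) {
        long[] range = {0, 100, 200, 50};
        System.out.println(looksConsistent(range, 250));
        System.out.println(looksConsistent(range, 300));
    }
}
```

If bytes are appended after signing, or the file is rewritten rather than saved incrementally, the last check is the one that would be expected to fail.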

PDF Box generating blank images due to JBIG2 Images in it

爷,独闯天下 submitted on 2019-12-08 17:34:30
Question: Let me give you an overview of my project first. I have a PDF which I need to convert into images (one image per page) using the PDFBox API, and then write all those images into a new PDF, again using the PDFBox API. Basically, converting a PDF into a PDF, which we refer to as PDF transcoding. For certain PDFs that contain JBIG2 images, the PDFBox implementation of the convertToImage() method fails silently, without any exceptions or errors, and finally produces a PDF, but this time just with blank

Getting java.lang.NoClassDefFoundError: org/pdfbox/pdfparser/

谁说我不能喝 submitted on 2019-12-08 14:52:22
Question: Below is the code that I am using. I provide one PDF file and one text file as input on the command line. import org.pdfbox.cos.COSDocument; import org.pdfbox.pdfparser.PDFParser; import org.pdfbox.pdmodel.PDDocument; import org.pdfbox.pdmodel.PDDocumentInformation; import org.pdfbox.util.PDFTextStripper; import java.io.File; import java.io.FileInputStream; import java.io.PrintWriter; public class PDFTextParser { PDFParser parser; String parsedText; PDFTextStripper pdfStripper; PDDocument
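
A NoClassDefFoundError usually means the class was available at compile time but is missing from the runtime classpath. As a minimal diagnostic sketch (the ClasspathProbe class is hypothetical, and it only uses the standard library), one can probe whether a class name can be loaded at all. The org.pdfbox package prefix comes from the imports above; org.apache.pdfbox is the prefix used by Apache PDFBox releases, so the two lines in main tell you which generation of the jar, if any, is visible.

```java
// Hypothetical diagnostic helper: reports whether a class can be loaded
// from the classpath of the running JVM, without initializing it.
final class ClasspathProbe {

    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent or one of its dependencies is.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("org.pdfbox.pdfparser.PDFParser loadable: "
                + isLoadable("org.pdfbox.pdfparser.PDFParser"));
        System.out.println("org.apache.pdfbox.pdfparser.PDFParser loadable: "
                + isLoadable("org.apache.pdfbox.pdfparser.PDFParser"));
    }
}
```

Run it with the same -cp value used for the failing program; if both lines print false, the jar is not on the runtime classpath.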

PDFBox delete comment maintain strikethrough

为君一笑 submitted on 2019-12-08 13:39:24
Question: I have a PDF which has a comment on a paragraph. This paragraph is struck through. My requirement is to delete the comment from a specific page. The following code should delete a specific comment from my PDF, but it does not. PDDocument document = PDDocument.load(...File...); List<PDAnnotation> annotations = new ArrayList<>(); PDPageTree allPages = document.getDocumentCatalog().getPages(); for (int i = 0; i < allPages.getCount(); i++) { PDPage page = allPages.get(i); annotations = page

What does this java.io.IOException: Error: Expected a long type, actual='930[299' mean?

本小妞迷上赌 submitted on 2019-12-08 13:37:41
Question: I created a program to read and extract text from PDF files, but it throws this exception during execution: java.io.IOException: Error: Expected a long type, actual='930[299' at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1669) at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100) at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:632) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)

Get all text operators whose color is black, pdfBox

蓝咒 submitted on 2019-12-08 13:02:24
Question: While parsing an existing PDF, I am using if(op.getOperation().equals( "TJ")) to get text operators. What I want to do is target only the ones whose color is black (or some other specifiable color). I am unable to find a method for this in the pdfBox docs. Edit: Basically, what I want to do is keep only black text in the PDF and remove any other text operator that doesn't match the criteria. Can anyone share a solution? Thanks! Answer 1: Text showing operators While

Extracting and Printing text positions

白昼怎懂夜的黑 submitted on 2019-12-08 12:46:01
Question: I've been doing some experiments with pdfbox and I'm currently stuck on an issue which I suspect has something to do with the coordinate system. I'm extending PDFTextStripper to get the X and Y of each character on a PDF page. Originally I was creating an image with ImageIO, printing the text at the position I received and putting a little mark (rectangles with different colors) at the bottom of each reference I wanted, and everything seemed fine. But now, to avoid losing the style from the PDF, I
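
On the coordinate-system suspicion: default PDF user space has its origin at the bottom-left corner of the page, with y growing upward and units of 1/72 inch, while raster images and Java2D place the origin at the top-left with y growing downward. The sketch below assumes the mismatch is exactly this vertical flip plus a resolution scale; the PageCoords class is hypothetical, and which convention a given PDFBox getter returns has to be checked against its Javadoc.

```java
// Hypothetical helpers for moving between a bottom-left-origin page space
// (units: points, 1/72 inch) and a top-left-origin image space (pixels).
final class PageCoords {

    // Mirrors a y coordinate across the page's horizontal midline.
    // The same formula converts in both directions.
    static float flipY(float y, float pageHeight) {
        return pageHeight - y;
    }

    // Scales a length in points to pixels at the given resolution.
    static float pointsToPixels(float points, float dpi) {
        return points * dpi / 72f;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // US Letter height in points
        System.out.println(flipY(100f, pageHeight));
        System.out.println(pointsToPixels(72f, 144f));
    }
}
```

A mark that lands at the right x position but is mirrored vertically on the page is the typical symptom that one flipY call is missing or applied twice.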

Unable to find location of ColorSpace objects in PDF document

为君一笑 submitted on 2019-12-08 12:39:00
Question: I want to identify the ColorSpace objects in a PDF and fetch their location (coordinates, width, and height of the colorspace) on the page. I tried traversing the BaseDataObject in Contents.ContentContext.Resources.ColorSpaces. I can identify the Pantone colorspaces in the file (as shown in screenshot), but I am unable to find information about the location (x, y, w, and h) of the object. Where can I find the exact location of the visible objects (visible on opening a document) like ColorSpaces and

compare two pdf files (approach) using java [closed]

两盒软妹~` submitted on 2019-12-08 12:23:07
Question: I need to write a Java class that compares two PDF files and points out the differences (differences in text/position/font) using some sort of highlighting. My initial approach was to use pdfbox to parse the files and store the extracted text in some data structure
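
As a starting point for the text part of such a comparison, here is a minimal sketch that assumes the text of each file has already been extracted into a list of lines (for example, one list per page). It compares the lists position by position and reports the indices that differ. The LineCompare class is hypothetical; this is deliberately not a real diff algorithm, so a single inserted line will mark every later line as different, and it says nothing about position or font.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: positional comparison of two lists of text lines.
final class LineCompare {

    // Returns the zero-based indices at which the two lists differ.
    // Lines present in only one list count as differences.
    static List<Integer> differingLines(List<String> left, List<String> right) {
        List<Integer> diffs = new ArrayList<>();
        int max = Math.max(left.size(), right.size());
        for (int i = 0; i < max; i++) {
            String a = i < left.size() ? left.get(i) : null;
            String b = i < right.size() ? right.get(i) : null;
            if (!Objects.equals(a, b)) {
                diffs.add(i);
            }
        }
        return diffs;
    }
}
```

The returned indices could then drive whatever highlighting is applied to the output; handling insertions and deletions gracefully would need a longest-common-subsequence style diff instead.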

PDFBox 2.0 and TTC Fonts

独自空忆成欢 submitted on 2019-12-08 12:13:11
Question: I am trying to use PDFBox 2.0 (snapshot of 20151009) because it adds TTC support, but I haven't found any documentation on how to use this feature. I found a ticket here https://issues.apache.org/jira/browse/PDFBOX-2752 and I found how to load a TTC file: InputStream is = MyClass.class.getResourceAsStream("font.ttc"); TrueTypeCollection coll = new TrueTypeCollection(is); but I don't know how to embed a TrueTypeFont into my PDDocument. In PDFBox 1.8 I was using something similar to the