pdfbox

pdfbox: how to clone a page

拥有回忆 submitted on 2019-12-05 11:26:41
Using Apache PDFBox, I am editing an existing document, and I would like to take one page from that document and simply clone it, copying whatever elements it contains. As an additional twist, I would like to get a reference to all the PDFields for any form fields in this newly cloned page. Here's the code I have tried so far:

    PDPage newPage = new PDPage(lastPage.getCOSDictionary());
    PDFCloneUtility cloner = new PDFCloneUtility(pdfDoc);
    pdfDoc.addPage(newPage);
    cloner.cloneMerge(lastPage, newPage);
    // there doesn't seem to be an API to read the fields from the page, need to filter them out from the
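On the "fields of a page" part: PDFBox has no page-to-field lookup, but each terminal field lists its widget annotations, and each page lists its annotations in /Annots, so the two can be matched by dictionary identity. The sketch below assumes PDFBox 2.0.x; the class and method names (`FieldsOnPage`, `fieldsOnPage`) are hypothetical. It does not do the cloning itself. Note also that `new PDPage(someDictionary)` wraps that same dictionary rather than copying it (in 2.0 the accessor is `getCOSObject()`).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class FieldsOnPage {
    /** Hypothetical helper: returns the fields that have at least one widget on the given page. */
    public static List<PDField> fieldsOnPage(PDDocument doc, PDPage page) throws IOException {
        List<PDField> result = new ArrayList<>();
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return result; // no AcroForm, so no form fields at all
        }
        // Dictionaries listed in the page's /Annots array, compared by object identity.
        Set<COSDictionary> annotsOnPage = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PDAnnotation annotation : page.getAnnotations()) {
            annotsOnPage.add(annotation.getCOSObject());
        }
        // Walk the whole field tree (including nested fields) and keep fields with a widget here.
        for (PDField field : form.getFieldTree()) {
            for (PDAnnotationWidget widget : field.getWidgets()) {
                if (annotsOnPage.contains(widget.getCOSObject())) {
                    result.add(field);
                    break;
                }
            }
        }
        return result;
    }
}
```

Matching on the page's /Annots rather than on each widget's /P entry is deliberate: /P is optional and often missing.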

How to call pypdfocr functions to use them in a python script?

孤者浪人 submitted on 2019-12-05 11:23:01
Recently I downloaded pypdfocr; however, the documentation has no examples of how to call pypdfocr as a library. Could anybody help me call it just to convert a single file? I only found a terminal command:

    $ pypdfocr filename.pdf

If you're looking for the source code, it's normally under the site-packages directory of your Python installation. What's more, if you're using an IDE (e.g. PyCharm), it can help you find the directory and file. This is extremely useful for finding a class as well and seeing how you can instantiate it, for example: https://github.com/virantha/pypdfocr/blob

Attachment damages signature part 2

二次信任 submitted on 2019-12-05 11:03:24
I created code that adds an image to an existing PDF document and then signs it, all using PDFBox (see code below). The code adds the image and the signature correctly. However, for some documents Acrobat Reader complains that "The signature byte range is invalid." The problem seems to be the same as the one described in this question. The answer to that question describes the problem in more detail: my code leaves a mix of cross-reference types (streams and tables) in the document. Indeed, some documents won't even open because of the problems this creates. My
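To make "signature byte range" concrete: a signature's /ByteRange is an array [a b c d] of two (offset, length) pairs that together cover the file except the /Contents value (the hex string, including its < and > delimiters) where the signature bytes go. The stdlib-only sketch below checks just that arithmetic for the most recent signature; `ByteRangeCheck` is a hypothetical name, it is not how Acrobat validates, and it does not detect the mixed cross-reference problem described above. An older signature followed by later incremental updates legitimately ends before the end of the file.

```java
public class ByteRangeCheck {
    /**
     * Hypothetical sanity check for the latest signature's /ByteRange [a b c d]:
     * first range starts at byte 0, a gap follows it, and the second range ends at end of file.
     */
    public static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false; // a single signature uses exactly two (offset, length) pairs
        }
        long a = byteRange[0], b = byteRange[1], c = byteRange[2], d = byteRange[3];
        if (a != 0 || b <= 0 || d < 0) {
            return false; // the first signed range must start at byte 0
        }
        if (c <= a + b) {
            return false; // the second range must start after the gap reserved for /Contents
        }
        return c + d == fileLength; // the last signed byte must be the last byte of the file
    }

    /** Bytes between the two ranges, i.e. the /Contents hex string including < and >. */
    public static long gapLength(long[] byteRange) {
        return byteRange[2] - (byteRange[0] + byteRange[1]);
    }
}
```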

BufferedImage color saturation

痴心易碎 submitted on 2019-12-05 06:52:44
I'm writing a simple scanning application using jfreesane and Apache PDFBox. Here is the scanning code:

    InetAddress address = InetAddress.getByName("192.168.0.17");
    SaneSession session = SaneSession.withRemoteSane(address);
    List<SaneDevice> devices = session.listDevices();
    SaneDevice device = devices.get(0);
    device.open();
    device.getOption("resolution").setIntegerValue(300);
    BufferedImage bimg = device.acquireImage();
    File file = new File("test_scan.png");
    ImageIO.write(bimg, "png", file);
    device.close();

And here is how I make the PDF:

    PDDocument document = new PDDocument();
    float width = bimg.getWidth();
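The PDF-making code is cut off above. For orientation, here is a minimal sketch of that step, assuming PDFBox 2.0.x and a scan at a known resolution (300 dpi in the question); `ScanToPdf` and `imageToPdf` are hypothetical names. It sizes the page so the scan appears at its physical size and embeds the pixels losslessly. It does not by itself diagnose the saturation problem; it only makes the path from `BufferedImage` into the PDF explicit.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class ScanToPdf {
    /** Hypothetical helper: one page whose size matches the scan at the given resolution. */
    public static PDDocument imageToPdf(BufferedImage bimg, float dpi) throws IOException {
        PDDocument document = new PDDocument();
        // PDF user space units are 1/72 inch, so convert pixels at `dpi` to points.
        float width = bimg.getWidth() * 72f / dpi;
        float height = bimg.getHeight() * 72f / dpi;
        PDPage page = new PDPage(new PDRectangle(width, height));
        document.addPage(page);
        // LosslessFactory stores the pixels Flate-compressed, so no JPEG artifacts are introduced.
        PDImageXObject image = LosslessFactory.createFromImage(document, bimg);
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            contentStream.drawImage(image, 0, 0, width, height);
        }
        return document;
    }
}
```

Usage would be `PDDocument doc = ScanToPdf.imageToPdf(bimg, 300f); doc.save("scan.pdf");`.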

Identify rgb and cmyk color from pdf

若如初见. submitted on 2019-12-05 06:30:25
I have a PDF that contains text and backgrounds in different colors. How can I identify which colors in the PDF are in CMYK and which are in RGB?

    StringBuilder sb_Sourcepdf = new StringBuilder();
    PdfReader reader_FirstPdf = new PdfReader(pdf_of_FirstFile);
    Document document = new Document();
    PDFParser parser = new PDFParser(new FileInputStream(pdf_of_FirstFile));
    parser.parse();
    PDDocument docum = parser.getPDDocument();
    PDFStreamEngine engine = new PDFStreamEngine();
    PDPage page = (PDPage
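One way to approach this with PDFBox is to subclass `PDFStreamEngine` and look at the color operators as the page's content stream is processed: `rg`/`RG` select DeviceRGB, `k`/`K` DeviceCMYK, `g`/`G` DeviceGray, and `cs`/`CS` name a color space explicitly. The sketch below assumes PDFBox 2.0.x (where `PDFStreamEngine` is subclassed rather than instantiated); `ColorSpaceFinder` is a hypothetical name. It only scans the page's own content stream, not form XObjects, images, shadings or annotations, and a resource name such as /CS0 is reported as-is rather than resolved.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;

/** Hypothetical helper: records which color spaces a page's content stream selects. */
public class ColorSpaceFinder extends PDFStreamEngine {
    private final Set<String> colorSpaces = new TreeSet<>();

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        switch (operator.getName()) {
            case "rg": case "RG": colorSpaces.add("DeviceRGB"); break;
            case "k":  case "K":  colorSpaces.add("DeviceCMYK"); break;
            case "g":  case "G":  colorSpaces.add("DeviceGray"); break;
            case "cs": case "CS":
                // Operand is a device space name (e.g. /DeviceCMYK) or a resource name
                // such as /CS0, which would still need resolving via the page resources.
                if (!operands.isEmpty() && operands.get(0) instanceof COSName) {
                    colorSpaces.add(((COSName) operands.get(0)).getName());
                }
                break;
            default:
                break;
        }
        super.processOperator(operator, operands);
    }

    public static Set<String> findColorSpaces(PDPage page) throws IOException {
        ColorSpaceFinder finder = new ColorSpaceFinder();
        finder.processPage(page);
        return finder.colorSpaces;
    }
}
```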

Determine whether a PDF page contains text or is purely picture

喜欢而已 submitted on 2019-12-05 04:09:22
How can I determine, using Java, whether a PDF page contains text or is purely a picture? I searched through many forums and websites, but I have not found an answer yet. Is it possible to extract text from the PDF to find out whether the page is a picture or text?

    PdfReader reader = new PdfReader(INPUTFILE);
    PrintWriter out = new PrintWriter(new FileOutputStream(OUTPUTFILE));
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        // here I want to test the structure of the page, if that's possible
        out.println(PdfTextExtractor.getTextFromPage(reader, i));
    }

There is no watertight way to do what
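The question's code uses iText; as a PDFBox counterpart (swapped in, since this listing is about PDFBox), here is a heuristic sketch assuming PDFBox 2.0.x: a page counts as text if `PDFTextStripper` extracts anything non-blank from it, otherwise as image-only if its resources contain an image XObject. `PageKind` and `classify` are hypothetical names. In line with the "no watertight way" caveat: a scan with an invisible OCR layer is reported as text, text drawn as vector outlines is missed, and images nested inside form XObjects are not found.

```java
import java.io.IOException;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;

public class PageKind {
    public enum Kind { TEXT, IMAGE_ONLY, EMPTY }

    /** Hypothetical heuristic; pageNo is 1-based, as in PDFTextStripper.setStartPage. */
    public static Kind classify(PDDocument doc, int pageNo) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(pageNo);
        stripper.setEndPage(pageNo);
        if (!stripper.getText(doc).trim().isEmpty()) {
            return Kind.TEXT;
        }
        PDPage page = doc.getPage(pageNo - 1); // getPage is 0-based
        PDResources resources = page.getResources();
        if (resources != null) {
            // Only the page's own XObject resources are inspected (no recursion into forms).
            for (COSName name : resources.getXObjectNames()) {
                if (resources.getXObject(name) instanceof PDImageXObject) {
                    return Kind.IMAGE_ONLY;
                }
            }
        }
        return Kind.EMPTY;
    }
}
```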

Using pdfbox to get form field values

六月ゝ 毕业季﹏ submitted on 2019-12-05 03:57:27
I'm using PDFBox for the first time. Right now I'm reading something on the website Pdf Summarizing. I have a PDF like this, except that my file has many different components (TextField, RadioButton, CheckBox). From this PDF I have to read these values: Mauro, Rossi, MyCompany. For now I have written the following code:

    PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
    for(PDField pdField :
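A minimal sketch of reading all field values, assuming PDFBox 2.0.x, where `PDDocument.load` replaces 1.8's `loadNonSeq` and `getFieldTree()` also visits fields nested under parent fields. `FormValues` and `readValues` are hypothetical names. For check boxes and radio buttons, `getValueAsString()` returns the state name (such as `Off`) rather than a label.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

public class FormValues {
    /** Hypothetical helper: fully qualified field name -> value, in field-tree order. */
    public static Map<String, String> readValues(PDDocument doc) {
        Map<String, String> values = new LinkedHashMap<>();
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return values; // the document has no interactive form
        }
        for (PDField field : form.getFieldTree()) {
            // Terminal fields carry values: text fields, check boxes, radio buttons, ...
            if (field instanceof PDTerminalField) {
                values.put(field.getFullyQualifiedName(), field.getValueAsString());
            }
        }
        return values;
    }

    public static Map<String, String> readValues(File file) throws IOException {
        try (PDDocument doc = PDDocument.load(file)) {
            return readValues(doc);
        }
    }
}
```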

PDF Signing, generated PDF Document certification is invalid? (using external signing, web-eid, HSM)

谁说我不能喝 submitted on 2019-12-05 02:36:45
I have a service that signs the data and provides me with the signed hash; it correctly generates a PKCS#7 DigestInfo as described in rfc2315#section-9.4. Something like this. The code for the above system is: https://pastebin.com/b3qZH6xW

    //prepare signature
    PDSignature signature = new PDSignature();
    signature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE);
    signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
    signature.setName("Ankit");
    signature.setLocation("Bhopal, IN");
    signature
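For orientation, a sketch of PDFBox's external-signing flow, assuming PDFBox 2.0.x with `saveIncrementalForExternalSigning`; `ExternalSigningSketch` and the `CmsSigner` callback (standing in for the remote service, HSM or web-eid step) are hypothetical. One property of the chosen subfilter worth checking, stated as a general fact rather than a diagnosis of this question: with `SUBFILTER_ADBE_PKCS7_DETACHED`, the /Contents value must be a DER-encoded CMS (PKCS#7) SignedData container over the signed byte ranges, so a bare signed DigestInfo would first have to be wrapped into such a container. The document must have been loaded from a file or stream, because the result is written as an incremental update.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Calendar;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.ExternalSigningSupport;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;

public class ExternalSigningSketch {
    /** Hypothetical callback: must return a DER-encoded CMS SignedData over the given bytes. */
    public interface CmsSigner {
        byte[] sign(InputStream signedContent) throws IOException;
    }

    public static void sign(PDDocument doc, OutputStream out, CmsSigner signer) throws IOException {
        PDSignature signature = new PDSignature();
        signature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE);
        signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
        signature.setName("Ankit");
        signature.setLocation("Bhopal, IN");
        signature.setSignDate(Calendar.getInstance());
        doc.addSignature(signature);
        // Writes the incremental update with a /Contents placeholder and exposes the signed ranges.
        ExternalSigningSupport externalSigning = doc.saveIncrementalForExternalSigning(out);
        byte[] cms = signer.sign(externalSigning.getContent());
        // Patches the returned bytes into the /Contents placeholder.
        externalSigning.setSignature(cms);
    }
}
```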

Drawing vector images on PDF with PDFBox

好久不见. submitted on 2019-12-05 02:23:30
I would like to draw a vector image on a PDF with Apache PDFBox. This is the code I use to draw regular images:

    PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(1);
    PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
    BufferedImage _prevImage = ImageIO.read(new FileInputStream("path/to/image.png"));
    PDPixelMap prevImage = new PDPixelMap(document, _prevImage);
    contentStream.drawXObject(prevImage, prevX, prevY, imageWidth, imageHeight);

If I use an SVG or WMF image instead of a PNG, the resulting PDF document is corrupted. The main reason
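PDFBox has no SVG or WMF image type, so those files cannot go through the raster image path above. One vector-preserving route, assuming the SVG/WMF has first been converted to a one-page PDF by some other tool (not shown, and an assumption here), is to import that page as a form XObject with `LayerUtility` and draw it like a stamp. A sketch assuming PDFBox 2.0.x; `VectorStamp` and `drawPdfAsForm` are hypothetical names.

```java
import java.io.IOException;

import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;

public class VectorStamp {
    /** Hypothetical helper: draws page 0 of vectorPdf onto page of target, keeping it vector. */
    public static void drawPdfAsForm(PDDocument target, PDPage page, PDDocument vectorPdf,
                                     float x, float y, float scale) throws IOException {
        LayerUtility layerUtility = new LayerUtility(target);
        // Copies the source page's content and resources into a form XObject owned by target.
        PDFormXObject form = layerUtility.importPageAsForm(vectorPdf, 0);
        try (PDPageContentStream contentStream = new PDPageContentStream(
                target, page, PDPageContentStream.AppendMode.APPEND, true, true)) {
            contentStream.saveGraphicsState();
            // Translate to (x, y) and scale uniformly; units are PDF points.
            contentStream.transform(new Matrix(scale, 0, 0, scale, x, y));
            contentStream.drawForm(form);
            contentStream.restoreGraphicsState();
        }
    }
}
```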

read text from a particular page using PDFBox [duplicate]

痞子三分冷 submitted on 2019-12-05 01:14:11
This question already has an answer here: Reading a particular page from a PDF document using PDFBox

I know how to read the text of an entire PDF file with PDFBox using PDFTextStripper.getText(PDDocument). I also have a sample showing how to get an object reference to a particular page using PDDocumentCatalog.getAllPages().get(i). How do I get the text of just one page with PDFBox? I don't see any such method on the PDPage class.

You can set parameters on the PDFTextStripper to read particular pages:

    PDDocument doc; // document
    int i; // page no.
    PDFTextStripper reader = new PDFTextStripper(
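Completing the idea of the answer above as a small self-contained helper, assuming PDFBox 2.0.x (where `PDFTextStripper` lives in `org.apache.pdfbox.text`; in 1.8 it was in `org.apache.pdfbox.util`). `PageText` and `textOfPage` are hypothetical names. Setting the start and end page to the same 1-based number restricts extraction to that page.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PageText {
    /** Hypothetical helper: text of a single page; pageNo is 1-based. */
    public static String textOfPage(PDDocument doc, int pageNo) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(pageNo);
        stripper.setEndPage(pageNo); // same start and end page: only this page is extracted
        return stripper.getText(doc);
    }
}
```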