pdfbox

Splitting a PDF results in very large PDF documents with PDFBox 2.0.2

旧巷老猫 submitted on 2019-12-23 01:01:49
Question: I want to use the command java -jar pdfbox-app-2.y.z.jar PDFSplit [OPTIONS] <PDF file> to split one PDF into several smaller PDFs, but I ran into a problem. The PDF being split is "ActiveMQ In Action(Manning-2011).pdf", which is 14.1MB. When I run java -jar pdfbox-app-2.0.2.jar PDFSplit -split 5 -startPage 21 -endPage 40 -outputPrefix abc "ActiveMQ In Action(Manning-2011).pdf", every resulting PDF is larger than 79MB! How can I prevent this? Answer 1: This is a known bug in PDFBox 2.0.2. Splitting works

PDFBox Form fill - saveIncremental does not work

大兔子大兔子 submitted on 2019-12-22 12:50:23
Question: I have a PDF file with some form fields that I want to fill from Java. Right now I'm trying to fill just one field, which I look up by its name. My code looks like this: File file = new File("c:/Testy/luxmed/Skierowanie3.pdf"); PDDocument document = PDDocument.load(file); PDDocumentCatalog doc = document.getDocumentCatalog(); PDAcroForm Form = doc.getAcroForm(); String formName = "topmostSubform[0].Page1[0].pana_pania[0]"; PDField f = Form.getField(formName); setField(document, formName,

Using pdfbox in java to overlay text onto previously created pdf document

别等时光非礼了梦想. submitted on 2019-12-22 10:50:06
Question: I already have several PDF documents that have been created, and I am working with them using PDFBox. I need to put text into several places on these documents, but I do NOT want to modify the text that is already within those areas. For instance, there may be a section as follows - NAME: ______________________________ I will put text into that area, but I need the underline to remain the same length. I believe the best solution would be to just create a textbox or similar that goes

How do I determine the location of actual PDF content with PDFBox?

回眸只為那壹抹淺笑 submitted on 2019-12-22 10:43:59
Question: We're printing some PDFs from a Java desktop app, using PDFBox, and the PDFs contain too much whitespace (fixing the PDF generator is unfortunately not an option). The problem I have is determining where the actual content on the page is, because the crop/media/trim/art/bleed boxes are useless. Is there an easy and efficient way to do so that is better or faster than rendering the page to an image and examining which pixels stayed white? Answer 1: As you have mentioned in a comment that it can be assumed
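For reference, the baseline the question mentions (render the page, then look for pixels that did not stay white) can be sketched with the Java standard library alone. This is only an illustration of that baseline, not the approach from the truncated answer. The class name ContentBounds, the method findContentBounds, and the whiteThreshold parameter are hypothetical names chosen for this sketch. It assumes the page has already been rendered to a BufferedImage, for example with PDFBox 2.x's PDFRenderer.renderImage(pageIndex), and it returns pixel coordinates that would still need to be mapped back to PDF user space (the y axis is flipped and the render scale applies).

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

public class ContentBounds {

    /**
     * Returns the smallest rectangle (in image pixel coordinates) that contains
     * every pixel darker than whiteThreshold in at least one RGB channel,
     * or null if the whole image counts as white.
     */
    public static Rectangle findContentBounds(BufferedImage image, int whiteThreshold) {
        int minX = image.getWidth();
        int minY = image.getHeight();
        int maxX = -1;
        int maxY = -1;

        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // A pixel is treated as content if any channel falls below the threshold.
                if (r < whiteThreshold || g < whiteThreshold || b < whiteThreshold) {
                    if (x < minX) minX = x;
                    if (x > maxX) maxX = x;
                    if (y < minY) minY = y;
                    if (y > maxY) maxY = y;
                }
            }
        }

        if (maxX < 0) {
            return null; // no non-white pixel found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: a white canvas with one dark block.
        BufferedImage page = new BufferedImage(100, 80, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < page.getHeight(); y++) {
            for (int x = 0; x < page.getWidth(); x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        for (int y = 10; y < 25; y++) {
            for (int x = 20; x < 40; x++) {
                page.setRGB(x, y, 0x000000);
            }
        }
        System.out.println(findContentBounds(page, 250));
    }
}
```

The scan visits every pixel once, so its cost grows with the render resolution; that cost is the reason the question asks for something faster.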

Cropping a region from a PDF page with PDFBox

天涯浪子 submitted on 2019-12-22 09:59:41
Question: I am trying to crop a region out of a PDF page programmatically. Specifically, my input is going to be a single-page PDF and a bounding box on the page. The output is going to be a PDF that contains the characters, graphics paths and images from the original PDF, and it should look like the original PDF. In other words, I want a function that is similar to cropping a region out of an image, but for PDFs. Three questions: Is it at all possible to do? From my knowledge of PDFs, it seems possible.

How do I add an ICC to an existing PDF document

别等时光非礼了梦想. submitted on 2019-12-22 08:26:22
Question: I have an existing PDF document that uses CMYK colors. It was created using a specific ICC profile, which I have obtained. The colors are obviously different when I open the document with the profile active than without it. From what I can tell using a variety of tools, there is no ICC profile embedded in the document. What I would like to do is embed the ICC profile in the PDF so that it can be opened and viewed with the correct colors by third parties. My understanding is that this is

How to call pypdfocr functions to use them in a python script?

↘锁芯ラ submitted on 2019-12-22 07:46:51
Question: I recently downloaded pypdfocr, but the documentation has no examples of how to call pypdfocr as a library. Could anybody help me call it just to convert a single file? I only found a terminal command: $ pypdfocr filename.pdf Answer 1: If you're looking for the source code, it's normally under the site-packages directory of your Python installation. What's more, if you're using an IDE (e.g. PyCharm), it will help you find the directory and file. This is extremely useful to find

How can I create an accessible PDF with Java PDFBox 2.0.8 library that is also verifiable with PAC 2 tool?

烂漫一生 submitted on 2019-12-22 03:45:32
Question: Background: I have a small project on GitHub in which I am trying to create a Section 508 compliant (section508.gov) PDF that has form elements within a complex table structure. The tool recommended for verifying these PDFs is at http://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html and my program's output PDF does pass most of these checks. I will also know what every field is meant for at runtime, so adding tags to structure elements should not be an issue. The Problem: The

custom schema to XMP metadata

坚强是说给别人听的谎言 submitted on 2019-12-22 00:29:38
Question: I want to write custom metadata to a PDF file. The metadata is not covered by the standard XMP schemas, so I wrote my own schema containing my own properties. I can successfully write this additional custom metadata to my PDF file using either the PDFBox or the iTextPDF library. However, I am unable to read the custom metadata on the client side without parsing the XMP XML. I guess there should be some API that I am not aware of for getting your custom schema back into your Java class. Please help me if I am