pdfbox

Printing to PostScript with PDFBox produces a massive file, why?

蹲街弑〆低调 submitted on 2019-12-12 03:48:42
Question: I am using PDFBox to create PDFs and that is working great. I also need to create PostScript files, which I would like to generate from the PDFs I create. I am using the following code to have PDFBox work with SimpleDoc to create the PostScript file. That works, but the file is massive: a 30 KB PDF produces a 2 MB PostScript file. What do I need to change to create a reasonably sized PostScript file? PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet(); aset.add

How to create PageDrawer instance in PDFBox 2.0?

。_饼干妹妹 submitted on 2019-12-12 03:36:45
Question: I could not replace a PDF page in a PDF document when the page has a large margin. How can I resize a PDF page using PDFBox 2.0? If the page content in the input PDF document is 6" x 8", I want to make the page size 5" x 7" and save the PDF document. Answer 1: Assuming you have a PDPage object: PDRectangle mediaBox = page.getMediaBox(); if (mediaBox.getWidth() == 6 * 72 && mediaBox.getHeight() == 8 * 72) mediaBox = new PDRectangle(5 * 72, 7 * 72); and then save your document. If you're using 1.8, then use
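The answer above multiplies inch values by 72 because PDF page dimensions are expressed in points (1/72 inch by default). A minimal sketch of that arithmetic in plain Java, with no PDFBox dependency, follows. `PageSizeSketch`, `inchesToPoints`, `sameSize`, and the 0.5-point tolerance are hypothetical names and values chosen for illustration; they are not part of PDFBox. The tolerance is there because the media box width and height are floats, so an exact `==` may miss a page that is only nominally 6" x 8".

```java
// Hypothetical helper, not part of PDFBox: converts inches to PDF points
// and compares page sizes with a small tolerance instead of ==.
class PageSizeSketch {
    // Default PDF user space: 72 points per inch.
    static final float POINTS_PER_INCH = 72f;

    // Tolerance in points; 0.5 is an arbitrary value picked for this sketch.
    static final float TOLERANCE = 0.5f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    // True if a size given in points matches a size given in inches,
    // within TOLERANCE on both axes.
    static boolean sameSize(float widthPt, float heightPt, float widthIn, float heightIn) {
        return Math.abs(widthPt - inchesToPoints(widthIn)) <= TOLERANCE
            && Math.abs(heightPt - inchesToPoints(heightIn)) <= TOLERANCE;
    }

    public static void main(String[] args) {
        // Target size from the question: 5" x 7".
        System.out.println(inchesToPoints(5f) + " x " + inchesToPoints(7f));
        // A page that is 6" x 8" up to a small rounding error.
        System.out.println(sameSize(432.02f, 576f, 6f, 8f));
    }
}
```

With PDFBox, the width and height would come from `page.getMediaBox()`, and the new rectangle would presumably have to be applied with `page.setMediaBox(...)` before saving. The quoted answer is truncated, so that last step is an assumption rather than something the excerpt confirms.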

PDFBox Returns Badly Encoded Characters

我的未来我决定 submitted on 2019-12-12 03:29:36
Question: I have a PDF, http://www.persianacademy.ir/UserFiles/File/fe1394.pdf, that I want to extract words from (it contains Persian words). I use the PDFBox library to get the words. Here is my code: package ir.blog.stack; import java.io.File; import java.io.IOException; import org.apache.pdfbox.cos.COSDocument; import org.apache.pdfbox.io.RandomAccessFile; import org.apache.pdfbox.pdfparser.PDFParser; import org.apache.pdfbox.pdmodel.PDDocument; import org.apache.pdfbox.text.PDFTextStripper; public class

Rendering a document with filled form fields using PDFBox works with 1.8.2, but not 2.0.2

泪湿孤枕 submitted on 2019-12-12 03:18:51
Question: My goal is to open a PDF document, fill in some form fields and then render it to an image. I'm using PDFBox with Java to do it. I started with version 2.0.2 (the latest), and filling the form fields works: when I save the document and open it with a PDF reader, the form fields have values. But when I render it to an image, the form fields have black borders and no text inside. I then tried the same thing with 1.8.12 and it works. However, I would really like to use the new features in 2.x. The PDF

PDFBox and Chinese characters

本小妞迷上赌 submitted on 2019-12-12 03:17:42
Question: I am using PDFBox 1.8 and I am trying to fill a PDF form with Chinese characters, but all I get is strange characters. I have a TTC file (uming.ttc), and using FontForge I exported TTF files from it (right now I am trying to use only one of the exported fonts). The font is loaded using InputStream is = .. PDTrueTypeFont font = PDTrueTypeFont.loadTTF(doc, is); and I am writing the PDF field using the following code (which I found here on Stack Overflow but currently can't find again) protected void

How do I add relative hyperlinks to a group of PDF files using PDFBox?

ⅰ亾dé卋堺 submitted on 2019-12-12 03:07:29
Question: I am currently implementing functionality to parse a group of PDFs and retrieve each PDF's metadata, and then to interlink them by adding hyperlinks to each PDF wherever another PDF is referenced inside it. I am able to create absolute hyperlinks. But after these PDFs are uploaded to a server, they can be downloaded from the server to any path on a local machine. I want these hyperlinks to keep working after the files are downloaded to a different location. So, how can I create hyperlinks which are
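One piece of this problem is independent of PDFBox: deriving the path of a referenced PDF relative to the folder of the PDF that links to it, so the link survives when the whole group is moved together. A minimal sketch using only `java.nio.file` follows. `RelativeLinkSketch` and `relativeLink` are hypothetical names, and the example file names are made up. How the resulting string is attached to a link annotation is outside this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox.
class RelativeLinkSketch {
    // Returns the path of targetPdf relative to the folder that contains
    // sourcePdf, using forward slashes regardless of platform.
    // Both arguments should be of the same kind (both relative or both absolute).
    static String relativeLink(String sourcePdf, String targetPdf) {
        Path sourceDir = Paths.get(sourcePdf).getParent();
        Path target = Paths.get(targetPdf);
        // A source with no parent folder sits in the base folder already.
        Path relative = (sourceDir == null) ? target : sourceDir.relativize(target);
        return relative.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        // Made-up layout: two PDFs in sibling folders.
        System.out.println(relativeLink("docs/a/one.pdf", "docs/b/two.pdf"));
        // Two PDFs in the same folder.
        System.out.println(relativeLink("docs/a/one.pdf", "docs/a/three.pdf"));
    }
}
```

The relative string only stays valid if the downloaded files keep the same layout relative to each other, which is an assumption about how the group is distributed.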

Facing a set datapath error while using Tesseract in Java

拜拜、爱过 submitted on 2019-12-12 03:02:36
Question: I am using Tesseract to recognize text from PDFs and I am facing a weird error. The error is: Error opening data file data/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. I understand the meaning of this error, and my path is set to the parent directory of the data folder. The weird thing is that I don't get this error immediately when I run my code; I get it after recognizing 10-15 PDFs
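The error message describes a layout in which the configured prefix folder contains a `tessdata` folder, which in turn contains `eng.traineddata`. A minimal sketch that checks this layout with plain `java.nio.file` follows; it may help separate a genuinely wrong path from a failure that only shows up after several documents. `TessdataCheckSketch` and `hasTrainedData` are hypothetical names and are not part of Tesseract or any Java wrapper for it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tesseract or its Java wrappers.
class TessdataCheckSketch {
    // True if <prefix>/tessdata/<language>.traineddata is a readable regular file.
    static boolean hasTrainedData(Path prefix, String language) {
        Path data = prefix.resolve("tessdata").resolve(language + ".traineddata");
        return Files.isRegularFile(data) && Files.isReadable(data);
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway folder to demonstrate both outcomes.
        Path prefix = Files.createTempDirectory("tess-prefix");
        System.out.println(hasTrainedData(prefix, "eng")); // no tessdata folder yet

        Files.createDirectories(prefix.resolve("tessdata"));
        Files.createFile(prefix.resolve("tessdata").resolve("eng.traineddata"));
        System.out.println(hasTrainedData(prefix, "eng")); // file now exists
    }
}
```

If such a check passes before every recognition call and the error still appears after 10-15 PDFs, the path itself is probably not the cause. That conclusion is a guess from the symptom described, not something the truncated question confirms.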

Number of pages in a PDF file

亡梦爱人 submitted on 2019-12-12 02:43:48
Question: I am reading a PDF file using PDFBox, but I am not getting the total number of pages in the PDF document, and I don't know why. try { parser = new PDFParser(new FileInputStream(file)); parser.parse(); cosDoc = parser.getDocument(); pdfStripper = new PDFTextStripper(); pdDoc = new PDDocument(cosDoc); for (int i = 1; i <= pdDoc.getDocumentCatalog().getAllPages().size(); i++) { pdfStripper.setStartPage(i); pdfStripper.setEndPage(i); parsedText = pdfStripper.getText(pdDoc); if(i==11)

Maven2 Eclipse Plugin

烂漫一生 submitted on 2019-12-12 02:39:38
Question: I have just added dependencies to a project so that my jar, specifically pdfbox 1.6, can see other jars. After adding the dependencies with the right-click-on-project feature that Maven offers, how can I be sure that they work and that what I've done is correct? I can see that a pom.xml file has been created; what are the target folder and its classes and test-classes subfolders used for? Thanks. Answer 1: You can build your project and see if the dependencies are in the final artifact. Also

Use PDFBox to Merge Pages?

余生长醉 submitted on 2019-12-12 02:34:38
Question: I know I can use PDFBox to merge multiple PDFs into one PDF. But is there a way to merge pages? For example, I have a header in a PDF and want it inserted at the top of the first page of the combined PDF, pushing everything else down. Is there a way to do it using the PDFBox API? Answer 1: Here is some code that copies two files into a merged one, with multiple copies of each. It copies page by page. It's something I put together using the information in the answer to this question: Can duplicating a pdf