pdfbox

In PDFBox, why does the file size become extremely large after saving?

百般思念 submitted on 2019-12-01 07:39:06
Question: I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a document, the output file is several times larger than the original, which is undesirable. How can I reduce the size of the output files? How to reproduce my situation: in the following code, PDFBox simply loads an existing PDF and then saves it. Nothing else is done, yet the file still becomes several times larger. Below are links to two sample input files. For input1.pdf, the file size increases from 6MB to 50MB. For input2.pdf, it increases from 0.4MB to 1.3MB. https://dl.dropboxusercontent.com/u

PDFBox LayerUtility - Importing layers into existing PDF

隐身守侯 submitted on 2019-12-01 06:36:52
I am using PDFBox to manipulate PDF content. I have a big PDF file (say 500 pages). I also have a few single-page PDF files, each containing only a single image, of around 8-15 KB per file at most. I need to import these single-page PDFs as overlays onto certain pages of the big PDF file. I tried PDFBox's LayerUtility and it worked, but the output file is very large. The source PDF is about 1MB before processing, and once the smaller PDF files are added the size goes up to 64MB. And sometimes I need to include two smaller PDFs onto

How to split pdf file by result in java pdfbox

痴心易碎 submitted on 2019-12-01 06:20:44
I have one PDF file which contains 60 pages. Each page carries an invoice number; some numbers are unique and others repeat across pages. I'm using Apache PDFBox. import java.io.*; import org.apache.pdfbox.pdmodel.*; import org.apache.pdfbox.util.*; import java.util.regex.*; public class PDFtest1 { public static void main(String[] args){ PDDocument pd; try { File input = new File("G:\\Sales.pdf"); // StringBuilder to store the extracted text StringBuilder sb = new StringBuilder(); pd = PDDocument.load(input); PDFTextStripper stripper = new PDFTextStripper(); // Add text to the StringBuilder from the PDF sb.append(stripper.getText(pd));
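One possible approach to the split is to extract the text one page at a time (for example by setting the stripper's start and end page to the same number), pull the invoice number out of each page with a regular expression, and group the pages that share a number. The sketch below shows only the grouping step in plain Java. The `InvoicePageGrouper` name, the "Invoice No" label, and the digits-only pattern are assumptions for illustration, and the page strings stand in for text that the stripper would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InvoicePageGrouper {

    // Assumed label format such as "Invoice No: 1001"; adjust to the real document.
    private static final Pattern INVOICE_NO =
            Pattern.compile("Invoice\\s+No\\.?\\s*:?\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Maps each invoice number to the 1-based page numbers on which it appears. */
    static Map<String, List<Integer>> groupPages(List<String> pageTexts) {
        Map<String, List<Integer>> pagesByInvoice = new LinkedHashMap<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            Matcher m = INVOICE_NO.matcher(pageTexts.get(i));
            if (m.find()) {
                // Pages without a recognizable invoice number are skipped.
                pagesByInvoice.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(i + 1);
            }
        }
        return pagesByInvoice;
    }

    public static void main(String[] args) {
        // Hypothetical per-page text, standing in for extracted PDF text.
        List<String> pages = Arrays.asList(
                "Invoice No: 1001 ...", "Invoice No: 1002 ...", "Invoice No: 1001 ...");
        System.out.println(groupPages(pages));
    }
}
```

Each entry of the resulting map lists the pages that would go into one output document; copying those pages into new documents is left to the PDF library.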

Is LucenePDFDocument gone from pdfbox?

三世轮回 submitted on 2019-12-01 05:22:46
I'm upgrading the libraries in my project and upgraded PDFBox from 0.6.7 to version 1.6.0, and now I can't find the LucenePDFDocument class. The class is still mentioned in the documentation and tutorials on the Apache page. Any ideas? Answer (Jukka Zitting): The Lucene support was moved to a separate component within PDFBox (see PDFBOX-752). You can find it under the lucene directory in the PDFBox source tree or as the pdfbox-lucene artifact on the central Maven repository. The jars can also be downloaded from sites like mvnrepository. Source: https://stackoverflow.com/questions/7974003/is-lucenepdfdocument-gone-from-pdfbox

How to search for a specific string or word and its coordinates in a PDF document in Java

萝らか妹 submitted on 2019-12-01 04:55:37
Question: I am using PDFBox to search for a word (or string) in a PDF file, and I also want to know the coordinates of that word. For example, the PDF file contains a string like "${abc}", and I want to know the coordinates of this string. I tried a couple of examples but did not get the result I wanted: the output shows the coordinates of each character. Here is the code: @Override protected void writeString(String string, List<TextPosition> textPositions) throws IOException { for(TextPosition
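Since the callback delivers one position per character, the coordinates of a whole word can be derived by finding the search string in the text and taking the union of the boxes of the matched characters. The following is a minimal sketch in plain Java, not PDFBox code: `Glyph` is a hypothetical stand-in for the per-character data (in PDFBox the values would come from each TextPosition), and it assumes exactly one entry per character and a match that sits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

final class WordLocator {

    /** Hypothetical stand-in for one character and its box; not a PDFBox class. */
    static final class Glyph {
        final char ch;
        final float x, y, width, height;

        Glyph(char ch, float x, float y, float width, float height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns one {x, y, width, height} box per occurrence of needle. */
    static List<float[]> locate(List<Glyph> glyphs, String needle) {
        List<float[]> boxes = new ArrayList<>();
        if (needle.isEmpty()) {
            return boxes;
        }
        // Rebuild the text so that index i in the string maps to glyphs.get(i).
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            text.append(g.ch);
        }
        int start = text.indexOf(needle);
        while (start >= 0) {
            float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
            // Union the boxes of every character in the match.
            for (int i = start; i < start + needle.length(); i++) {
                Glyph g = glyphs.get(i);
                minX = Math.min(minX, g.x);
                minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width);
                maxY = Math.max(maxY, g.y + g.height);
            }
            boxes.add(new float[] {minX, minY, maxX - minX, maxY - minY});
            start = text.indexOf(needle, start + needle.length());
        }
        return boxes;
    }
}
```

Inside a writeString override, the glyph list would be filled from the textPositions argument before calling `locate`; whether the string and the position list line up one to one depends on the document and should be checked.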

How can I extract images and their metadata from PDFs?

懵懂的女人 submitted on 2019-12-01 04:31:53
Is it possible to use Java to extract images from a PDF file and export them to a specific folder without losing their original creation and modification dates? I tried to achieve this with iText and PDFBox but had no success. Any ideas or examples are welcome. Answer: Images do not contain metadata and are stored as raw data which needs to be assembled into images. I wrote 2 blog posts explaining how image data is stored in a PDF file at https://blog.idrsolutions.com/2010/04/understanding-the-pdf-file-format-how-are-images-stored/ and https://blog.idrsolutions.com/2010/09/understanding-the
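To illustrate what assembling raw data into an image can involve, here is a minimal sketch using only the JDK. It assumes the simplest case: samples that are already decoded, 8 bits per component, three components per pixel (RGB), stored row by row. Real PDF images may use other colour spaces, bit depths, and compression filters, none of which this handles. The `RawImageAssembler` name is hypothetical.

```java
import java.awt.image.BufferedImage;

final class RawImageAssembler {

    /** Builds an image from decoded 8-bit RGB samples stored row by row. */
    static BufferedImage fromRgb(byte[] samples, int width, int height) {
        if (samples.length != width * height * 3) {
            throw new IllegalArgumentException("expected " + (width * height * 3)
                    + " bytes but got " + samples.length);
        }
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Bytes are signed in Java, so mask each sample to 0..255.
                int r = samples[i++] & 0xFF;
                int g = samples[i++] & 0xFF;
                int b = samples[i++] & 0xFF;
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }
}
```

The resulting BufferedImage can be written to a folder with javax.imageio.ImageIO.write. Any dates on the exported file are those of the new file, which is consistent with the point above that the image data itself carries no such metadata.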

Cannot figure out how to use PDFBox

℡╲_俬逩灬. submitted on 2019-12-01 03:26:47
I am trying to create a PDF file with a lot of text boxes in the document and text fields from another class. I am using PDFBox. Creating a new file and writing one line of text is easy, but when I try to insert the next line of text or text field, it overwrites the existing content. PDDocument doc = null; PDPage page = null; try{ doc = new PDDocument(); page = new PDPage(); doc.addPage(page); PDFont font = PDType1Font.HELVETICA_BOLD; PDPageContentStream title = new PDPageContentStream(doc, page); title.beginText(); title.setFont( font, 14 ); title.moveTextPositionByAmount( 230, 720 );
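The excerpt is cut off before the second line is written, so the actual cause is not visible here. If the lines end up on top of each other because they are drawn at the same coordinates, one remedy is to give each line its own baseline. The sketch below only does that arithmetic in plain Java; the `LineLayout` name and the leading value are assumptions, and the starting value 720 is taken from the snippet above.

```java
final class LineLayout {

    /**
     * Returns the baseline y coordinate for each line, top line first.
     * PDF user space has its origin at the bottom left, so y decreases down the page.
     */
    static float[] baselines(float firstBaselineY, float leading, int lineCount) {
        float[] ys = new float[lineCount];
        for (int i = 0; i < lineCount; i++) {
            ys[i] = firstBaselineY - i * leading;
        }
        return ys;
    }

    public static void main(String[] args) {
        // A leading of 20 units for a 14 pt font is an arbitrary illustrative choice.
        for (float y : baselines(720f, 20f, 3)) {
            System.out.println(y);
        }
    }
}
```

Within a single beginText/endText block, the offsets passed to moveTextPositionByAmount are relative to the start of the previous line, so moving by (0, -leading) before each new line should have the same effect as using these absolute values in separate text blocks.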

Not able to read the exact text highlighted across the lines

落爺英雄遲暮 submitted on 2019-12-01 01:28:58
I am working on reading highlighted text from a PDF document using PDFBox. I can read highlighted text within a single line, whether it is one word or several. However, I cannot read highlighted text that spans multiple lines. Below is the sample code I use to read the highlighted text. PDDocument pddDocument = PDDocument.load(new File("C:\\pdf-sample.pdf")); List allPages = pddDocument.getDocumentCatalog().getAllPages(); for (int i = 0; i < allPages.size(); i++) { int pageNum = i + 1; PDPage page = (PDPage) allPages.get(i); List<PDAnnotation> la = page.getAnnotations(); if (la.size() < 1)
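A likely factor with multi-line highlights is that a text markup annotation stores its area as a QuadPoints array holding one quadrilateral (8 numbers, 4 corner points) per highlighted line, so code that reads only the first quadrilateral or the overall rectangle misses or over-selects text. The sketch below, in plain Java, splits such an array into one bounding box per line. The `QuadPointSplitter` name is hypothetical, and it assumes the float array has already been obtained from the annotation.

```java
import java.util.ArrayList;
import java.util.List;

final class QuadPointSplitter {

    /** Returns one {minX, minY, maxX, maxY} box per quadrilateral in the array. */
    static List<float[]> toBoxes(float[] quadPoints) {
        if (quadPoints.length % 8 != 0) {
            throw new IllegalArgumentException("QuadPoints length must be a multiple of 8");
        }
        List<float[]> boxes = new ArrayList<>();
        for (int q = 0; q < quadPoints.length; q += 8) {
            float minX = quadPoints[q], maxX = minX;
            float minY = quadPoints[q + 1], maxY = minY;
            // Corner order varies between PDF producers, so take min and max over all four.
            for (int p = 2; p < 8; p += 2) {
                float x = quadPoints[q + p];
                float y = quadPoints[q + p + 1];
                minX = Math.min(minX, x);
                maxX = Math.max(maxX, x);
                minY = Math.min(minY, y);
                maxY = Math.max(maxY, y);
            }
            boxes.add(new float[] {minX, minY, maxX, maxY});
        }
        return boxes;
    }
}
```

Each box is in PDF user space with the origin at the bottom left. If the boxes are passed to an area-based text extractor that expects a top-left origin, the y values may need to be flipped against the page height.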
