pdfbox

Is there a way to create Bookmarks for pdf documents in PDFBOX?

独自空忆成欢 submitted on 2019-12-25 07:10:11
Question: I am currently generating a large document from a database schema, and especially for bigger databases the page count quickly exceeds 1000... I use PDFBox to create the document and I am wondering whether PDFBox supports any way of creating bookmarks that are displayed on the left side of Acrobat when viewing the document. On the website (and the related documentation) I haven't found anything helpful so far... Thanks in advance!
Source: https://stackoverflow.com/questions/24954281/is

Reading text of a pdf using PDFBOX occasionally returns \r\n

ぃ、小莉子 submitted on 2019-12-25 06:55:20
Question: I'm currently using PDFBox to read the text of a set of PDFs that I've inherited. I'm only interested in reading the text, not in making any changes to the files. The code that works for most of the files is:

    File pdfFile = myPath.toFile();
    PDDocument document = PDDocument.load(pdfFile);
    Writer sw = new StringWriter();
    PDFTextStripper stripper = new PDFTextStripper();
    stripper.setStartPage(1);
    stripper.writeText(document, sw);
    String documentText = sw.toString();

For most files, I wind up
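One possible post-processing step for extracted text that contains stray `\r\n` sequences is to normalize every line break to `\n` before further handling. The sketch below uses only the Java standard library. The class name `LineBreakSketch` and the sample string are hypothetical, and whether this fits the inherited files in the question is an assumption.

```java
final class LineBreakSketch {

    /** Collapses Windows (\r\n) and lone \r line breaks into \n. */
    static String normalizeLineBreaks(String text) {
        // Replace the two-character sequence first so that a \r\n pair
        // does not turn into two separate line breaks.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string returned by sw.toString().
        String extracted = "first line\r\nsecond line\rthird line\n";
        System.out.print(normalizeLineBreaks(extracted));
    }
}
```

Applying the method to `documentText` right after `sw.toString()` would leave the PDF itself untouched, which matches the read-only goal stated in the question.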

How to underlay a content stream with using PDPageContentStream?

馋奶兔 submitted on 2019-12-25 03:19:06
Question: I am trying to create a watermark using PDPageContentStream. This is what I have right now:

    PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
    contentStream.beginText();
    contentStream.setFont(font, 40);
    contentStream.setTextRotation(Math.PI/4, page.getMediaBox().getWidth()/4, page.getMediaBox().getHeight()/4);
    contentStream.setNonStrokingColor(210, 210, 210); // light grey
    contentStream.drawString(_text);
    contentStream.endText();
    contentStream.close();

What

Java- Does pdfBox have an option to open file instead of loading it?

泪湿孤枕 submitted on 2019-12-25 03:14:22
Question: I am using PDFBox in Java to extract text from a PDF file. This is how I load the file:

    PDDocument document = PDDocument.load(new File(path1));

As you can see, it opens the file and loads everything inside it. This can cause a problem when, say, I try to load a huge file containing 10 million words: it throws an OutOfMemoryError: Java heap space. I actually tested this and it does throw the error, and the culprit was the line above. Is there a way to open the

pdfBox - contentStream.concatenate2CTM full documentation parameters

白昼怎懂夜的黑 submitted on 2019-12-25 02:31:15
Question: jsf 2.1 / pdfbox. I'm trying to generate a landscape PDF with PDFBox and draw some strings onto it, but I didn't find any full specification of the concatenate2CTM method. Does anyone have complete information about the concatenate2CTM parameters? I have only this, but it does not help me because I don't know what values I must enter. What do the operands a...f mean?
Answer 1: This directly adds a cm operation to the content stream in question. Thus, you find those values a..f specified in the PDF
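To make the six operands concrete: in the PDF coordinate model, the `cm` operands a..f describe an affine transformation that maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The sketch below applies that formula with plain Java arithmetic, without calling PDFBox. The class name `CmMatrixSketch`, the page width of 612 and the sample point are hypothetical illustration values.

```java
final class CmMatrixSketch {

    /**
     * Applies the affine map described by the cm operands a..f to (x, y):
     *   x' = a*x + c*y + e
     *   y' = b*x + d*y + f
     */
    static double[] apply(double a, double b, double c, double d,
                          double e, double f, double x, double y) {
        return new double[] { a * x + c * y + e, b * x + d * y + f };
    }

    public static void main(String[] args) {
        double pageWidth = 612; // example value, not taken from the question

        // a=0, b=1, c=-1, d=0 rotates by 90 degrees counterclockwise;
        // e=pageWidth shifts the rotated content back onto the page.
        double[] p = apply(0, 1, -1, 0, pageWidth, 0, 100, 50);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

The same six numbers, in the same order, would be the arguments passed as a..f. A rotation combined with a translation like this is one combination often used when text should run along the long edge of a portrait page.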

PDFBox 1.8 PrintTextLocations wrong TextPosition height for a multi page pdf

烂漫一生 submitted on 2019-12-25 01:44:35
Question: I am running the example provided with PDFBox to get the width and height of each TextPosition. When I pass a one-page PDF, it gives me accurate results, but with a multi-page PDF I get an incorrect height. This is the experiment I did: I took a 5-page PDF and passed it in as the argument (got the wrong height for each TextPosition). Next I split the same PDF into 5 single-page PDFs using MacOSX Preview and passed each page one by one (I get the correct height).

    package printtextlocations;
    import java.io

Extract footer data of PDF in java

自作多情 submitted on 2019-12-25 01:44:02
Question: I am able to get the text of the PDF pages as a string, but the footer data is extracted along with it. I want to remove the footer from all pages of the PDF. How can I do that? I used Rectangle2D, but the coordinates I chose return no data.
Answer 1: In a comment the OP indicated that he used this code:

    PDDocument doc = PDDocument.load("xyz.pdf");
    PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(1);
    Rectangle2D region = new Rectangle2D.Double(10, 10, 10, 10);
    String regionName = "region";
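The rectangle in the quoted code covers only a 10 by 10 unit square, so it is not surprising that it captures little or no text. As a rough sketch, a region that spans the page body and stops above the footer band could be computed as follows. This assumes the region is measured from the top-left corner of the page with y growing downward, which should be verified against the PDFBox version in use. The class name `FooterRegionSketch`, the page size (612 by 792) and the footer height (72) are hypothetical values.

```java
import java.awt.geom.Rectangle2D;

final class FooterRegionSketch {

    /**
     * Returns a rectangle covering the page from the top edge down to the
     * start of the footer band. Assumption: origin at the top-left corner,
     * y increasing downward.
     */
    static Rectangle2D bodyRegion(double pageWidth, double pageHeight,
                                  double footerHeight) {
        return new Rectangle2D.Double(0, 0, pageWidth, pageHeight - footerHeight);
    }

    public static void main(String[] args) {
        // Example values only: a 612 x 792 page with a 72 unit footer band.
        Rectangle2D body = bodyRegion(612, 792, 72);
        System.out.println(body);
    }
}
```

A rectangle built this way would take the place of `new Rectangle2D.Double(10, 10, 10, 10)` in the quoted code. The footer height has to be measured for the actual documents.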

Some glyph ID's missing while trying to extract glyph ID from pdf

☆樱花仙子☆ submitted on 2019-12-24 20:52:09
Question: Because the mapping of Devanagari glyphs to Unicode characters is not correct, I used the following code to extract the glyph IDs and built my own map from IDs to the proper Unicode characters.

    public class ExtractCharacterCodes {
        public static void testExtractFromSingNepChar() throws IOException {
            PDDocument document = PDDocument.load(new File("C:/PageSeparator/pattern3.pdf"));
            PDFTextStripper stripper = new PDFTextStripper() {
                @Override
                protected void writeString(String text, List<TextPosition>

PDFBox 2.x detect document changed after signing

蹲街弑〆低调 submitted on 2019-12-24 19:41:46
Question: I'm trying to figure out how to detect whether a document has been changed after it was signed. I can't seem to find a good solution for this. Does anyone know about this? EDIT: I did some additional testing using only "ShowSignature.java". Here is what I have found so far. If I change the document through PDFBox, both Adobe Reader and PDFBox detect the broken signature. If I change the document with an Adobe product (Adobe Illustrator in this case), Adobe will report the signature as broken, "

Opening a content stream blanks saved content?

て烟熏妆下的殇ゞ submitted on 2019-12-24 19:05:57
Question: I am trying to modify an existing PDF by adding some text to the header of each page. But even the simple sample code below ends up generating a blank PDF as output:

    document = PDDocument.load(new File("c:/tmp/pdfbox_test_in.pdf"));
    PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    /* contentStream.beginText();
    contentStream.setFont(font, 12);
    contentStream.moveTextPositionByAmount