pdfbox | 易学教程

flask RESTful service for pdfbox

Question:

    #!/usr/bin/env python3
    import sys  # needed for sys.argv below

    import jpype
    import jpype.imports

    jpype.addClassPath(sys.argv[1])
    jpype.startJVM(convertStrings=False)

    # Java packages become importable once the JVM is running
    import org.apache.pdfbox.tools as tools

    tools.ExtractText.main(['-startPage', '1', sys.argv[2], sys.argv[3]])

I use the following Python code to call pdfbox.

    $ ./main.py pdfbox-app-2.0.20.jar in.pdf output.txt

But loading the jar file every time I want to convert a PDF file is slow. Could anybody provide the Flask code to make a RESTful service so that
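One way to avoid reloading the jar for every file is to start the JVM once inside a long-running process and reuse it for each request. The following is a minimal sketch of that idea, not a tested production service. The names `PdfTextService`, `create_app`, the `/extract` route, and the `extract` parameter are all hypothetical and chosen for illustration. The sketch assumes that `ExtractText` writes UTF-8 output and that serializing conversions behind a lock is acceptable. Flask and JPype are imported lazily, so the class can be exercised without either installed.

```python
import os
import tempfile
import threading


class PdfTextService:
    """Starts the JVM on first use and reuses it for every conversion."""

    def __init__(self, jar_path, extract=None):
        self._jar_path = jar_path
        # Optional stand-in for the PDFBox call (hypothetical hook for testing).
        self._extract = extract
        # Assumption: conversions are serialized to stay on the safe side.
        self._lock = threading.Lock()

    def _pdfbox_extract(self, pdf_path, txt_path):
        import jpype  # imported lazily; requires the JPype1 package

        if not jpype.isJVMStarted():
            # The classpath has to be set before the JVM starts.
            jpype.addClassPath(self._jar_path)
            jpype.startJVM(convertStrings=False)
        extract_text = jpype.JClass("org.apache.pdfbox.tools.ExtractText")
        # Same arguments as the command-line script in the question.
        extract_text.main(["-startPage", "1", pdf_path, txt_path])

    def convert(self, pdf_bytes):
        """Write the PDF to a temp file, run the extractor, return the text."""
        with tempfile.TemporaryDirectory() as tmp:
            pdf_path = os.path.join(tmp, "in.pdf")
            txt_path = os.path.join(tmp, "out.txt")
            with open(pdf_path, "wb") as f:
                f.write(pdf_bytes)
            with self._lock:
                (self._extract or self._pdfbox_extract)(pdf_path, txt_path)
            # Assumption: the extractor wrote UTF-8 text.
            with open(txt_path, "r", encoding="utf-8", errors="replace") as f:
                return f.read()


def create_app(service):
    """Wrap the service in a Flask app with a single POST endpoint."""
    from flask import Flask, Response, request  # imported lazily

    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])
    def extract():
        # The request body is expected to be the raw PDF bytes.
        text = service.convert(request.get_data())
        return Response(text, mimetype="text/plain")

    return app
```

With Flask and JPype installed, the app could be started with `create_app(PdfTextService("pdfbox-app-2.0.20.jar")).run()` and called with something like `curl --data-binary @in.pdf http://127.0.0.1:5000/extract`. Only the first request pays the JVM startup cost.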

The method getKids() is undefined for the type PDField

Question: https://issues.apache.org/jira/browse/PDFBOX-2148

When there are multiple copies with the same field name, getFullyQualifiedName for each kid in the list of PDField objects returns the name of the parent, followed by .null. So if the parent field is called Button2 and it has 4 instances, printing out all the names gives:

    Button2.null
    Button2.null
    Button2.null
    Button2.null

Answer 1: According to the comments to the question, the OP refers to PDFBox 2.0.x versions, in particular

How can I add to the list of font substitutions in pdfbox 2.0.7?

Question: I'm using FontMapper.getTrueTypeFont() to find available fonts by name in pdfbox 2.0.7. It has a feature to map fonts by name, so that if I ask for Symbol and my system only has SymbolMT, it returns SymbolMT as a substitute. But the default implementation doesn't map the other way. My system has Symbol installed, but if I try to get SymbolMT it returns Helvetica as the best match (which doesn't work very well). The underlying FontMapperImpl class has an addSubstitute() method that lets you

PDFBOX - header in all pages using easytable

Question: I am using pdfbox and easytable (https://github.com/vandeseer/easytable) to create dynamic pages, which works great. But I want a header to be added to all pages. I tried the following:

1) The TableBuilder is created before the rows are written, so we can create a perfect TableBuilder since the rows are dynamic.
2) I tried to insert the header in the middle while creating the TableBuilder, which again is not perfect since TableDrawer lays out the rows according to row height.

Any idea or help would be appreciated.

PDF content stream “TJ /Tj” split without messing the remaining text matrices?

Question: I want to split a TJ/Tj operator's COSString using PDFBox. My PDF's current content stream looks like below. Desired output or what I tried?

    public static void SplitTj_TJ(int tj_ind, PDDocument document) throws IOException {
        PDPage page = document.getPage(0);
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List tokens = parser.getTokens();
        Operator op = (Operator) tokens.get(tj_ind);
        COSFloat dest_x = new COSFloat((float) 90.81199646);
        COSFloat dest_y = new COSFloat((float)

In PDFBox, how to create a link annotation with “rollover” / “mouse over” effects?

Question: With PDFBox, how can I create a link annotation with a "mouse over" color effect (aka rollover / mouse hover)? That is, when I hover the mouse cursor over a link in a PDF file (without clicking it), the link changes to a different color, and when I move the cursor away, the link changes back to its original color. For example: The effect that I am looking for is similar to the links on the Stack Overflow website. When you hover the mouse cursor over (without clicking) the "Ask

Copy PDF to a new PDF, but without certain bits of the document

Question: I'm trying to do something that I know isn't 100% reliable, but I've read about it, and my understanding is that the only problem I face in removing certain bits of text from a PDF file is that I can't replace them. What I'm trying to do is take the contents of a PDF file and copy that content over to another PDF file, but without the text matched by a regular expression. I have found the expressions in my PDF file, and that part works. However, I can't figure out a way to remove them. Is there

How do I fix the Tagged Annotations fail/error for accessibility for links using pdfbox java?

Question: Found the solution by using Adobe: https://answers.acrobatusers.com/How-I-fix-Tagged-Annotations-fail-error-accessibility-links-q228128.aspx

How can I add a Link-OBJR (object reference to a link annotation) using pdfbox?

    PDAnnotationLink txtLink = new PDAnnotationLink();
    PDRectangle position = new PDRectangle();
    position.setLowerLeftX(PDFUtils.lw_lft_x);
    position.setLowerLeftY(PDFUtils.lw_lft_y);
    position.setUpperRightX(PDFUtils.tp_rgt_x);
    position.setUpperRightY(PDFUtils.tp_rgt_y);
    txtLink