pdfbox

flask RESTful service for pdfbox

心已入冬 posted on 2020-07-20 06:54:12
Question: I use the following Python code to call pdfbox.
    #!/usr/bin/env python3
    import sys
    import jpype
    import jpype.imports
    jpype.addClassPath(sys.argv[1])
    jpype.startJVM(convertStrings=False)
    import org.apache.pdfbox.tools as tools
    tools.ExtractText.main(['-startPage', '1', sys.argv[2], sys.argv[3]])
    $ ./main.py pdfbox-app-2.0.20.jar in.pdf output.txt
But it would be slow to load the jar file each time I want to convert a PDF file. Could anybody provide the Flask code to make a RESTful service so that
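A minimal sketch of the idea in the question: start the JVM once and reuse it for every request. It assumes Flask and JPype are installed, that the `ExtractText` call works as in the question, and that the output file is UTF-8. The `/extract` route, the `build_args` and `extract_text` helpers, and the `create_app` factory are hypothetical names chosen for illustration.

```python
#!/usr/bin/env python3
"""Sketch: keep one JVM alive and expose PDFBox text extraction over HTTP."""
import os
import tempfile
import threading

# One lock guards JVM startup and (for simplicity) each extraction call.
_jvm_lock = threading.Lock()


def build_args(pdf_path, txt_path, start_page=1):
    """Return the ExtractText argument list used in the question."""
    return ["-startPage", str(start_page), pdf_path, txt_path]


def extract_text(jar_path, pdf_bytes):
    """Run ExtractText in a JVM that is started only on the first call."""
    import jpype
    import jpype.imports  # enables "from org.apache... import ..."

    with _jvm_lock:
        if not jpype.isJVMStarted():
            jpype.addClassPath(jar_path)
            jpype.startJVM(convertStrings=False)
        from org.apache.pdfbox.tools import ExtractText

        with tempfile.TemporaryDirectory() as tmp:
            pdf_path = os.path.join(tmp, "in.pdf")
            txt_path = os.path.join(tmp, "out.txt")
            with open(pdf_path, "wb") as handle:
                handle.write(pdf_bytes)
            ExtractText.main(build_args(pdf_path, txt_path))
            # Assumption: ExtractText writes UTF-8 text.
            with open(txt_path, encoding="utf-8") as handle:
                return handle.read()


def create_app(jar_path, extractor=extract_text):
    """Build the Flask app; `extractor` can be swapped for a fake in tests."""
    # Imported here so the helpers above can be used without Flask installed.
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])
    def extract():
        text = extractor(jar_path, request.get_data())
        return text, 200, {"Content-Type": "text/plain; charset=utf-8"}

    return app
```

Assuming the file is saved as `service.py` and a recent Flask is installed, it could be started with `flask --app "service:create_app('pdfbox-app-2.0.20.jar')" run` and called with `curl -H "Content-Type: application/pdf" --data-binary @in.pdf http://127.0.0.1:5000/extract`. Holding the lock during extraction serializes requests, which keeps the sketch simple but limits throughput.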

The method getKids() is undefined for the type PDField

℡╲_俬逩灬. posted on 2020-07-09 13:25:07
Question: https://issues.apache.org/jira/browse/PDFBOX-2148 When there are multiple copies with the same field name, getFullyQualifiedName for each kid in the list of PDField objects returns the name of the parent, followed by ".null". So if the parent field is called Button2 and it has 4 instances, the result of printing out all the names will be: Button2.null Button2.null Button2.null Button2.null
Answer 1: According to the comments to the question, the OP refers to PDFBox 2.0.x versions, in particular
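One way to read the ".null" suffix is sketched below as a toy model in plain Python. It is not PDFBox code. It assumes that the four kids of Button2 are widget annotations without a partial name (`/T`) of their own, so treating them as fields joins the parent name with a missing value. The dictionary layout and both function names are made up for illustration.

```python
"""Toy model (not PDFBox code) of fully qualified field names."""

# Hypothetical field: one parent with four widget kids that have no /T entry.
button2 = {"T": "Button2", "Kids": [{"Subtype": "Widget"} for _ in range(4)]}


def naive_names(field):
    """Treat every kid as a field, even when it has no partial name."""
    # Python renders the missing name as "None"; Java would print "null".
    return [f"{field['T']}.{kid.get('T')}" for kid in field["Kids"]]


def widget_aware_names(field):
    """Name the field itself, plus only those kids that carry their own /T."""
    names = [field["T"]]
    names += [f"{field['T']}.{kid['T']}" for kid in field["Kids"] if "T" in kid]
    return names


print(naive_names(button2))
print(widget_aware_names(button2))
```

Under that assumption the four instances are appearances of a single field named Button2 rather than four separately named fields.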

How can I add to the list of font substitutions in pdfbox 2.0.7?

谁都会走 posted on 2020-06-16 17:36:13
Question: I'm using FontMapper.getTrueTypeFont() to find available fonts by name in pdfbox 2.0.7. This has a feature to map fonts (by name) so that if I ask for Symbol and my system only has SymbolMT it will return that as a substitute. But the default implementation doesn't map the other way. My system has Symbol installed, but if I try to get SymbolMT it returns Helvetica as the best match (which doesn't work very well). The underlying FontMapperImpl class has an addSubstitute() method that lets you

How can I add to the list of font substitutions in pdfbox 2.0.7?

我是研究僧i posted on 2020-06-16 17:36:04
Question: I'm using FontMapper.getTrueTypeFont() to find available fonts by name in pdfbox 2.0.7. This has a feature to map fonts (by name) so that if I ask for Symbol and my system only has SymbolMT it will return that as a substitute. But the default implementation doesn't map the other way. My system has Symbol installed, but if I try to get SymbolMT it returns Helvetica as the best match (which doesn't work very well). The underlying FontMapperImpl class has an addSubstitute() method that lets you

PDFBOX - header in all pages using easytable

对着背影说爱祢 posted on 2020-06-16 05:08:26
Question: I am using pdfbox and easytable (https://github.com/vandeseer/easytable) for creating dynamic pages, which works great. But I want a header to be added on all pages. I faced/tried the things below.
1) Tablebuilder is created before writing rows so we can create a perfect tablebuilder since rows are dynamic.
2) Tried to insert the header in the middle while creating the tablebuilder, which again is not perfect since TableDrawer makes the rows to suffice according to row height.
Any idea/help would be appreciated.

PDFBOX - header in all pages using easytable

坚强是说给别人听的谎言 posted on 2020-06-16 05:08:16
Question: I am using pdfbox and easytable (https://github.com/vandeseer/easytable) for creating dynamic pages, which works great. But I want a header to be added on all pages. I faced/tried the things below.
1) Tablebuilder is created before writing rows so we can create a perfect tablebuilder since rows are dynamic.
2) Tried to insert the header in the middle while creating the tablebuilder, which again is not perfect since TableDrawer makes the rows to suffice according to row height.
Any idea/help would be appreciated.

PDF content stream “TJ /Tj” split without messing the remaining text matrices?

↘锁芯ラ posted on 2020-06-13 09:36:33
Question: I want to split a TJ/Tj operator's COSString using PDFBox. My PDF's current content stream looks like below. Desired output or what I tried?
    public static void SplitTj_TJ(int tj_ind, PDDocument document) throws IOException {
        PDPage page = document.getPage(0);
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List tokens = parser.getTokens();
        Operator op = (Operator) tokens.get(tj_ind);
        COSFloat dest_x = new COSFloat((float) 90.81199646);
        COSFloat dest_y = new COSFloat((float)

In PDFBox, how to create a link annotation with “rollover” / “mouse over” effects?

一曲冷凌霜 posted on 2020-06-11 07:55:51
Question: With PDFBox, how can I create a link annotation with a "mouse over" color effect (aka rollover / mouse hover)? It means that when I hover my mouse cursor over a link in a PDF file (without clicking it), the link changes to a different color. And if I move the cursor away, the link changes back to the original color. For example: the effect that I am looking for is similar to the links on the Stack Overflow website. When you hover the mouse cursor over (without clicking) the "Ask

Copy PDF to a new PDF, but without certain bits of the document

心已入冬 posted on 2020-05-17 06:26:07
Question: I'm trying to do something that I know isn't 100% reliable, but I've read about it and it is my understanding that the only problem I'm facing with trying to remove certain bits of text from a PDF file is that I can't replace them. What I'm trying to do is take the contents of a PDF file, then copy that content over to another PDF file, but without a regular expression found. I have found the expressions in my PDF file, and it works. However, I can't figure out a way to remove them. Is there
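The text-level half of this task can be sketched as follows. This is only an illustration under a strong assumption: the page text has already been extracted into plain strings (for example with PDFTextStripper), so layout, fonts, and images are lost. It does not edit the PDF content stream, which is the harder part the question asks about. The function name `strip_matches` is hypothetical.

```python
import re


def strip_matches(page_texts, pattern):
    """Return a copy of each page's text with every regex match removed."""
    regex = re.compile(pattern)
    return [regex.sub("", text) for text in page_texts]


# Example: drop a run of digits (and the space after it) from each page.
pages = ["Invoice 123 total", "no match here"]
print(strip_matches(pages, r"\d+ "))
```

The cleaned strings could then be written into a new document page by page. Removing the matched text while keeping the original layout would require rewriting the text-showing operators in each page's content stream instead.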

How do I fix the Tagged Annotations fail/error for accessibility for links using pdfbox java?

怎甘沉沦 posted on 2020-05-13 14:03:36
Question: Found the solution by using Adobe: https://answers.acrobatusers.com/How-I-fix-Tagged-Annotations-fail-error-accessibility-links-q228128.aspx How can I add a Link-OBJR (object reference to a link annotation) using pdfbox?
    PDAnnotationLink txtLink = new PDAnnotationLink();
    PDRectangle position = new PDRectangle();
    position.setLowerLeftX(PDFUtils.lw_lft_x);
    position.setLowerLeftY(PDFUtils.lw_lft_y);
    position.setUpperRightX(PDFUtils.tp_rgt_x);
    position.setUpperRightY(PDFUtils.tp_rgt_y);
    txtLink