pdfbox

Apache PDFBox Renders Straight Line Crooked in PNG

雨燕双飞 submitted on 2019-12-08 04:23:45
Question: I have a PDF in which a straight line is rendered crooked, or rather with a step in it, when I render the page to PNG. This is the PDF and what it should look like: https://drive.google.com/file/d/1E-zucbreD7pVwWc3Z4MNe_lzsP6D9m49/view Here is the full PNG rendering using PDFBox 2.0.13 and OpenJDK 1.8.0_181: [image not available] And here is the specific portion of the PNG that has the step: [image not available]

Answer 1: Excerpt of the page content stream:

q 1 0 0 1 35.761 450.003 cm 0 i 0.75 w 0 0 m 50.923 0 l S Q
q 1 0 0 1 86.139 450 cm 0 i 0.75 w …
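The arithmetic behind the excerpt can be checked directly. Each segment is drawn after a translation-only cm matrix (1 0 0 1 tx ty), so the first segment ends at (35.761 + 50.923, 450.003) and the second segment's origin is (86.139, 450). The plain-Java sketch below assumes, like the first segment, that the second path starts at its local origin 0 0 (the excerpt is cut off before its path operators); the class and method names are hypothetical. It shows the segments overlap horizontally by 0.545 units and sit 0.003 units apart vertically. Whether such a tiny offset becomes a visible one-pixel step depends on the render resolution and rasterization, which this sketch does not model.

```java
import java.util.Locale;

/**
 * Applies the translation-only "cm" matrices from the content stream excerpt
 * (1 0 0 1 tx ty) to the segment endpoints and reports how the segments line up.
 * Class and method names are hypothetical.
 */
public class LineJoinCheck {

    /** Maps (x, y) through the matrix [1 0 0 1 tx ty], i.e. a pure translation. */
    public static double[] translate(double x, double y, double tx, double ty) {
        return new double[] {x + tx, y + ty};
    }

    public static void main(String[] args) {
        // First segment: "1 0 0 1 35.761 450.003 cm ... 0 0 m 50.923 0 l S"
        double[] firstEnd = translate(50.923, 0.0, 35.761, 450.003);
        // Second segment: "1 0 0 1 86.139 450 cm"; assumed to start at its local 0 0
        double[] secondStart = translate(0.0, 0.0, 86.139, 450.0);

        System.out.printf(Locale.ROOT, "first segment ends at (%.3f, %.3f)%n",
                firstEnd[0], firstEnd[1]);
        System.out.printf(Locale.ROOT, "second segment starts at (%.3f, %.3f)%n",
                secondStart[0], secondStart[1]);
        // Positive overlap: the second segment starts before the first one ends
        System.out.printf(Locale.ROOT, "horizontal overlap %.3f, vertical offset %.3f%n",
                firstEnd[0] - secondStart[0], firstEnd[1] - secondStart[1]);
        // Prints:
        // first segment ends at (86.684, 450.003)
        // second segment starts at (86.139, 450.000)
        // horizontal overlap 0.545, vertical offset 0.003
    }
}
```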

How can I get image coordinates in a PDF into a JSON file?

限于喜欢 submitted on 2019-12-08 02:23:39
Question: I have written code that creates an HTML page containing the images extracted from a page of a PDF document. Using the PDFBox library, I succeeded in extracting the images from the PDF and placing them in the HTML page, but I could not get the image coordinates for the HTML page. So I searched for how to extract image coordinates from a PDF and tried to do it with the PDFBox library. Below is my code: public static void main(String[] args) throws Exception { try { PDDocument document = …
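For the coordinates themselves: when an image XObject is painted with the Do operator, it fills the unit square mapped through the current transformation matrix [a b c d e f]. PDFBox's PrintImageLocations example (in the same examples/util directory as PrintTextLocations) reads that matrix from the graphics state inside a PDFStreamEngine subclass. The plain-Java sketch below covers only the remaining steps: transforming the four corners to get the image's bounding box in PDF user space (origin bottom-left, units of 1/72 inch) and writing one JSON object per image. The matrix values, the image name Im1, the class name and the JSON field names are hypothetical, and names are assumed not to need JSON escaping.

```java
import java.util.Locale;

/** Hypothetical helper: image bounding box from a PDF matrix, written as JSON. */
public class ImageBoundsJson {

    /** Returns {minX, minY, maxX, maxY} of the unit square under the matrix [a b c d e f]. */
    public static double[] unitSquareBounds(double a, double b, double c,
                                            double d, double e, double f) {
        double[][] corners = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : corners) {
            // A PDF matrix maps (x, y) to (a*x + c*y + e, b*x + d*y + f)
            double x = a * p[0] + c * p[1] + e;
            double y = b * p[0] + d * p[1] + f;
            minX = Math.min(minX, x);
            maxX = Math.max(maxX, x);
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** Formats one image entry as a JSON object; field names are illustrative only. */
    public static String toJson(String name, int page, double[] box) {
        return String.format(Locale.ROOT,
                "{\"name\":\"%s\",\"page\":%d,\"x\":%.2f,\"y\":%.2f,\"width\":%.2f,\"height\":%.2f}",
                name, page, box[0], box[1], box[2] - box[0], box[3] - box[1]);
    }

    public static void main(String[] args) {
        // Hypothetical matrix: image drawn 200 x 100 units with its lower-left corner at (50, 600)
        double[] box = unitSquareBounds(200, 0, 0, 100, 50, 600);
        System.out.println(toJson("Im1", 1, box));
        // Prints: {"name":"Im1","page":1,"x":50.00,"y":600.00,"width":200.00,"height":100.00}
    }
}
```

Because HTML measures y from the top of the page, the top edge for an HTML overlay would be pageHeight - maxY rather than minY.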

How to use PDFBox to create a link that goes to *previous view*?

空扰寡人 submitted on 2019-12-08 01:08:53
Question: With PDFBox it is easy to create a link that goes to a particular page or page view by using PDPageDestination. For example, the following code makes a link that goes to the page at index 9 (getPage is zero-based, so this is the tenth page): PDAnnotationLink link = new PDAnnotationLink(); PDPageDestination destination = new PDPageFitWidthDestination(); PDActionGoTo action = new PDActionGoTo(); destination.setPage(document.getPage(9)); action.setDestination(destination); link.setAction(action); Problem: Instead of going to a particular page, I would …

How to get the content of PDF form text fields using pdfbox?

五迷三道 submitted on 2019-12-07 19:33:02
Question: I'm using the following org.apache.pdfbox code to get the text of a PDF file: File f = new File(fileName); if (!f.isFile()) { System.out.println("File " + fileName + " does not exist."); return null; } try { parser = new PDFParser(new FileInputStream(f)); } catch (Exception e) { System.out.println("Unable to open PDF Parser."); return null; } try { parser.parse(); cosDoc = parser.getDocument(); pdfStripper = new PDFTextStripper(); pdDoc = new PDDocument(cosDoc); parsedText = pdfStripper.getText(pdDoc) …

Finding JavaScript code in PDF using Apache PDFBox

删除回忆录丶 submitted on 2019-12-07 17:50:44
Question: My goal is to extract and process any JavaScript code that a PDF document might contain. When I open the PDF in an editor, I can see objects like this: 402 0 obj <</S/JavaScript/JS(\n\r\n /* Set day 25 */\r\n FormRouter_SetCurrentDate\("25"\);\r)>> endobj I am trying to use Apache PDFBox to accomplish this, but so far with no luck. This line returns an empty list: jsObj = doc.getObjectsByType(COSName.JAVA_SCRIPT); Can anyone give me some direction?

Answer 1: This tool is based on the PrintFields …
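A likely reason for the empty list, as far as I can tell: COSDocument.getObjectsByType matches a dictionary's /Type entry, and the object shown has only /S /JavaScript (a JavaScript action) and no /Type. If you instead scan the raw file text as in the editor (which only finds objects that are not inside compressed object streams), the /JS value is a PDF literal string whose escapes must be decoded; note that /JS may also be a stream. The plain-Java sketch below decodes one literal string following ISO 32000-1, section 7.3.4.2: the escapes \n \r \t \b \f \( \) \\, octal \ddd, backslash line continuations, unescaped end-of-line read as a newline, and balanced unescaped parentheses. The class name is hypothetical; when you go through PDFBox objects instead, a COSString already holds the decoded value.

```java
/**
 * Decodes a PDF literal string (ISO 32000-1, 7.3.4.2). The input must start with '(';
 * decoding stops at the matching ')'. The class name is hypothetical.
 */
public class PdfLiteralString {

    public static String decode(String s) {
        if (s.isEmpty() || s.charAt(0) != '(') {
            throw new IllegalArgumentException("not a literal string");
        }
        StringBuilder out = new StringBuilder();
        int depth = 1; // unescaped parentheses must be balanced
        int i = 1;
        while (i < s.length()) {
            char ch = s.charAt(i++);
            if (ch == '\\') {
                if (i >= s.length()) break; // dangling backslash: unterminated
                char esc = s.charAt(i++);
                switch (esc) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    case '(': case ')': case '\\': out.append(esc); break;
                    case '\r': // backslash + CR or CRLF: line continuation, emits nothing
                        if (i < s.length() && s.charAt(i) == '\n') i++;
                        break;
                    case '\n': // backslash + LF: line continuation
                        break;
                    default:
                        if (esc >= '0' && esc <= '7') {
                            // Octal escape: one to three digits, high-order overflow ignored
                            int value = esc - '0';
                            for (int k = 0; k < 2 && i < s.length()
                                    && s.charAt(i) >= '0' && s.charAt(i) <= '7'; k++) {
                                value = value * 8 + (s.charAt(i++) - '0');
                            }
                            out.append((char) (value & 0xFF));
                        } else {
                            out.append(esc); // unknown escape: the backslash is ignored
                        }
                }
            } else if (ch == '\r') {
                // Unescaped CR or CRLF inside the string reads as a single LF
                if (i < s.length() && s.charAt(i) == '\n') i++;
                out.append('\n');
            } else if (ch == '(') {
                depth++;
                out.append(ch);
            } else if (ch == ')') {
                if (--depth == 0) return out.toString();
                out.append(ch);
            } else {
                out.append(ch); // includes an unescaped LF, which already reads as LF
            }
        }
        throw new IllegalArgumentException("unterminated literal string");
    }

    public static void main(String[] args) {
        // The /JS value from the question, as it appears in the file
        String raw = "(\\n\\r\\n /* Set day 25 */\\r\\n FormRouter_SetCurrentDate\\(\"25\"\\);\\r)";
        System.out.println(decode(raw).trim());
    }
}
```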

Detecting text field overflow

半腔热情 submitted on 2019-12-07 15:44:13
Question: Assuming I have a PDF document with a text field that has a font and size defined, is there a way to determine with PDFBox whether some text will fit inside the field rectangle? I'm trying to avoid cases where text is not fully displayed inside the field, so if the text overflows at the given font and size, I would like to change the font size to Auto (0).

Answer 1: This code recreates the appearance stream to be sure that it exists so that there is a bbox (which can be a little bit smaller than the …
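The fit test itself is simple arithmetic: a string's width in PDF units is the sum of its glyph widths (given in 1/1000 of the font size) multiplied by the font size and divided by 1000. In PDFBox, PDFont.getStringWidth returns that sum in 1/1000 text-space units. The plain-Java sketch below takes the sum as a number so it runs without PDFBox; availableWidth is assumed to be the field rectangle's width (or the appearance bbox mentioned in the answer) minus whatever padding you choose, and the class name is hypothetical. The example uses Courier, whose standard glyphs are all 600/1000 wide. Only the horizontal fit of a single line is checked; multiline fields and height are not modeled.

```java
/** Hypothetical helper for checking whether single-line text fits a field width. */
public class TextFit {

    /** Width in PDF units of text whose glyph widths (1/1000 units) sum to widthUnits. */
    public static double textWidth(double widthUnits, double fontSize) {
        return widthUnits * fontSize / 1000.0;
    }

    /** True if the text fits into availableWidth at the given font size. */
    public static boolean fits(double widthUnits, double fontSize, double availableWidth) {
        return textWidth(widthUnits, fontSize) <= availableWidth;
    }

    /** Largest font size at which the text still fits (infinite for empty text). */
    public static double maxFontSize(double widthUnits, double availableWidth) {
        return availableWidth * 1000.0 / widthUnits;
    }

    public static void main(String[] args) {
        // Courier: every glyph is 600/1000 wide, so "Hello" sums to 5 * 600 units
        double units = "Hello".length() * 600.0;
        System.out.println(textWidth(units, 12));   // 36.0
        System.out.println(fits(units, 12, 30));    // false: overflows a 30-unit field
        System.out.println(maxFontSize(units, 30)); // 10.0
    }
}
```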

Java PDFBox PrinterJob: wrong scaling / page format

徘徊边缘 submitted on 2019-12-07 15:20:36
Question: I'm trying to print an existing PDF file with PDFBox. Currently I'm using PDFBox 2.0.0 RC3 through Maven. This is my current code: PDDocument document = PDDocument.load(new File(myPdfFile)); PrinterJob job = PrinterJob.getPrinterJob(); if (job.printDialog()) { job.setPageable(new PDFPageable(document)); job.print(); } document.close(); For testing, I printed a test PDF with Adobe Acrobat and the same PDF with the few lines of code above. Everything works fine except for the borders. All borders …
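One thing worth checking when borders move is whether the page is being scaled into the printer's imageable area. The plain-Java sketch below uses only java.awt.print.Paper, whose default is a Letter sheet (612 x 792 points) with one-inch margins, and computes the uniform scale that fits a page into that area. The Letter-sized PDF page and the class name are hypothetical, and whether PDFPageable in 2.0.0 RC3 applies this kind of scaling is not established here; released 2.0.x versions also offer PDFPrintable with a Scaling option (for example Scaling.ACTUAL_SIZE), which may be worth comparing.

```java
import java.awt.print.Paper;
import java.util.Locale;

/** Hypothetical helper: uniform scale that fits a page into a Paper's imageable area. */
public class PrintScale {

    public static double fitScale(double pageWidth, double pageHeight, Paper paper) {
        return Math.min(paper.getImageableWidth() / pageWidth,
                        paper.getImageableHeight() / pageHeight);
    }

    public static void main(String[] args) {
        double pageW = 612, pageH = 792; // hypothetical Letter-sized PDF page, in points

        Paper withMargins = new Paper(); // default: Letter with 1-inch margins (468 x 648 imageable)
        System.out.printf(Locale.ROOT, "scale with 1-inch margins: %.4f%n",
                fitScale(pageW, pageH, withMargins)); // 0.7647

        Paper noMargins = new Paper();
        noMargins.setImageableArea(0, 0, noMargins.getWidth(), noMargins.getHeight());
        System.out.printf(Locale.ROOT, "scale with full-page imageable area: %.4f%n",
                fitScale(pageW, pageH, noMargins)); // 1.0000
    }
}
```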

Superscript and subscript differentiation using PDFBox

一个人想着一个人 submitted on 2019-12-07 11:48:06
Question: I am new to PDFBox. Is there any way to differentiate superscript and subscript text from normal text while extracting, or after extracting, text from a PDF with the PDFBox library? Thanks.

Answer 1: Check whether this link helps: https://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintTextLocations.java

Answer 2: I was able to identify most superscripts by looking for changes in Y position and height. Try this: write your own implementation of PDFTextStripper. Add this to …
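Answer 2's idea, comparing each glyph's vertical position and height with the surrounding text, can be sketched without PDFBox. The Glyph class below is a hypothetical stand-in for what a PDFTextStripper subclass would collect from each TextPosition; it uses a baseline y that grows upward, so if you take y from TextPosition.getYDirAdj(), which is measured from the top of the page, negate the shift before comparing. The tallest glyph on the line serves as the reference, and the 0.9 height ratio and 10% shift thresholds are arbitrary illustrative values that will need tuning for real documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical superscript/subscript classifier based on baseline and height changes. */
public class ScriptClassifier {

    public enum Kind { NORMAL, SUPERSCRIPT, SUBSCRIPT }

    /** Hypothetical glyph record: baseline y (grows upward) and glyph height, in PDF units. */
    public static final class Glyph {
        final char ch;
        final double baselineY;
        final double height;

        public Glyph(char ch, double baselineY, double height) {
            this.ch = ch;
            this.baselineY = baselineY;
            this.height = height;
        }
    }

    /** Classifies every glyph on one line against the tallest glyph on that line. */
    public static List<Kind> classifyLine(List<Glyph> line) {
        List<Kind> kinds = new ArrayList<>();
        if (line.isEmpty()) return kinds;
        Glyph ref = line.get(0);
        for (Glyph g : line) {
            if (g.height > ref.height) ref = g;
        }
        double minShift = ref.height * 0.1; // illustrative threshold
        for (Glyph g : line) {
            boolean smaller = g.height < ref.height * 0.9; // illustrative threshold
            double shift = g.baselineY - ref.baselineY;
            if (smaller && shift > minShift) kinds.add(Kind.SUPERSCRIPT);
            else if (smaller && shift < -minShift) kinds.add(Kind.SUBSCRIPT);
            else kinds.add(Kind.NORMAL);
        }
        return kinds;
    }

    public static void main(String[] args) {
        // x and H on baseline 100, one raised smaller '2' and one lowered smaller '2'
        List<Glyph> line = Arrays.asList(
                new Glyph('x', 100, 10), new Glyph('2', 104, 6),
                new Glyph('H', 100, 10), new Glyph('2', 97, 6));
        System.out.println(classifyLine(line)); // [NORMAL, SUPERSCRIPT, NORMAL, SUBSCRIPT]
    }
}
```

Using the tallest glyph as the reference works here because PDFBox heights tend to be font-based rather than per-glyph; lines set entirely in one small size would need a different reference, such as the dominant size on the page.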

PDFBox - merge 2 portrait pages onto a single side by side landscape page

左心房为你撑大大i submitted on 2019-12-07 11:19:36
Question: I am trying to write a PDF conversion that takes a PDF containing 1-up portrait pages and creates a new document in which every 2 pages are merged into one 2-up landscape page. The following code scales the content down by 50%, but I can't figure out how to make the new page landscape while placing each original page in portrait orientation, one at the top left and one to the right of center: public static void main(String[] args) throws IOException, DocumentException, COSVisitorException { scalePages("c: …
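The geometry of a 2-up sheet can be worked out independently of how the pages are copied. The plain-Java sketch below computes, for the left and right halves of a landscape sheet, a java.awt.geom.AffineTransform that scales a portrait page uniformly and centers it in its half. The A4 sizes (about 595 x 842 points) and the class name are assumptions. In PDFBox 2.x, a transform like this could be passed to LayerUtility.appendFormAsLayer together with a form from LayerUtility.importPageAsForm, on a new PDPage whose PDRectangle has the landscape width and height; the question's code appears to target PDFBox 1.8, where some of the type names differ.

```java
import java.awt.geom.AffineTransform;
import java.util.Locale;

/** Hypothetical helper computing where each portrait page lands on a 2-up sheet. */
public class TwoUpLayout {

    /**
     * Places a srcW x srcH page into half `slot` (0 = left, 1 = right) of a sheetW x sheetH
     * sheet: scaled uniformly to fit the half, then centered in it. Units are PDF points.
     */
    public static AffineTransform slotTransform(double srcW, double srcH,
                                                double sheetW, double sheetH, int slot) {
        double slotW = sheetW / 2.0;
        double scale = Math.min(slotW / srcW, sheetH / srcH);
        double tx = slot * slotW + (slotW - srcW * scale) / 2.0;
        double ty = (sheetH - srcH * scale) / 2.0;
        AffineTransform at = new AffineTransform();
        at.translate(tx, ty);   // applied to points second: move into the slot
        at.scale(scale, scale); // applied to points first: shrink the page
        return at;
    }

    public static void main(String[] args) {
        double a4w = 595, a4h = 842; // approximate A4 portrait size in points
        for (int slot = 0; slot < 2; slot++) {
            // Landscape sheet: width and height swapped (842 x 595)
            AffineTransform at = slotTransform(a4w, a4h, a4h, a4w, slot);
            System.out.printf(Locale.ROOT, "slot %d: scale %.4f, translate x %.2f%n",
                    slot, at.getScaleX(), at.getTranslateX());
        }
    }
}
```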

How to check a check box in a PDF form using the Java PDFBox API

﹥>﹥吖頭↗ submitted on 2019-12-07 10:58:44
Question: Initially I tried the piece of code below, but after execution the check box field is invisible in the PDF, even though it has been checked. How can I avoid this, or is the way I have implemented it wrong? Can anyone help me out? public void check() throws Exception { PDDocument fdeb = null; fdeb = PDDocument.load( "C:\\Users\\34\\Desktop\\complaintform.pdf" ); PDAcroForm form = fdeb.getDocumentCatalog().getAcroForm(); PDField …
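One common cause of a box that is checked but invisible is setting the value (/V and /AS) to a name, such as Yes, that the widget's normal appearance dictionary (/AP /N) does not contain, so the viewer has no appearance stream to draw. The PDF specification recommends Yes as the on-state name, but forms often use other names. In PDFBox 2.x, PDCheckBox.check() and getOnValue() take the on-state name from the widget itself; the plain-Java sketch below shows only that lookup logic on a hypothetical set of appearance names, so it runs without PDFBox. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper mirroring how a check box's on-state name is found. */
public class CheckBoxOnState {

    /**
     * Returns the on-state name: the entry in the widget's /AP /N dictionary
     * other than "Off", or null if there is none.
     */
    public static String onStateName(Set<String> normalAppearanceNames) {
        for (String name : normalAppearanceNames) {
            if (!"Off".equals(name)) return name;
        }
        return null;
    }

    /** True if /AP /N contains an appearance entry for the given /AS state name. */
    public static boolean hasAppearanceFor(String state, Set<String> normalAppearanceNames) {
        return normalAppearanceNames.contains(state);
    }

    public static void main(String[] args) {
        // Hypothetical widget whose author named the on state "1" instead of "Yes"
        Set<String> names = new LinkedHashSet<>(Arrays.asList("Off", "1"));
        System.out.println(onStateName(names));             // 1
        System.out.println(hasAppearanceFor("Yes", names)); // false: nothing to draw for "Yes"
    }
}
```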