pdfbox

[PDFBox] A utility class for manipulating PDFs on the backend

Anonymous (unverified) submitted on 2019-12-02 23:05:13
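PDFBox is an Apache library for working with PDF files. It provides both a command-line tool and a library that can be called from Java. Download: https://pdfbox.apache.org/
The experiments below use JDK 8 + pdfbox-2.0.13.jar + pdfbox-app-2.0.13.jar (the command-line tool library).
1. Command-line usage: https://pdfbox.apache.org/2.0/commandline.html
The command-line tool can extract images and text from a PDF, merge and split PDFs, convert PDF pages to images, and more.
1. Extract images:
java -jar pdfbox-app-2.0.13.jar ExtractImages ./1.pdf
This extracts the images in the PDF into the same folder.
2. Extract text:
java -jar pdfbox-app-2.0.13.jar ExtractText ./1.pdf ./text.txt
You can also pass options such as the start page number.
3. Convert a PDF to images:
java -jar pdfbox-app-2.0.13.jar PDFToImage ./1.pdf
Many more command-line operations are described in the official documentation, which explains every parameter in detail. This approach can be wrapped in a utility class that uses Runtime to run the PDF operations on multiple threads.
2. Using it as a library in Java: https://www.cnblogs.com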
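The Runtime-based utility class mentioned above might look roughly like this. It is a sketch, not the author's class: it swaps Runtime.exec for java.lang.ProcessBuilder and runs each command on a fixed thread pool. The class name PdfBoxCli, the jar path, and the pool size are illustrative assumptions; only the tool names (ExtractImages, ExtractText, PDFToImage) come from the commands above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical wrapper that runs pdfbox-app command-line tools on a thread pool. */
final class PdfBoxCli {
    private final String appJar;          // e.g. "pdfbox-app-2.0.13.jar" (path is an assumption)
    private final ExecutorService pool;

    PdfBoxCli(String appJar, int threads) {
        this.appJar = appJar;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Builds the argument list: java -jar <appJar> <tool> <args...> */
    static List<String> buildCommand(String appJar, String tool, String... args) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", appJar, tool));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Starts the tool asynchronously; the Future yields the process exit code. */
    Future<Integer> run(String tool, String... args) {
        List<String> cmd = buildCommand(appJar, tool, args);
        return pool.submit(() -> {
            Process process = new ProcessBuilder(cmd)
                    .inheritIO()          // forward the tool's stdout/stderr to this JVM's console
                    .start();
            return process.waitFor();     // blocks this worker thread, not the caller
        });
    }

    /** Lets already-submitted commands finish, then stops the worker threads. */
    void shutdown() {
        pool.shutdown();
    }
}
```

For example, `new PdfBoxCli("pdfbox-app-2.0.13.jar", 4).run("ExtractText", "./1.pdf", "./text.txt").get()` would wait for the extraction and return its exit code (0 on success). Passing the command as a list rather than one string avoids the whitespace splitting that `Runtime.exec(String)` performs, so paths containing spaces stay intact.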

Reading a table or cell value in a PDF file using Java?

点点圈 submitted on 2019-12-02 20:17:50
Question: I have gone through Java and PDF forums looking for a way to extract a text value from a table in a PDF file, but couldn't find any solution except JPedal (which is not open source and requires a license). So I would like to know whether open-source APIs such as PDFBox or iText can achieve the same result as JPedal. Ref. example: Answer 1: In comments the OP clarified that he locates the text value from the table in a PDF file he wants to extract "by providing X and Y co-ordinates". Thus, while the question initially sounded like generic
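When the cell is located by X and Y coordinates, one open-source option is PDFBox's PDFTextStripperByArea, which collects the text inside named rectangles. A minimal sketch, assuming PDFBox 2.0.x, an unrotated page, and coordinates in PDF points measured from the top-left corner of the page (the convention `addRegion` expects); the class CellReader, its method readRegion, and the region name "cell" are hypothetical.

```java
import java.awt.geom.Rectangle2D;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

/** Hypothetical helper: reads the text inside one rectangle of one page (PDFBox 2.0.x). */
final class CellReader {
    /**
     * x, y, width and height are in PDF points (1/72 inch), measured from the
     * top-left corner of an unrotated page (assumption).
     */
    static String readRegion(PDDocument doc, int pageIndex,
                             float x, float y, float width, float height) throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);                 // order glyphs by position, not stream order
        stripper.addRegion("cell", new Rectangle2D.Float(x, y, width, height));
        PDPage page = doc.getPage(pageIndex);             // 0-based page index
        stripper.extractRegions(page);
        return stripper.getTextForRegion("cell").trim();  // drop the trailing line separator
    }
}
```

Typical use would be `try (PDDocument doc = PDDocument.load(new File("table.pdf"))) { String value = CellReader.readRegion(doc, 0, x, y, w, h); }`. This only works when the cell's position is known in advance; it does not detect table structure.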

Remove PDFont caching with Apache Tika

五迷三道 submitted on 2019-12-02 20:15:33
Question: I am trying to extract only the text from a number of different documents (RTF, DOC, PDF). I naturally turned to Apache Tika because it can autodetect the document type and extract the text accordingly. I am only interested in the text, not the formatting etc. My application ends up with a big memory leak, and on investigation it comes from caching in the PDFont class of the PDFBox dependency. I am not interested in caching font metrics and other font formatting details from PDFs, as I want to only

Not able to extract images from a PDF/A-1a document

拈花ヽ惹草 submitted on 2019-12-02 18:53:24
Question: I am using the following code to extract images from a PDF in PDF/A-1a format, but I am not able to get the images. List<PDPage> list = document.getDocumentCatalog().getAllPages(); String fileName = oldFile.getName().replace(".pdf", "_cover"); int totalImages = 1; for (PDPage page : list) { PDResources pdResources = page.findResources(); Map pageImages = pdResources.getImages(); if (pageImages != null) { InputStream xmlInputStream = null; Iterator imageIter = pageImages.keySet()
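The code in the question uses the PDFBox 1.8 API (`getAllPages`, `findResources`, `getImages`). A minimal sketch of the same idea with the PDFBox 2.0.x API follows. It also descends into form XObjects, one common place where images sit without being listed directly in a page's resources; whether that is the cause for this particular PDF/A-1a file is an assumption. The class ImageExtractor and its methods are hypothetical, and inline images (embedded directly in the content stream) are not covered.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper: collects image XObjects from every page (PDFBox 2.0.x). */
final class ImageExtractor {
    static List<BufferedImage> extractImages(PDDocument doc) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        for (PDPage page : doc.getPages()) {
            collect(page.getResources(), images);
        }
        return images;
    }

    // Recurses into form XObjects; assumes no form references itself (no cycle guard).
    private static void collect(PDResources resources, List<BufferedImage> out) throws IOException {
        if (resources == null) {
            return;
        }
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xobject = resources.getXObject(name);
            if (xobject instanceof PDImageXObject) {
                out.add(((PDImageXObject) xobject).getImage());
            } else if (xobject instanceof PDFormXObject) {
                collect(((PDFormXObject) xobject).getResources(), out);
            }
        }
    }

    /** Writes the images as prefix_1.png, prefix_2.png, ... into dir. */
    static void saveAll(List<BufferedImage> images, File dir, String prefix) throws IOException {
        for (int i = 0; i < images.size(); i++) {
            ImageIO.write(images.get(i), "png", new File(dir, prefix + "_" + (i + 1) + ".png"));
        }
    }
}
```

An image shared by several pages through common resources will be returned once per page; deduplicating by the underlying COS object would be a possible refinement.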

How to set the text of a PDTextbox to a color?

北战南征 submitted on 2019-12-02 16:46:25
Question: I would like a PDTextbox to have red text. I'm able to write out red text, and I can set the value of a textbox, but I'm not sure how to set the textbox content to the color red, i.e. if (field instanceof PDTextbox) { field.setValue(field.getPartialName()); //SOME WAY TO SET COLOR HERE? Here is the test code I'm using: package com.circumail; import java.awt.Color; import java.io.File; import java.io.IOException; import java.util.List; import org.apache.fontbox.util.BoundingBox; import org
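In PDFBox 2.0.x the class is named PDTextField (PDTextbox is the 1.8 name), and a field's text color comes from its default appearance (DA) string, for example `/Helv 12 Tf 1 0 0 rg` for 12-pt red Helvetica. A minimal sketch, assuming PDFBox 2.0.x and a font resource named Helv in the form's default resources (/DR); the class FieldColor, its methods, and the 12-pt size are hypothetical choices.

```java
import java.awt.Color;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

/** Hypothetical helper for colored text-field values (PDFBox 2.0.x). */
final class FieldColor {
    /** Builds a DA string such as "/Helv 12.0 Tf 1.0 0.0 0.0 rg". */
    static String defaultAppearance(String fontResourceName, float fontSize, Color color) {
        float[] rgb = color.getRGBColorComponents(null);   // components in the range 0..1
        return "/" + fontResourceName + " " + fontSize + " Tf "
                + rgb[0] + " " + rgb[1] + " " + rgb[2] + " rg";
    }

    /** Sets the DA first, then the value, so the regenerated appearance uses the new color. */
    static void setColoredValue(PDTextField field, String value, Color color) throws IOException {
        // "Helv" and 12 pt are assumptions; the font name must exist in the AcroForm's /DR.
        field.setDefaultAppearance(defaultAppearance("Helv", 12f, color));
        field.setValue(value);
    }
}
```

Order matters here: `setValue` rebuilds the widget appearance from the DA that is current at that moment, so changing the DA afterwards would not recolor the already generated appearance.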

Why doesn't my Java servlet create PDF documents? [duplicate]

别来无恙 submitted on 2019-12-02 13:36:55
This question already has an answer here: How can I serve a PDF to a browser without storing a file on the server side? I use iText/PDFBox to create a PDF document. Everything works when I create the PDF using a standalone Java class like this: public static void main(String[] args){ ... ... ... } The document is created correctly. But I need to create a PDF document from a servlet. I paste the code into the get or post method and run that servlet on the server, but the PDF document isn't created! This code works as a standalone application: This code doesn't work: Bruno Lowagie Please
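The linked duplicate is about streaming the PDF to the browser instead of writing a file on the server. With PDFBox 2.0.x, `PDDocument.save(OutputStream)` makes that possible. A minimal sketch that builds a one-page document and writes it into any OutputStream; the class PdfResponse, its method writePdf, and the page layout are hypothetical. In a servlet, one would typically call `response.setContentType("application/pdf")` and pass `response.getOutputStream()` as the target; that wiring is an assumption about the asker's setup.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

/** Hypothetical helper: renders a one-page PDF straight into a stream (PDFBox 2.0.x). */
final class PdfResponse {
    static void writePdf(String text, OutputStream out) throws IOException {
        try (PDDocument doc = new PDDocument()) {          // closed after saving
            PDPage page = new PDPage();                    // US Letter by default
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(PDType1Font.HELVETICA, 12);
                cs.newLineAtOffset(72, 720);               // one inch from the left edge
                cs.showText(text);
                cs.endText();
            }
            doc.save(out);                                 // no file on the server's disk
        }
    }
}
```

Writing to a stream also sidesteps a common servlet pitfall: a relative file path resolves against the server's working directory, not the project folder, so a file "that isn't created" may simply have been written somewhere unexpected.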

Setting image form field

☆樱花仙子☆ submitted on 2019-12-02 13:35:03
Question: I created a sample PDF form with one image field. I'm trying to set an image in the field using PDFBox. I see that PDFBox treats such a field as an instance of PDPushButton, but I don't see that this class's interface exposes any methods for dealing with images... The sample PDF can be downloaded using the URL in the comments. How can it be done? EDIT: Here is what I'm doing so far: PDDocument pdfDocument = null; PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm(); if (acroForm != null) {
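PDPushButton indeed has no image-specific setter. One approach with PDFBox 2.0.x is to give each of the button's widgets a normal appearance stream that draws the image. A minimal sketch, assuming PDFBox 2.0.x, widgets that already have a rectangle, and that stretching the image to fill that rectangle is acceptable; the class ButtonImage and its method setButtonImage are hypothetical.

```java
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;
import org.apache.pdfbox.pdmodel.interactive.form.PDPushButton;

/** Hypothetical helper: shows an image in a push-button "image field" (PDFBox 2.0.x). */
final class ButtonImage {
    static void setButtonImage(PDDocument doc, PDPushButton button, PDImageXObject image)
            throws IOException {
        for (PDAnnotationWidget widget : button.getWidgets()) {
            PDRectangle rect = widget.getRectangle();
            // Appearance stream the size of the widget, with its own resources for the image.
            PDAppearanceStream stream = new PDAppearanceStream(doc);
            stream.setResources(new PDResources());
            stream.setBBox(new PDRectangle(rect.getWidth(), rect.getHeight()));
            try (PDPageContentStream cs = new PDPageContentStream(doc, stream)) {
                cs.drawImage(image, 0, 0, rect.getWidth(), rect.getHeight()); // stretch to fit
            }
            PDAppearanceDictionary appearance = widget.getAppearance();
            if (appearance == null) {
                appearance = new PDAppearanceDictionary();
                widget.setAppearance(appearance);
            }
            appearance.setNormalAppearance(stream);  // what viewers draw in the idle state
        }
    }
}
```

The image itself could come from, for example, `PDImageXObject.createFromFile("photo.png", doc)`; the file name is a placeholder. To keep the aspect ratio, the width and height passed to `drawImage` would have to be scaled instead of using the full rectangle.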

Text extraction returns empty or unknown characters for text in a Type3 font using PDFBox and iText (difficult topic!)

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-02 12:48:26
I have a PDF file in Arabic with text in a Type3 font. When I extract the text using PDFBox, some characters are empty and their font name is null. What is the problem? Code: protected void processTextPosition(TextPosition text) { String character=text.getCharacter(); // is empty String font=text.getFont().getBaseFont(); // equal null } Stream produced with iText: ( dJ� v{d W�cG�)Tj I am asking about these question marks: why do I get the characters in this format? These question marks appear in my stream as "SOH-STX-ETX-EOT", not as one character. The character inside PDF is shown as 'd

PDFBox: button that executes JavaScript to close the document

孤人 submitted on 2019-12-02 12:44:57
My use case is to have a button labeled "Back" on a PDF page (really, to add such buttons to existing pages, but for now I just want to see it work on anything). All it does is close the current PDF. The idea is to have multiple tabs open, each containing a PDF; when you hit the "Back" button, it closes the current PDF, and focus returns to the previous one. This is what I have been trying so far: // Create a new empty document try { PDDocument document = new PDDocument(); // Create a new blank page and add it to the document PDPage
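One way to attach such an action with PDFBox 2.0.x is a push button whose widget carries a JavaScript action calling `this.closeDoc();`. A minimal sketch; the class BackButton, the field name, and the hand-drawn box-and-label appearance are assumptions. `closeDoc()` is an Acrobat JavaScript method, so whether it runs (and whether it returns focus to another tab) depends on the viewer; browser-based viewers may ignore it.

```java
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDPushButton;

/** Hypothetical helper: adds a "Back" push button that asks the viewer to close the PDF. */
final class BackButton {
    static PDPushButton addCloseButton(PDDocument doc, PDPage page, PDRectangle rect)
            throws IOException {
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {                                 // reuse an existing form if present
            form = new PDAcroForm(doc);
            doc.getDocumentCatalog().setAcroForm(form);
        }
        PDPushButton button = new PDPushButton(form);
        button.setPartialName("Back");
        form.getFields().add(button);

        PDAnnotationWidget widget = button.getWidgets().get(0);
        widget.setRectangle(rect);
        widget.setPage(page);
        widget.setPrinted(true);
        widget.setAction(new PDActionJavaScript("this.closeDoc();")); // runs when clicked

        // Visible appearance: a thin border with the label "Back".
        PDAppearanceStream ap = new PDAppearanceStream(doc);
        ap.setResources(new PDResources());
        ap.setBBox(new PDRectangle(rect.getWidth(), rect.getHeight()));
        try (PDPageContentStream cs = new PDPageContentStream(doc, ap)) {
            cs.addRect(0.5f, 0.5f, rect.getWidth() - 1, rect.getHeight() - 1);
            cs.stroke();
            cs.beginText();
            cs.setFont(PDType1Font.HELVETICA, 12);
            cs.newLineAtOffset(6, (rect.getHeight() - 12) / 2 + 2);
            cs.showText("Back");
            cs.endText();
        }
        PDAppearanceDictionary dict = new PDAppearanceDictionary();
        dict.setNormalAppearance(ap);
        widget.setAppearance(dict);

        page.getAnnotations().add(widget);
        return button;
    }
}
```

For existing pages the same helper could be called on a page obtained with `doc.getPage(i)`, followed by `doc.save(...)`.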

PDFBox Inconsistent PDTextField Autosize Behavior after setValue

我与影子孤独终老i submitted on 2019-12-02 12:30:01
I am using Apache PDFBox to configure PDTextFields on a PDF document, where I load Lato into the document using: font = PDType0Font.load( @j_pd_document, java.io.FileInputStream.new('/path/to/Lato-Regular.ttf') ) # => Lato-Regular font_name = pd_default_resources.add(font).get_name # => F4 I then pass the font_name into a default_appearance_string for the PDTextField like so: j_text_field.set_default_appearance("/#{font_name} 0 Tf 0 g") # where font_name is # passed in from above The issue now occurs when I proceed to invoke setValue on the PDTextField. Because I set the font_size in