pdfbox

PDFBOX : U+000A ('controlLF') is not available in this font Helvetica encoding: WinAnsiEncoding

Submitted by 馋奶兔 on 2019-12-19 06:19:21
Question: When trying to print a PDF page using Java and the org.apache.pdfbox library, I get this error: PDFBOX : U+000A ('controlLF') is not available in this font Helvetica encoding: WinAnsiEncoding Answer 1: [PROBLEM] The string you are trying to display contains a newline character. [SOLUTION] Replace the string with a new one that has the newline removed: text = text.replace("\n", "").replace("\r", ""); Answer 2: The answer selected for this post works, removing all instances of \n and \r from your string, if
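The fix described in Answer 1 can be sketched in plain Java, with no PDFBox dependency. The class and method names below (`PdfTextSanitizer`, `stripLineBreaks`, `splitLines`) are hypothetical and not part of PDFBox. The second helper is an assumption about what a caller might want instead of dropping line breaks: splitting the text so each line can be drawn separately, for example with `showText` followed by `newLineAtOffset` on a PDFBox 2.x `PDPageContentStream`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of PDFBox.
class PdfTextSanitizer {

    // Removes CR and LF, which have no glyph in Helvetica with WinAnsiEncoding.
    static String stripLineBreaks(String text) {
        return text.replace("\n", "").replace("\r", "");
    }

    // Splits on \r\n, \n, or \r so each line can be drawn on its own.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\r?\\n|\\r", -1)) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond line\nthird line";
        System.out.println(stripLineBreaks(text));
        for (String line : splitLines(text)) {
            System.out.println(line);
        }
    }
}
```

Stripping joins the lines together without a separator, so splitting is usually the better choice when the line structure matters in the printed output.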

Memory Leak Issue With PDFBox

Submitted by 爷,独闯天下 on 2019-12-19 04:53:12
Question: I am using PDFBox version 2.0.9 in my application. I have to parse large PDF files from the web. The following is the code I am using. MimeTypeDetector class: @Getter @Setter class MimeTypeDetector { private ByteArrayInputStream byteArrayInputStream; private BodyContentHandler bodyContentHandler; private Metadata metadata; private ParseContext parseContext; private Detector detector; private TikaInputStream tikaInputStream; MimeTypeDetector(ByteArrayInputStream byteArrayInputStream) { this

Write arabic characters with PDFBOX [duplicate]

Submitted by 淺唱寂寞╮ on 2019-12-19 03:36:44
Question: This question already has an answer here: Writing Arabic with PDFBOX with correct characters presentation form without being separated (1 answer). Closed last year. Update 1: I'm trying to write some Arabic characters in a PDF document using PDFBox. As a result I get some strange characters. Below is the code snippet I used for my test. Note that the same code printed Latin characters without any problem. public static void main(String[] args) throws Exception {

Apache PDFBox Remove Spaces between characters

Submitted by 青春壹個敷衍的年華 on 2019-12-19 03:14:11
Question: We are using PDFBox to extract text from PDFs. The text of some PDFs can't be extracted correctly. The following image shows part of the PDF as an image: After text extraction we get the following text: 3, 8 5 EU R 1 Netto 38,50 EUR 4,00 (Spaces are added between ',' and '8') Here is our code: PDDocument pdf = PDDocument.load(reuseableInputStream); PDFTextStripper pdfStripper = new PDFTextStripper(); pdfStripper.setSortByPosition(true); String text = pdfStripper.getText(pdf); We tried to play with

how to add timestamp without Digital Signature

Submitted by 自作多情 on 2019-12-18 18:31:16
Question: I want to add a timestamp to my PDF document (without a digital signature). How can I do this? I can do it with a digital signature using iText (I have a TSAClient here): MakeSignature.signDetached(appearance, digest, signature, chain, null, null, tsa, 0, subfilter); but how can I do something similar without a digital signature, using Bouncy Castle, iText, PDFBox, or another library? Answer 1: In iText you are looking for LtvTimestamp.timestamp(appearance, tsa, signatureName); Also cf. the JavaDoc

PDF table extraction

Submitted by 坚强是说给别人听的谎言 on 2019-12-18 15:28:16
Question: I have the same data saved as a GIF image file and as a PDF file, and I want to parse it to HTML or XML. The data is actually the menu for my university's cafeteria, which means there is a new version of the file to parse each week. In general, the files contain some header and footer text, with a table full of other data in between. I have read some posts on Stack Overflow and have also made some attempts to parse out the table data as HTML/XML: PDF PDFBox || iText

PDFBox - Removing invisible text (by clip/filling paths issue)

Submitted by 无人久伴 on 2019-12-18 09:41:41
Question: Link to example PDF: click here. Here you can see that many labels on the left are clipped (because of some clipping instructions). When I use PDFTextStripper, it prints all the text that is actually cut off or hidden in the example PDF file. I have already tried the solution described here, but it makes things even worse because it removes a lot of text at the top plus some text at the beginning of each row. Is there any other way to show only visible characters, and skip all overlapped ones, using PDFBox? Or maybe is there

how to add background image to PDF using PDFBox?

Submitted by 落花浮王杯 on 2019-12-18 09:32:04
Question: I am using Java PDFBox version 2.0. I want to know how to add a background image to a PDF. I cannot find any good example on pdfbox.apache.org. Answer 1: Do this with each page, i.e. from 0 to doc.getNumberOfPages(): PDPage pdPage = doc.getPage(page); InputStream oldContentStream = pdPage.getContents(); byte[] ba = IOUtils.toByteArray(oldContentStream); oldContentStream.close(); // brings a warning because a content stream already exists PDPageContentStream newContentStream = new

Dynamically resize jframe/image or scroll

Submitted by 时间秒杀一切 on 2019-12-18 09:12:24
Question: As discussed in this question (Wrap image to Jframe), I need a JFrame to match the exact provided image (the image itself was originally a PDF that has been converted to an image). The solution provided does indeed build a JFrame with my image's dimensions, but I can't actually see all of the image. I need to be able to resize the JFrame, with the image dynamically adjusting to the new JFrame size. Failing that, I think if I could just scroll the JFrame or even zoom in or out, I could at least get
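One possible building block for the "image adjusts to the new frame size" behavior asked about here is an aspect-ratio-preserving fit calculation. The sketch below is not taken from the question or its answers; the class and method names (`ImageFit`, `fit`) are hypothetical. It only computes the target size; in Swing the result would typically be passed to the width and height arguments of `Graphics.drawImage` inside a component's `paintComponent`, which is an assumption about how it would be wired up.

```java
import java.awt.Dimension;

// Hypothetical helper for illustration; computes a scaled size only.
class ImageFit {

    // Returns the largest size that fits inside the box while keeping
    // the image's aspect ratio. Non-positive inputs yield 0 x 0.
    static Dimension fit(int imageW, int imageH, int boxW, int boxH) {
        if (imageW <= 0 || imageH <= 0 || boxW <= 0 || boxH <= 0) {
            return new Dimension(0, 0);
        }
        // The smaller ratio is the one that keeps both sides inside the box.
        double scale = Math.min((double) boxW / imageW, (double) boxH / imageH);
        int w = Math.max(1, (int) Math.round(imageW * scale));
        int h = Math.max(1, (int) Math.round(imageH * scale));
        return new Dimension(w, h);
    }

    public static void main(String[] args) {
        Dimension d = fit(1000, 500, 500, 500);
        System.out.println(d.width + " x " + d.height);
    }
}
```

Recomputing the fit on every repaint keeps the image in step with the frame as it is resized. Scrolling, the alternative the question mentions, would instead leave the image at full size inside a `JScrollPane`.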