pdfbox

Using PDFBox to write unicode strings to a PDF

醉酒当歌 posted on 2019-12-01 18:38:05
I want to use Apache PDFBox 1.8.8 to create a PDF that contains Unicode characters, but I'm confused about what is supported and what isn't. An answer posted here suggests it is a bug that has been fixed on the trunk. Another answer posted here suggests that I have to do the translation myself. And another (older) answer posted here talks about embedding fonts. Can someone please clarify? Also, if it was a bug that is now fixed, can someone tell me when the next release of PDFBox is likely to be? Thanks.
mkl: Essentially all the answers you linked to are correct. You have to keep in mind which

NoClassDefFoundError when trying to use PDFBox

女生的网名这么多〃 posted on 2019-12-01 17:46:51
When I try to use one of the PDFBox examples for extracting images, it throws the following exception at run time:
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
        at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:68)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1218)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1111)
        at pdfboxtest.PdfBoxTest.extractImage(PdfBoxTest.java:69)
        at pdfboxtest.PdfBoxTest.main(PdfBoxTest.java:53)
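The missing class in the trace, org/apache/commons/logging/LogFactory, belongs to Apache Commons Logging, which PDFBox depends on. The error usually means that jar is absent from the runtime classpath. A minimal sketch of a startup check using only the JDK follows; the names ClasspathCheck and isPresent are illustrative and not part of PDFBox.

```java
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized, so no static blocks run.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String needed = "org.apache.commons.logging.LogFactory";
        if (isPresent(needed)) {
            System.out.println(needed + " found.");
        } else {
            System.err.println(needed + " is not on the classpath; add the commons-logging jar.");
        }
    }
}
```

Running a check like this before the first PDDocument.load call turns the NoClassDefFoundError into a readable message about the missing dependency.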

PDFBOX : U+000A ('controlLF') is not available in this font Helvetica encoding: WinAnsiEncoding

独自空忆成欢 posted on 2019-12-01 17:33:11
When trying to print a PDF page using Java and the org.apache.pdfbox library, I get this error: PDFBOX : U+000A ('controlLF') is not available in this font Helvetica encoding: WinAnsiEncoding
razvan: [PROBLEM] The String you are trying to display contains a newline character. [SOLUTION] Replace the String with a new one and remove the newline: text = text.replace("\n", "").replace("\r", "");
If you are trying to start a new line by putting "\n" in a string, you can use PDPageContentStream.newLineAtOffset(x, y) to add a new line instead: PDFont font = PDType1Font.HELVETICA; PDDocument doc = new PDDocument();
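Stripping the line breaks avoids the exception but also joins the lines together. If the line structure should be kept, one option is to split the text on line terminators first and draw each piece separately. Here is a minimal JDK-only sketch. LineSplitter and splitLines are hypothetical names, and the PDFBox calls appear only in comments, as an assumption about how the pieces would be drawn with the 2.x API (leading is a hypothetical line-height variable).

```java
import java.util.Arrays;
import java.util.List;

final class LineSplitter {

    // Splits on any line terminator (\n, \r\n, \r, ...) so that no piece
    // contains U+000A or U+000D. Empty lines are preserved.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\R", -1));
    }

    public static void main(String[] args) {
        for (String line : splitLines("first line\nsecond line\r\nthird line")) {
            // With PDFBox 2.x one would typically call, per line:
            //   contentStream.showText(line);
            //   contentStream.newLineAtOffset(0, -leading);
            System.out.println("[" + line + "]");
        }
    }
}
```

Because each piece is free of control characters, it can be passed to the font encoder without triggering the "controlLF" error.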

Unable to save Arabic words in a PDF - PDFBox Java

社会主义新天地 posted on 2019-12-01 17:13:03
I am trying to save Arabic words in an editable PDF. It works fine with English text, but when I use Arabic words I get this exception:
    java.lang.IllegalArgumentException: U+0627 is not available in this font Helvetica encoding: WinAnsiEncoding
Here is how I generate the PDF:
    public static void main(String[] args) throws IOException {
        String formTemplate = "myFormPdf.pdf";
        try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate))) {
            PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                PDTextField field = (PDTextField) acroForm.getField(
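The exception message names Helvetica with WinAnsiEncoding, which covers roughly the Windows-1252 character set and has no Arabic letters. A rough pre-check can flag field values that will need a different font. It assumes Windows-1252 is an acceptable stand-in for WinAnsiEncoding, although the two differ, for example for control characters. EncodingCheck and firstUnsupported are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodingCheck {

    // Returns the index of the first char that Windows-1252 cannot encode,
    // or -1 if the whole string is encodable.
    static int firstUnsupported(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String value = "abc \u0627"; // U+0627 is ARABIC LETTER ALEF
        int index = firstUnsupported(value);
        if (index >= 0) {
            System.out.printf("U+%04X at index %d is outside Windows-1252%n",
                    (int) value.charAt(index), index);
        }
    }
}
```

In PDFBox 2.x, text outside this range generally needs an embedded font that contains the required glyphs, for example one loaded with PDType0Font.load.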

PDFBOX, Reading a pdf line by line and extracting text properties

耗尽温柔 posted on 2019-12-01 14:40:38
I am using PDFBox to extract text from PDF files. I read the PDF document as follows:
    PDFParser parser = null;
    String text = "";
    PDFTextStripper stripper = null;
    PDDocument pdoc = null;
    COSDocument cdoc = null;
    File file = new File("path");
    try {
        parser = new PDFParser(new FileInputStream(file));
    } catch (IOException e) {
        e.printStackTrace();
    }
    try {
        parser.parse();
        cdoc = parser.getDocument();
        stripper = new PDFTextStripper();
        pdoc = new PDDocument(cdoc);
        stripper.setStartPage(1);
        stripper.setEndPage(2);
        text = stripper.getText(pdoc);
        System.out.println(text);
    } catch (IOException e) {
        e

PDFBox 2.0.7 ExtractText not working but 1.8.13 does and PDFReader as well

巧了我就是萌 posted on 2019-12-01 14:07:11
Hopefully you have an idea of what is going wrong when extracting text from a PDF using PDFBox 2.0.7. The result is very strange. Using 1.8.13, the command
    java -jar pdfbox-app-1.8.13.jar ExtractText -sort -nonSeq test.pdf
leads to
    Deutsche Bank Privat- und Geschäftskunden AG Bruttoertrag 43,80 USD 37,15 EUR Kapitalertragsteuer (KESt) - 5,36 USD - 4,55 EUR Solidaritätszuschlag auf KESt - 0,29 USD - 0,25 EUR Umrechnungskurs USD zu EUR 1,1791000000 Gutschrift mit Wert 15.08.2017 32,35 EUR
Using 2.0.7, the command
    java -jar pdfbox-app-2.0.7.jar ExtractText -sort test.pdf
leads to
    aeutsche Bank

How do I make modifications to an existing layer (Optional Content Group) in a PDF?

断了今生、忘了曾经 posted on 2019-12-01 13:50:51
I am implementing functionality that allows the user to draw figures in a PDF. I want to draw all the figures in a single layer, which the user can make visible or invisible. I am able to create a new layer in a PDF, and I am also able to retrieve that layer. But I am not able to modify the layer (PDOptionalContentGroup). I tried converting the PDOptionalContentGroup to a PDPage and then making the desired changes to the PDPage. I also saved the PDDocument. That only created another layer with the same name as the previous one, but the changes were not there. Here is the code that I used: PDFont font =
