Using PDFBox to write UTF-8 encoded strings to a PDF [duplicate]

Submitted by 只愿长相守 on 2019-12-17 05:07:41

Question


I am having trouble writing Unicode characters to a PDF using PDFBox. The sample code below produces garbage characters instead of "š". What do I need to add to support UTF-8 strings?

// Create a one-page document and open a content stream on that page.
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);

// Helvetica is one of the standard (non-embedded) fonts.
PDType1Font font = PDType1Font.HELVETICA;
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
// This is the call that produces garbage for "š".
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();

Answer 1:


You are using one of the built-in 'Base 14' fonts that ship with Adobe Reader. These are not Unicode fonts; they cover a standard Latin character set plus a few extra characters. According to the Latin character set table in Appendix D of the PDF specification (http://www.adobe.com/devnet/pdf/pdf_reference.html), neither the lowercase s with caron (š) nor the uppercase Š is part of StandardEncoding or MacRomanEncoding. Both appear only in WinAnsiEncoding and PDFDocEncoding, so the character is lost unless the text is written with one of those encodings.

The reliable fix is to embed a Unicode font if you want to use Unicode characters. Make sure you are licensed to embed whichever font you choose. I can recommend the open-source Gentium or Doulos fonts: they are free, high quality, and have comprehensive Unicode support.
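
Whether a string needs an embedded font at all can be roughly checked before drawing it. The sketch below is not PDFBox code: it uses only the JDK and assumes that the Java charsets windows-1252 and ISO-8859-1 are acceptable stand-ins for single-byte Latin encodings (they are close to, but not identical with, the encodings listed in Appendix D). The class name LatinCoverageCheck and the method fits are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox: checks whether a string fits a
// single-byte charset. Assumption: the chosen Java charset approximates the
// encoding that will be applied to the standard font.
final class LatinCoverageCheck {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean fits(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "\u0161" is the lowercase s with caron from the question.
        String[] samples = {"hello", "\u0161", "\u65e5\u672c\u8a9e"};
        String[] charsets = {"ISO-8859-1", "windows-1252"};
        for (String sample : samples) {
            for (String charset : charsets) {
                String verdict = fits(sample, charset)
                        ? "fits this charset"
                        : "needs an embedded Unicode font";
                System.out.println(charset + ": " + verdict);
            }
        }
    }
}
```

A check like this only says whether the characters fit a single-byte Latin encoding; it does not guarantee that a given PDF library applies that encoding to the font, so embedding a Unicode font remains the safer route for anything beyond basic Latin text.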



Source: https://stackoverflow.com/questions/5425251/using-pdfbox-to-write-utf-8-encoded-strings-to-a-pdf
