pdfbox: … is not available in this font's encoding

Anonymous (unverified), submitted 2019-12-03 01:20:02

Question:

I'm having problems with pdfbox 2.0.2 when writing a PDF document from elements of a previously read document (https://www.dropbox.com/s/ttxiv0dq3abh5kj/Test.pdf?dl=0). Everything works fine except when I call showText on a PDPageContentStream whose font I previously set with out.setFont(textState.getFont(), textState.getFontSize()) (see the INFORMATION log) and that font is ComicSansMS or ArialBlack. textState is (a clone of) the text state from the previously read document. Writing text with Helvetica or Times-Roman works fine.

    INFORMATION: set font PDTrueTypeFont RXNQOL+ComicSansMS,Bold/18.0 embedded
    SEVERE: error writing <w>U+0077 is not available in this font's encoding: built-in (TTF)

I suppose the problem may be caused by a missing hyphen or blank in the font name but have no clue how to fix this.

Here is the complete code:

    import java.awt.Point;
    import java.awt.geom.Point2D;
    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
    import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
    import org.apache.pdfbox.util.Matrix;
    import org.apache.pdfbox.util.Vector;

    public class Test extends PDFGraphicsStreamEngine {

        public static void main(String[] args) throws IOException {
            test();
        }

        public static void test() throws IOException {
            PDDocument document = PDDocument.load(new File("Test.pdf"));
            PDPage pageIn = document.getPage(0);
            PDDocument saveDoc = new PDDocument();
            PDPage savePage = new PDPage(pageIn.getMediaBox());
            saveDoc.addPage(savePage);
            try (PDPageContentStream out = new PDPageContentStream(saveDoc, savePage)) {
                Test test = new Test(pageIn, out);
                test.processPage(pageIn);
            }
        }

        private final PDPageContentStream out;

        public Test(PDPage pageIn, PDPageContentStream out) {
            super(pageIn);
            this.out = out;
        }

        @Override
        public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
        }

        @Override
        public void clip(int windingRule) throws IOException {
        }

        @Override
        public void closePath() throws IOException {
        }

        @Override
        public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
        }

        @Override
        public void drawImage(PDImage pdImage) throws IOException {
        }

        @Override
        public void endPath() throws IOException {
        }

        @Override
        public void fillAndStrokePath(int windingRule) throws IOException {
        }

        @Override
        public void fillPath(int windingRule) throws IOException {
        }

        @Override
        public Point2D getCurrentPoint() {
            return new Point(0, 0);
        }

        @Override
        public void lineTo(float x, float y) throws IOException {
        }

        @Override
        public void moveTo(float x, float y) throws IOException {
        }

        @Override
        public void shadingFill(COSName shadingName) throws IOException {
        }

        @Override
        protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException {
            super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
            PDTextState textState = getGraphicsState().getTextState();
            out.beginText();
            out.setTextMatrix(getTextMatrix());
            out.setFont(textState.getFont(), textState.getFontSize());
            out.showText(unicode);
            out.endText();
        }

        @Override
        public void strokePath() throws IOException {
        }

    }

Any suggestions?

Thanks, Juergen

Answer 1:

tl;dr: That font can't be used to encode new text, because its subset carries no glyph names.

The cause of the problem is that your subsetted Comic Sans font does have a "post" (PostScript) table, but its glyphNames table is null, i.e. your font has no glyph names. For A-Z and a-z the glyph name is the character itself; for "(" the glyph name is "parenleft". Because these names are missing, PDFBox creates pseudo-names from the glyph ID, such as "90" instead of "w", in the second part of PDTrueTypeFont.readEncodingFromFont().

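However, when encoding, PDFBox uses the Adobe Glyph List, because the font does not have an encoding entry: it maps U+0077 to the glyph name "w", looks for that name in the font's built-in encoding, finds only the pseudo-name "90", and reports the character as not available. If you look at the other fonts with PDFDebugger, e.g. R18, you'll find "Encoding: WinAnsiEncoding".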
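To make the mismatch concrete, here is a minimal, self-contained Java sketch of the name lookup described above. It does not use PDFBox: the class EncodingLookupSketch, the tiny GLYPH_LIST map, and the canEncode method are hypothetical stand-ins, and the glyph ID 90 is just the one from the example. It only illustrates why a name-based lookup fails once the encoding holds pseudo-names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EncodingLookupSketch {

    // Tiny stand-in for the Adobe Glyph List: Unicode code point -> glyph name.
    static final Map<Integer, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put((int) 'w', "w");
        GLYPH_LIST.put((int) '(', "parenleft");
    }

    // True if the glyph name of the code point occurs among the encoding's names.
    public static boolean canEncode(int codePoint, Set<String> encodingNames) {
        String name = GLYPH_LIST.get(codePoint);
        return name != null && encodingNames.contains(name);
    }

    public static void main(String[] args) {
        // Encoding built from a font that still has its glyph names.
        Set<String> realNames = new HashSet<>(Arrays.asList("w", "parenleft"));
        // Encoding built from glyph IDs because the names are missing (assumed IDs).
        Set<String> pseudoNames = new HashSet<>(Arrays.asList("90", "11"));

        System.out.println(canEncode('w', realNames));   // prints "true"
        System.out.println(canEncode('w', pseudoNames)); // prints "false"
    }
}
```

The second call is the situation in your log: the character exists in the font, but nothing in the encoding is called "w", so the lookup by name comes back empty.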

What you are apparently trying to do is create a new page that contains only the text. A different way to do this is to analyse the content streams and simply remove all tokens that paint anything other than text. To get started, have a look at the RemoveAllText example in the source code download, download the PDF 32000 specification, read the "operators summary" part, and be careful about what you delete. For example, "Do" is used both to draw images and to draw form XObjects, which are themselves content streams.

See here: How can I remove all images/drawings from a PDF file and leave text only in Java?

Both solutions there are wrong: the first one just pulls all images out from under your feet, and the second one is a good start but does not check whether the parameter is an image or not.
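The token-removal idea can be sketched roughly as follows. This is a simplified model and not PDFBox code: tokens are plain strings, the class TextOnlyFilterSketch, the operator sets, and the imageNames parameter are assumptions made for illustration, and inline images (BI/ID/EI) and marked content are not handled. In a real program the tokens would come from a content stream parser and the image check would look at the page resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TextOnlyFilterSketch {

    // Operators removed by this sketch: path construction, path painting,
    // clipping and shading fill.
    static final Set<String> DROPPED = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n",
            "W", "W*", "sh"));

    // Operators kept by this sketch: graphics state, colour, text, and "Do".
    static final Set<String> KEPT = new HashSet<>(Arrays.asList(
            "q", "Q", "cm", "gs", "g", "G", "rg", "RG", "k", "K",
            "BT", "ET", "Tf", "Td", "TD", "Tm", "T*", "Tj", "TJ", "'", "\"",
            "Tc", "Tw", "Tz", "TL", "Tr", "Ts", "Do"));

    // Any token that is not a known operator is treated as an operand.
    static boolean isOperator(String token) {
        return DROPPED.contains(token) || KEPT.contains(token);
    }

    // Returns the tokens with non-text painting removed. Operands precede their
    // operator, so they are buffered until the operator decides their fate.
    public static List<String> keepText(List<String> tokens, Set<String> imageNames) {
        List<String> result = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (!isOperator(token)) {
                operands.add(token);
                continue;
            }
            // "Do" is dropped only when its single operand names an image;
            // form XObjects are kept because they may contain text.
            boolean imageDo = "Do".equals(token)
                    && operands.size() == 1
                    && imageNames.contains(operands.get(0));
            if (!DROPPED.contains(token) && !imageDo) {
                result.addAll(operands);
                result.add(token);
            }
            operands.clear();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "q", "0", "0", "100", "50", "re", "f",
                "/Im1", "Do", "/Fm1", "Do",
                "BT", "/F1", "18", "Tf", "(w)", "Tj", "ET", "Q");
        Set<String> imageNames = new HashSet<>(Arrays.asList("/Im1"));
        // prints "[q, /Fm1, Do, BT, /F1, 18, Tf, (w), Tj, ET, Q]"
        System.out.println(keepText(tokens, imageNames));
    }
}
```

Note that the form XObject /Fm1 survives here; since it is a content stream of its own, it would need the same filtering applied to it. Dropping the clipping operators is also a judgment call, because it can reveal text that was clipped away in the original page.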


