PDFBox outputs question marks instead of some Japanese characters

Submitted by 谁说我不能喝 on 2019-12-10 21:22:00

Question


With Apache Tika (1.7) and Apache PDFBox (1.8.8), I get correct text from almost all PDF files written in Japanese. Now I have trouble with one PDF file, which I cannot upload here for business reasons.

Problem

All Japanese characters in one paragraph become "?", while Japanese characters in other paragraphs are correct. In every case, ASCII characters are correct.

PDF file

All Japanese characters in the PDF document appear correct in Adobe Acrobat on my Windows 7 desktop. According to the Adobe Acrobat properties dialog, the PDF document contains several Japanese fonts. I don't know who made this file or how.

  • MS-Mincho Type:TrueType(CID) <- several
  • HeiseiMin-W3 Type:Type 1(CID) Encoding:UniJIS-UCS2-HW-H Actual Font:KozMinPr6N-Regular Actual Font Type:Type 1(CID)
  • MSMincho Type:TrueType(CID) Encoding:UniJIS-UCS2-H Actual Font:MS明朝 Actual Font Type:TrueType

PDF Converter:Acrobat Distiller 7.0(Windows) PDF Version:1.6(Acrobat 7.x)

Findings

The "?"s are produced in PDFStreamEngine (line 492) because of a lookup failure in PDType0Font (line 202). The cmapName of the cmap (of the PDFont class) in this situation is "UniJIS-UCS2-HW-H". Looking carefully at the CMap implementation, the isInCodeSpaceRanges method returns true, as it should. In the end, lookupCID fails because char2CIDMappings has no entry and range.map fails in CMap (around line 174). The char[] argument has values such as [48, -120, 48, -118, ...], which look like correct Unicode code points to me...
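
One way to check the claim about the argument values is to decode them outside PDFBox. The following is a minimal sketch, assuming the listed values are the raw bytes of two-byte big-endian codes (which is what a UCS-2 CMap name suggests). The class name Ucs2Check and the method decode are hypothetical helpers for illustration, not part of PDFBox or Tika.

```java
import java.nio.charset.StandardCharsets;

public class Ucs2Check {

    // Decode raw two-byte codes as big-endian UTF-16, which matches UCS-2
    // for characters in the Basic Multilingual Plane.
    // Assumption: the values from the question are signed Java bytes.
    static String decode(byte[] code) {
        return new String(code, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // The first four values quoted in the question: 0x30 0x88 0x30 0x8A.
        byte[] code = {48, -120, 48, -118};
        String text = decode(code);
        for (int i = 0; i < text.length(); i++) {
            // Prints U+3088 and then U+308A, both in the Hiragana block.
            System.out.printf("U+%04X%n", (int) text.charAt(i));
        }
    }
}
```

Under that assumption the values decode to U+3088 and U+308A, two hiragana characters, which is consistent with the observation that the input codes are valid and the failure lies in the CID lookup rather than in the data.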

Is there any workaround? Thanks.


Answer 1:


I solved font issues (Chinese, Japanese, Korean, and others) in PDFBox by rendering the text to an image, like this:

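import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Draws one line of text into the PDF as a PNG image instead of as PDF text.
// Note: PDImageXObject is part of the PDFBox 2.x API.
void writeLine(String text, int x, int y, int width, int height,
               Font font, Color color, PDPageContentStream contentStream,
               PDDocument document) throws IOException {

    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        // Render at twice the target size so the text stays sharp when scaled down.
        int scale = 2;
        BufferedImage img = new BufferedImage(width * scale, height * scale, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = img.createGraphics();
        g2d.setRenderingHint(RenderingHints.KEY_ALPHA_INTERPOLATION, RenderingHints.VALUE_ALPHA_INTERPOLATION_QUALITY);
        g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g2d.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
        g2d.setRenderingHint(RenderingHints.KEY_COLOR_RENDERING, RenderingHints.VALUE_COLOR_RENDER_QUALITY);
        g2d.setRenderingHint(RenderingHints.KEY_DITHERING, RenderingHints.VALUE_DITHER_ENABLE);
        g2d.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);
        g2d.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g2d.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_SPEED);
        g2d.setRenderingHint(RenderingHints.KEY_STROKE_CONTROL, RenderingHints.VALUE_STROKE_PURE);
        g2d.setFont(font);
        g2d.setColor(color);
        g2d.scale(scale, scale);
        // Place the baseline one ascent below the top edge so the glyphs are not clipped.
        g2d.drawString(text, 0, g2d.getFontMetrics().getAscent());
        g2d.dispose();

        // Encode the image as PNG in memory; try-with-resources closes the stream.
        ImageIO.write(img, "png", baos);

        // Embed the PNG and draw it at the requested position and size.
        contentStream.drawImage(PDImageXObject.createFromByteArray(
                document, baos.toByteArray(), ""), x, y, width, height);
    }
}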


Source: https://stackoverflow.com/questions/29203976/pdfbox-outputs-question-marks-instead-of-some-japanese-characters
