PDFBox scrambling the text

Submitted by 拜拜、爱过 on 2021-01-28 05:21:36

Question


I have been trying to edit a PDF document to pre-fill form entries, and I've got it working (sort of). The text I'm adding goes in fine. However, other text that was already on the page seems to have been replaced with "&%£!£! symbols. I've worked out that it has something to do with the contentStream section in the code below, specifically the setFont line. If I remove that line, the page remains OK... except that the "HELLO RICHARD!!" text is no longer displayed!

Help please!

package pdfboxtest;

import java.awt.Color;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFFormFiller {

    private static final String R40_NEW_FORM_PATH = "c:\\temp\\hmrc-r40.pdf";
    private static final String R40_COMPLETED_FORM_PATH = "c:\\temp\\hmrc-r40-complete.pdf";

    public static void main(String[] args) throws Exception {
        PDDocument doc = PDDocument.load(R40_NEW_FORM_PATH);

        addTextToPage(doc);

        doc.save(R40_COMPLETED_FORM_PATH);
        doc.close();
    }

    private static void addTextToPage(PDDocument doc) throws Exception {
        List pages = doc.getDocumentCatalog().getAllPages();
        PDPage firstPage = (PDPage) pages.get(0);
        PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);

        contentStream.setFont(PDType1Font.HELVETICA_BOLD, 24);
        contentStream.beginText();
        contentStream.setNonStrokingColor(Color.BLACK);
        contentStream.moveTextPositionByAmount(100, 200);
        contentStream.drawString("HELLO RICHARD!!");
        contentStream.endText();
        contentStream.close();

    }
}

This is the top of the form before I add text elsewhere.

And after I've added text elsewhere, this bit of text goes nuts! I did not edit this bit though.


Answer 1:


As already assumed in a comment, this is caused by a PDFBox issue for which I described a workaround in an earlier answer. The issue is still present in PDFBox 1.8.2 but has meanwhile been fixed for versions 1.8.3 and 2.0.0, cf. PDFBOX-1753.

In your case the workaround changes the addTextToPage method like this:

// Requires an additional import: java.io.IOException
private static void addTextToPage(PDDocument doc) throws IOException {
    List pages = doc.getDocumentCatalog().getAllPages();
    PDPage firstPage = (PDPage) pages.get(0);
    PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);

    // Workaround: force the page's font resources to be initialized before setFont is called.
    firstPage.getResources().getFonts(); // <<<<<<

    contentStream.setFont(PDType1Font.HELVETICA_BOLD, 24);
    contentStream.beginText();
    contentStream.setNonStrokingColor(Color.BLACK);
    contentStream.moveTextPositionByAmount(100, 200);
    contentStream.drawString("HELLO RICHARD!!");
    contentStream.endText();
    contentStream.close();
}

The added line forces an initialization of the page's font resources. The PDPageContentStream constructor forgets to do this, but setFont relies on it having been done. You can find details in the answer referenced above. You might want to inform PDFBox development.
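
To make the mechanism easier to picture, here is a toy model in plain Java. It is only a sketch under an assumption: that the missing initialization causes the new font to be registered under a resource key (such as F0) that the page already uses, so existing text ends up drawn with the wrong font. The class FontKeyCollisionDemo and its methods are hypothetical and are not PDFBox code; the font names are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, NOT part of PDFBox.
class FontKeyCollisionDemo {

    // Simulates the font resources of an existing page: key -> font name.
    static Map<String, String> existingPageFonts() {
        Map<String, String> fonts = new LinkedHashMap<>();
        fonts.put("F0", "EmbeddedFormFont"); // placeholder for the form's own font
        return fonts;
    }

    // Returns the first key of the form "F<n>" that is not in the given map.
    static String nextUniqueKey(Map<String, String> knownFonts) {
        int n = 0;
        while (knownFonts.containsKey("F" + n)) {
            n++;
        }
        return "F" + n;
    }

    // Registers newFont on the page and returns the key chosen for it.
    // When cacheInitialized is false, the key is picked against an empty
    // cache, which mimics the assumed bug: existing keys are not seen.
    static String addFont(Map<String, String> pageFonts, String newFont,
                          boolean cacheInitialized) {
        Map<String, String> cache = cacheInitialized
                ? new LinkedHashMap<>(pageFonts)
                : new LinkedHashMap<>();
        String key = nextUniqueKey(cache);
        pageFonts.put(key, newFont); // overwrites an existing entry on collision
        return key;
    }

    public static void main(String[] args) {
        Map<String, String> withoutInit = existingPageFonts();
        String keyWithoutInit = addFont(withoutInit, "Helvetica-Bold", false);
        System.out.println(keyWithoutInit + " " + withoutInit);

        Map<String, String> withInit = existingPageFonts();
        String keyWithInit = addFont(withInit, "Helvetica-Bold", true);
        System.out.println(keyWithInit + " " + withInit);
    }
}
```

In the first case the new font takes over the key F0 and the page's original font entry is lost. In the second case the new font gets the fresh key F1 and the original entry survives, which corresponds to what the extra getFonts() call achieves in the workaround.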



Source: https://stackoverflow.com/questions/19702671/pdfbox-scrambling-the-text
