How to generate a valid PDF/A file using iText and XMLWorker (HTML to PDF/A process)

Posted by 感情迁移 on 2019-12-31 00:45:35

Question


I'm currently developing a method that will accept HTML input and convert it into a valid PDF/A file. I know how to programmatically construct a valid PDF/A file using iText (reference: http://itextsupport.com/download/pdfa3.html) but I'm unable to generate a valid PDF/A file using HTML as input and using XMLWorker to transform this input into a PDF file. The problem that I have right now is due to the embedded fonts requirement of the PDF/A format. I always get this exception:

Exception in thread "main" com.itextpdf.text.pdf.PdfAConformanceException: All the fonts must be embedded. This one isn't: Helvetica

I try to force the fonts that the HTML input uses via a CSS file, and I register the fonts I want in the output PDF file via the XMLWorkerFontProvider class, but I seem to be doing something wrong because the exception above is always thrown.

What else do I need to do so that XMLWorker uses the fonts registered via the XMLWorkerFontProvider class? I want to prevent the default font, Helvetica, from being used for any HTML element in the input.

Below is the code I'm using for testing:

style.css (just 1 line):

* { font: normal 100% Arial, sans-serif !important; }

Main.java:

package com.itextpdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class Main {

    /**
     * @param args
     */
    public static void main(String[] args) {

        StringBuffer buf = new StringBuffer();

        buf.append("<!DOCTYPE html>");
        buf.append("<html>");
        buf.append("<head>");
        buf.append("<title>Test</title>");
        buf.append("</head>");
        buf.append("<body>");
        buf.append("<p>This is a test</p>");
        buf.append("</body>");
        buf.append("</html>");

        OutputStream file = null;
        Document document = null;
        PdfAWriter writer = null;

        try {

            file = new FileOutputStream(new File("C:\\Users\\amartin\\Desktop\\Test.pdf"));
            document = new Document();
            writer = PdfAWriter.getInstance(document, file, PdfAConformanceLevel.PDF_A_1B);

            // Create XMP metadata. It's a PDF/A requirement.
            writer.createXmpMetadata();

            document.open();

            // Set output intent. PDF/A requirement.
            ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream("./src/main/resources/com/itextpdf/sRGB Color Space Profile.icm"));
            writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

            // CSS
            CSSResolver cssResolver = new StyleAttrCSSResolver();
            CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream("./css/style.css"));
            cssResolver.addCss(cssFile);

            XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
            fontProvider.register("./fonts/arial.ttf");
            fontProvider.register("./fonts/sans-serif.ttf");
            fontProvider.addFontSubstitute("lowagie", "garamond");

            CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
            HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
            htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

            // Pipelines
            PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
            HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
            CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

            XMLWorker worker = new XMLWorker(css, true);
            XMLParser p = new XMLParser(worker);

            Reader reader = new StringReader(buf.toString());
            p.parse(reader);

        } catch (Exception e) {

            e.printStackTrace();

        } finally {

            if (document != null && document.isOpen())
                document.close();

            try {

                if (file != null)
                    file.close();

            } catch (IOException e) {}

            if (writer != null && !writer.isCloseStream())
                writer.close();

        }

    }

}

edit:

In reply to Bruno: I have extended the FontFactoryImp class and overridden the getFont() method (the overload that takes all the arguments). It calls System.out.println like this:

System.out.println("=fontname: " + fontname + " =encoding: " + encoding + " =embedded : " + embedded + " =size: " + size + " =style: " + style + " =BaseColor: " + color)

and then calls the parent's getFont() method with the same arguments. The only output I see is this:

=fontname: null =encoding: Cp1252 =embedded : true =size: -1.0 =style: -1 =BaseColor: null
=fontname: null =encoding: Cp1252 =embedded : true =size: -1.0 =style: -1 =BaseColor: null

followed by the same exception quoted at the top of the question.


Answer 1:


Based on the feedback you're sending to the System.out, it seems that XML Worker doesn't pick up the font family you want to use.

Please specify the font family like this:

font-family: "Arial"

Using the 'font' shorthand in CSS may work, but it's tricky. I think iText sees "normal" and interprets it as "use the default font".
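If existing stylesheets already use the 'font' shorthand, one option is to rewrite those declarations as an explicit font-family before handing the CSS to XML Worker. The following is only a minimal sketch using the Java standard library: the class FontShorthand and its methods are hypothetical names, not part of iText, and the parsing is deliberately simplified (it assumes the family list is everything after the first size token such as 100% or 12px).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper (not an iText class): pulls the family list out of a
// CSS 'font' shorthand value so it can be written as a plain font-family.
class FontShorthand {

    // Simplified size token: a number with a unit, optionally "/line-height".
    // Unitless numbers such as the weight "400" intentionally do not match.
    private static final Pattern SIZE =
            Pattern.compile("\\d*\\.?\\d+(%|px|pt|em|rem)(/\\S+)?");

    // Returns the family list, or null if no size token followed by a family is found.
    static String familyOf(String shorthand) {
        String value = shorthand.replace("!important", "").trim();
        String[] tokens = value.split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (SIZE.matcher(tokens[i]).matches()) {
                // Everything after the size is treated as the family list.
                return String.join(" ", Arrays.copyOfRange(tokens, i + 1, tokens.length));
            }
        }
        return null;
    }

    // Builds the explicit declaration, e.g. "font-family: Arial, sans-serif;".
    static String toLonghand(String shorthand) {
        String family = familyOf(shorthand);
        return family == null ? null : "font-family: " + family + ";";
    }
}
```

For the rule in the question, FontShorthand.toLonghand("normal 100% Arial, sans-serif !important") yields font-family: Arial, sans-serif; which is the form suggested above. Shorthand values that use keyword sizes (for example "medium") are not handled by this sketch.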




Answer 2:


The complete code that makes this example work is the following:

style.css:

* {
    font-family: "Arial";
    font-style: normal;
}

Main.java:

package com.itextpdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class Main {

    public static void main(String[] args) {

        StringBuffer buf = new StringBuffer();

        String title = "Test";

        // Sample HTML content.
        buf.append("<!DOCTYPE html>");
        buf.append("<html>");
        buf.append("<head>");
        buf.append("<title>" + title + "</title>");
        buf.append("</head>");
        buf.append("<body>");
        buf.append("<p>This is a test</p>");
        buf.append("</body>");
        buf.append("</html>");

        OutputStream file = null;
        Document document = null;
        PdfAWriter writer = null;

        try {

            file = new FileOutputStream(new File("C:\\Users\\amartin\\Desktop\\Test.pdf"));
            document = new Document();
            writer = PdfAWriter.getInstance(document, file, PdfAConformanceLevel.PDF_A_1B);

            // Avoid discrepancies between the document title and the XMP metadata.
            document.addTitle(title);

            // Create XMP metadata. It's a PDF/A requirement.
            writer.createXmpMetadata();

            document.open();

            // Set output intent. PDF/A requirement.
            ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream("./src/main/resources/com/itextpdf/sRGB Color Space Profile.icm"));
            writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

            // CSS stylesheet.
            CSSResolver cssResolver = new StyleAttrCSSResolver();
            CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream("./css/style.css"));
            cssResolver.addCss(cssFile);

            MyFontProvider fontProvider = new MyFontProvider();
            fontProvider.register("./fonts/arial.ttf");

            /* DEBUG (needs "import java.util.Set;" if uncommented)
            System.out.println("Fonts present in " + fontProvider.getClass().getName());
            Set<String> registeredFonts = fontProvider.getRegisteredFonts();
            for (String font : registeredFonts)
                System.out.println(font);
            */

            CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
            HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
            htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

            // Pipelines.
            PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
            HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
            CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

            XMLWorker worker = new XMLWorker(css, true);
            XMLParser p = new XMLParser(worker);

            Reader reader = new StringReader(buf.toString());
            p.parse(reader);

        } catch (Exception e) {

            e.printStackTrace();

        } finally {

            if (document != null && document.isOpen())
                document.close();

            try {

                if (file != null)
                    file.close();

            } catch (IOException e) {}

            if (writer != null && !writer.isCloseStream())
                writer.close();

        }

    }

}

MyFontProvider.java:

package com.itextpdf;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactoryImp;

public class MyFontProvider extends FontFactoryImp {

    @Override
    public Font getFont(String fontname, String encoding, boolean embedded,
            float size, int style, BaseColor color) {

        System.out.println("=fontname: " + fontname + " =encoding: " + encoding + " =embedded : " + embedded + " =size: " + size + " =style: " + style + " =BaseColor: " + color);

        return super.getFont(fontname, encoding, embedded, size, style, color);

    }

}
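The debug output in the question showed getFont() being called with fontname: null, which appears to be the case where the non-embedded default font ends up being used. The CSS change above is what fixed this example; as an additional safety net, a provider like MyFontProvider could substitute a registered font whenever the requested name is null or unknown. The following sketch shows only that name-resolution logic with the Java standard library. FontNameResolver is a hypothetical class, not part of iText, and wiring it into getFont() (for instance by passing resolver.resolve(fontname) to super.getFont) is an assumption that has not been tested here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not an iText class): decides which font name to ask
// the font factory for, falling back to a known embedded font.
class FontNameResolver {

    private final Set<String> registered = new HashSet<>();
    private final String fallback;

    // registeredNames: names of fonts known to be registered (any case).
    // fallback: name to use when the request is null or not registered.
    FontNameResolver(Set<String> registeredNames, String fallback) {
        for (String name : registeredNames) {
            // Store lower-cased so lookups are case-insensitive.
            registered.add(name.toLowerCase(Locale.ROOT));
        }
        this.fallback = fallback;
    }

    String resolve(String requested) {
        if (requested == null) {
            // This is the "fontname: null" case seen in the debug output.
            return fallback;
        }
        String trimmed = requested.trim();
        return registered.contains(trimmed.toLowerCase(Locale.ROOT)) ? trimmed : fallback;
    }
}
```

With a resolver built from the registered font names and "arial" as the fallback, resolve(null) and resolve("Helvetica") both return "arial", while resolve("Arial") is passed through unchanged.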

Again, thank you, Bruno. I'm really glad to get your help here :)



Source: https://stackoverflow.com/questions/25604008/how-to-generate-a-valid-pdf-a-file-using-itext-and-xmlworker-html-to-pdf-a-proc
