iTextSharp XMLWorker parsing really slow

Submitted by 六月ゝ 毕业季﹏ on 2019-11-28 00:18:10
Bruno Lowagie

The question is wrong in the sense that it suggests that the HTML parsing is slowing everything down. That's not true. The bottleneck occurs even before the first snippet of HTML is parsed.

You are using the most basic handful of lines of code to create your PDF from HTML as demonstrated in the ParseHtml example:

public void createPdf(String file) throws IOException, DocumentException {
    // step 1: create the document
    Document document = new Document();
    // step 2: bind a writer to the document and the output file
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3: open the document
    document.open();
    // step 4: parse the XHTML with the default helper settings
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML));
    // step 5: close the document
    document.close();
}

This code is simple, but it performs a lot of operations internally as explained in the comments of this other question: XMLWorkerHelper performance slow.
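One way to check where the time goes on your own machine is to time each step separately. The following is a minimal sketch that uses only the JDK. PhaseTimer and the phase names "setup" and "parse" are hypothetical, and the empty lambdas are placeholders for your own iText calls (calls that throw checked exceptions would need to be wrapped, since Runnable does not allow them).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: records how long each named phase takes. */
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the task and adds its wall-clock duration to the named phase. */
    public void time(String phase, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated nanoseconds for a phase, or 0 if it never ran. */
    public long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    /** Phase names in the order they were first timed. */
    public Iterable<String> phases() {
        return nanosByPhase.keySet();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholders: put the real work inside these lambdas.
        timer.time("setup", () -> { /* e.g. build the font provider */ });
        timer.time("parse", () -> { /* e.g. run the parser on the HTML */ });
        for (String phase : timer.phases()) {
            System.out.printf("%s: %.1f ms%n", phase, timer.nanos(phase) / 1e6);
        }
    }
}
```

If the claim above holds for your setup, most of the reported time should show up in the setup phase rather than in the parse phase.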

Registering font directories consumes plenty of time. You can avoid this by using your own FontProvider, as is done in the ParseHtmlFonts example:

public void createPdf(String file) throws IOException, DocumentException {
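    // step 1: create the document
    Document document = new Document();

    // step 2: bind a writer to the document and the output file
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    writer.setInitialLeading(12.5f);

    // step 3: open the document
    document.open();

    // step 4: build the XML Worker pipeline by hand and parse the HTML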

    // CSS
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
    cssResolver.addCss(cssFile);

    // HTML
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/Cardo-Regular.ttf");
    fontProvider.register("resources/fonts/Cardo-Bold.ttf");
    fontProvider.register("resources/fonts/Cardo-Italic.ttf");
    fontProvider.addFontSubstitute("lowagie", "cardo");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    p.parse(new FileInputStream(HTML));

    // step 5: close the document
    document.close();
}

In this case, we pass DONTLOOKFORFONTS so that iText skips the font search, which saves an enormous amount of time. Instead of having iText look for fonts, we tell iText which fonts we're going to use in the HTML.
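To make the trade-off concrete, here is a toy model of the two strategies, written with the JDK only. This is not iText code and does not describe iText's internals: ToyFontRegistry and its methods are hypothetical names. It only illustrates why walking font directories up front costs time proportional to what is installed, while registering a handful of known files does not.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Toy model (not iText code) of a font registry with an optional directory scan. */
class ToyFontRegistry {
    private final List<Path> fonts = new ArrayList<>();

    /** With a non-null scanRoot, walks the whole tree up front; with null, starts empty. */
    ToyFontRegistry(Path scanRoot) {
        if (scanRoot == null) {
            return; // similar in spirit to DONTLOOKFORFONTS: no search at all
        }
        try (Stream<Path> tree = Files.walk(scanRoot)) {
            tree.filter(Files::isRegularFile)
                .filter(ToyFontRegistry::looksLikeFont)
                .forEach(fonts::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Registers exactly one file; the cost does not depend on what else is installed. */
    void register(String path) {
        fonts.add(Paths.get(path));
    }

    /** Number of fonts currently known to the registry. */
    int size() {
        return fonts.size();
    }

    /** Crude extension check, good enough for a toy model. */
    private static boolean looksLikeFont(Path p) {
        Path fileName = p.getFileName();
        String name = fileName == null ? "" : fileName.toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".otf");
    }
}
```

Calling new ToyFontRegistry(null) followed by three register(...) calls mirrors the shape of the example above: the registry ends up knowing exactly the three fonts the HTML needs, and nothing on disk is searched.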
