How to avoid tagging empty <TR>/<TD> cells in a PDF using iText 5

依然范特西╮ posted on 2020-01-05 03:56:04

Question


I am using iText 5 to generate a PDF from HTML input. For PDF accessibility, I call pdfWriter.setTagged().

But both the empty and the non-empty HTML tags end up tagged in the PDF. Can you please help me avoid tagging the empty HTML tags?


Answer 1:


I suppose one way around it would be to walk the structure tree (StructTreeRoot) of the output PDF document, find the tags you are looking for that have no kids, and remove each of them from its parent. I do not use iText 5 anymore, as it has been deprecated (only security fixes are issued), but with iText 7 you could do something like:

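import java.io.IOException;
import com.itextpdf.kernel.pdf.PdfArray;
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

private void removeEmptyTag() throws IOException {
    // Read ORIG and write the modified copy to DEST
    final PdfDocument pdfDoc = new PdfDocument(new PdfReader(ORIG),
            new PdfWriter(DEST));
    PdfDictionary catalog = pdfDoc.getCatalog().getPdfObject();
    // Gets the structure tree root (null if the PDF is not tagged)
    PdfDictionary structTreeRoot = catalog.getAsDictionary(PdfName.StructTreeRoot);
    manipulate(structTreeRoot);

    pdfDoc.close();
}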

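// Returns true when the element is a TD without kids, so the caller should remove it
public boolean manipulate(PdfDictionary element) {

    if (element == null)
        return false;

    if (PdfName.TD.equals(element.get(PdfName.S))) {
        if (!element.containsKey(PdfName.K)) {
            return true;
        }
    }

    // Note: a /K entry that is a single dictionary (not an array) is not descended into
    PdfArray kids = element.getAsArray(PdfName.K);
    if (kids == null) return false;
    // Iterate from last to first so that removing a kid does not skip the next one
    for (int i = kids.size() - 1; i >= 0; i--) {
        if (manipulate(kids.getAsDictionary(i))) {
            kids.remove(i);
        }
    }

    return false;
}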

It's not the most elegant thing, but I've used pdfHTML to convert an HTML file that contains an empty td:

<tr>
    <th>Firstname</th>
    <th>Lastname</th>
    <th>Age</th>
</tr>
<tr>
    <td>Jill</td>
    <td>Smith</td>
    <td></td>
</tr>
<tr>
    <td>Eve</td>
    <td>Jackson</td>
    <td>94</td>
</tr>

Then I used the code above to go through the resulting PDF and remove the empty tags (or rather, tags without children). Maybe there is a way to do it directly with XML Worker (I am assuming this is what you are using to convert the HTML document), or a better post-processing alternative to my suggestion.
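
The pruning logic can also be tried in isolation, without iText, on a small stand-in tree. This is only a sketch under assumptions: `Node` and `prune` are hypothetical names that model a structure element's role and kids, not iText's `PdfDictionary` or `PdfArray`. It illustrates one detail that matters when kids are removed during the walk: visiting them from last to first, so that a removal never shifts a sibling that has not been checked yet (two adjacent empty cells are the case to watch).

```java
import java.util.ArrayList;
import java.util.List;

class PruneEmptyCells {
    /** Hypothetical stand-in for a structure element: a role plus its kids. */
    static final class Node {
        final String role;
        final List<Node> kids = new ArrayList<>();

        Node(String role, Node... children) {
            this.role = role;
            for (Node child : children) {
                kids.add(child);
            }
        }
    }

    /** Removes every childless TD below the given node and returns how many were removed. */
    static int prune(Node node) {
        int removed = 0;
        // Walk backwards: removing index i leaves indices below i untouched.
        for (int i = node.kids.size() - 1; i >= 0; i--) {
            Node kid = node.kids.get(i);
            if ("TD".equals(kid.role) && kid.kids.isEmpty()) {
                node.kids.remove(i);
                removed++;
            } else {
                removed += prune(kid);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // One row: a filled cell followed by two adjacent empty cells.
        Node row = new Node("TR",
                new Node("TD", new Node("P")),
                new Node("TD"),
                new Node("TD"));
        Node table = new Node("Table", row);

        System.out.println(prune(table));    // prints 2
        System.out.println(row.kids.size()); // prints 1
    }
}
```

With a forward loop (`i++`) and the same removal call, the second of the two adjacent empty cells would be skipped, because it moves into the index that was just processed.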




Answer 2:


You can do it directly with pdfHTML (basically the solution for HTML to PDF conversion in iText 7).

// Uses iText 7 pdfHTML (com.itextpdf.html2pdf and its subpackages) and kernel classes
ConverterProperties props = new ConverterProperties();
props.setTagWorkerFactory(new DefaultTagWorkerFactory() {
    @Override
    public ITagWorker getCustomTagWorker(
            IElementNode tag, ProcessorContext context) {
        if (tag.name().equals(TagConstants.TD)) {
            if (!tag.childNodes().isEmpty()) {
                // Non-empty cell: keep the regular TD worker
                return new TdTagWorker(tag, context);
            } else {
                // Empty cell: hand it to a Span worker instead
                return new SpanTagWorker(tag, context);
            }
        }

        // Any other tag: fall back to the default worker
        return null;
    }
});

PdfDocument doc = new PdfDocument(new PdfWriter(DEST));
doc.setTagged();

HtmlConverter.convertToPdf(new FileInputStream(ORIG), doc, props);

In the code above, setTagWorkerFactory lets you define custom behavior for your tags, as detailed in the documentation. In this specific case, I'm simply turning empty TD tags into a Span element, which achieves the desired behavior (the superfluous TD tag disappears).

(To be completely honest, this relies on the inability of the TR worker to handle the Span element, so it simply gives up on it. I'll update the answer if I come up with a more elegant solution.)



Source: https://stackoverflow.com/questions/59298938/how-to-avoid-to-tag-the-the-empty-trtd-cells-to-pdf-using-itext-5
