Why is a PDF that contains only one field around 500 KB?

Submitted by 狂风中的少年 on 2019-12-02 23:35:50

Question


Here you can download a PDF with one AcroForm field; its size is exactly 427 KB.

If I remove this single field, the file is only 3 KB. Why does this happen? I tried analysing it with PDF Debugger and nothing looks unusual to me.


Answer 1:


There's an embedded "Arial" font in the acroform default resources, see Root/AcroForm/DR/Font/Arial/FontDescriptor/FontFile2.

Either you or whoever created the PDF added it for no reason. The font is not used or referenced. To verify this for the AcroForm default resources, check whether the /DA entry (default appearance) of each field contains the font name.
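One way to make the /DA check concrete is to extract the font resource name from the string instead of searching for it as a substring. The following is a minimal sketch in plain Java, not PDFBox API: `DaFontName` and `fontNameOf` are hypothetical names, and the code assumes the usual whitespace-separated form such as `/Arial 12 Tf 0 g`, with no escaped characters in the name.

```java
import java.util.Optional;

/** Hypothetical helper (not part of PDFBox): reads the font name out of a /DA string. */
final class DaFontName {
    private DaFontName() {}

    /**
     * Returns the name operand that sits two tokens before the Tf operator,
     * e.g. "Arial" for "/Arial 12 Tf 0 g". Assumes whitespace-separated tokens.
     */
    static Optional<String> fontNameOf(String da) {
        if (da == null) {
            return Optional.empty();
        }
        String[] tokens = da.trim().split("\\s+");
        // Tf takes two operands: the font resource name and the font size.
        for (int i = 2; i < tokens.length; i++) {
            if ("Tf".equals(tokens[i]) && tokens[i - 2].startsWith("/")) {
                return Optional.of(tokens[i - 2].substring(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fontNameOf("/Arial 12 Tf 0 g"));
        System.out.println(fontNameOf("0 g /ArialBold 0 Tf"));
    }
}
```

Comparing the extracted name for equality avoids treating a field that uses /ArialBold as a user of /Arial, which a plain substring search would do.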

When you removed the field, you somehow also removed the font from the AcroForm default resources. (You didn't say how you removed it.)

Here's some code to remove the unused fonts from the default resources (null checks mostly missing):

    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    PDResources defaultResources = acroForm.getDefaultResources();
    COSDictionary fontDict = (COSDictionary) defaultResources.getCOSObject().getDictionaryObject(COSName.FONT);
    List<String> defaultAppearances = new ArrayList<>();
    List<COSName> fontDeletionList = new ArrayList<>();
    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
    for (COSName fontName : defaultResources.getFontNames())
    {
        if (COSName.HELV.equals(fontName) || COSName.ZA_DB.equals(fontName))
        {
            // Adobe default, always keep
            continue;
        }
        boolean found = false;
        for (String da : defaultAppearances)
        {
            if (da != null && da.contains("/" + fontName.getName()))
            {
                found = true;
                break;
            }
        }
        System.out.println(fontName + ": " + found);
        if (!found)
        {
            fontDeletionList.add(fontName);
        }
    }
    System.out.println("deletion list: " + fontDeletionList);
    for (COSName fontName : fontDeletionList)
    {
        fontDict.removeItem(fontName);
    }

The resulting file is now 5 KB.

I haven't checked the annotations. Some of them also have a /DA string, but it is unclear whether the AcroForm default resources fonts are to be used when reconstructing a missing appearance stream.

Update: Here's some additional code to replace Arial with Helv:

    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            String defaultAppearance = vtField.getDefaultAppearance();
            // /DA may be missing, so guard against null before inspecting it
            if (defaultAppearance != null && defaultAppearance.startsWith("/Arial"))
            {
                // substring(7) skips "/Arial" plus the space that follows it
                vtField.setDefaultAppearance("/Helv " + defaultAppearance.substring(7));
                vtField.getWidgets().get(0).setAppearance(null); // this removes the font usage
                vtField.setValue(vtField.getValueAsString()); // rebuilds the appearance with the new font
            }
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
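The `substring(7)` call above only works when the /DA string begins with `/Arial` followed by a single space. A slightly more tolerant rewrite could look like the sketch below. It is plain Java, not PDFBox API: `DaFontSwap` and `replaceFont` are hypothetical names, and the regular expression assumes the font size is written as a plain number directly before `Tf`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of PDFBox): swaps the font name inside a /DA string. */
final class DaFontSwap {
    // Group 1: font resource name; group 2: the size operand and the Tf operator.
    private static final Pattern FONT_OPERAND =
            Pattern.compile("/(\\S+)(\\s+[-+]?\\d*\\.?\\d+\\s+Tf\\b)");

    private DaFontSwap() {}

    /** Replaces oldName with newName only where it is the exact operand of Tf. */
    static String replaceFont(String da, String oldName, String newName) {
        if (da == null) {
            return null;
        }
        Matcher m = FONT_OPERAND.matcher(da);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).equals(oldName) ? newName : m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement("/" + name + m.group(2)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceFont("0 g /Arial 12 Tf", "Arial", "Helv"));
    }
}
```

The size and any color operators are carried over unchanged, and a different font such as /ArialBold is left alone.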

Note that this may not be a good idea, because the standard 14 fonts support only a limited set of characters. Try

    vtField.setValue("Ayşe");

and you'll get an exception.
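To anticipate this before calling `setValue`, the text can be pre-checked against a single-byte Western encoding. The sketch below is an approximation under a stated assumption: it treats Java's `windows-1252` charset as a stand-in for the WinAnsiEncoding that a standard 14 font such as Helvetica is typically used with, which is close but not identical. `WinAnsiCheck` and `probablyEncodable` are hypothetical names, not PDFBox API.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical pre-check (not part of PDFBox), using windows-1252 as an approximation. */
final class WinAnsiCheck {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    private WinAnsiCheck() {}

    /** Returns true if every character of text has a windows-1252 mapping. */
    static boolean probablyEncodable(String text) {
        // Encoders carry state and are not thread-safe, so create one per call.
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "Ay\u015Fe" is "Ayşe"; U+015F has no windows-1252 mapping.
        System.out.println(probablyEncodable("Ay\u015Fe"));
        System.out.println(probablyEncodable("Ayse"));
    }
}
```

A `false` result is a hint to keep the embedded font, or to switch to another embedded font that covers the required characters, rather than falling back to Helv.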

More general code for replacing a font can be found in this answer.



Source: https://stackoverflow.com/questions/55490141/why-pdf-contain-one-field-only-is-around-500kb
