Using pdfbox to get form field values

Submitted by 六月ゝ 毕业季﹏ on 2019-12-05 03:57:27

Question


I'm using PDFBox for the first time, and right now I'm reading through the PDFBox website.

In short, I have a PDF like this:

except that my file has many more components of different types (text fields, radio buttons, check boxes). From this PDF I need to read these values: Mauro, Rossi, MyCompany. So far I have written the following code:

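import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

// print the value of every top-level field
for(PDField pdField : pdAcroForm.getFields()){
    System.out.println(pdField.getValue());
}
pdDoc.close();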

Is this the correct way to read the values of the form components? Any suggestions? Where can I learn more about PDFBox?


Answer 1:


The code you have should work. If you are actually looking to do something with the values, you'll likely need to use some other methods. For example, you can get specific fields using pdAcroForm.getField(<fieldName>):

PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");
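In PDFBox 2.x, getField expects the field's fully qualified name (the partial names from the top-level field down, joined with dots) and returns null when no field has that name, so a misspelled or unqualified name only shows up later as a NullPointerException. A minimal sketch of failing fast instead follows; requireValue is a hypothetical helper, not PDFBox API, and a plain Map of fully qualified name to value stands in for the form so the example runs without PDFBox. With the real library the equivalent check is simply testing the result of pdAcroForm.getField(name) for null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class RequiredFieldLookup {

    // Hypothetical helper, not part of PDFBox. The Map stands in for the form:
    // fully qualified field name -> value.
    public static String requireValue(Map<String, String> form, String fullyQualifiedName) {
        if (!form.containsKey(fullyQualifiedName)) {
            // Fail fast and list the known names instead of handing back null.
            throw new IllegalArgumentException(
                "No field named '" + fullyQualifiedName + "'; known fields: " + form.keySet());
        }
        return form.get(fullyQualifiedName);
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("firstName", "Mauro");
        form.put("lastName", "Rossi");
        System.out.println(requireValue(form, "lastName")); // prints Rossi
        try {
            requireValue(form, "fristName");
        } catch (IllegalArgumentException e) {
            // prints No field named 'fristName'; known fields: [firstName, lastName]
            System.out.println(e.getMessage());
        }
    }
}
```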

Note that PDField is just a base class. You can cast a field to the appropriate subclass to get more specific information from it. For example:

PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if(fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
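The same idea extends to every component type in the question: check the runtime type, then read the value in the way that suits that type. A blind cast to the wrong subclass throws ClassCastException, so guarding it with instanceof is safer when a form mixes text fields, radio buttons and check boxes. The sketch below uses hypothetical stand-in classes (Field, TextField, CheckBox) so it runs without PDFBox; in PDFBox 2.x the corresponding classes are PDField, PDTextField and PDCheckBox (the answer's PDCheckbox is the 1.x spelling).

```java
import java.util.Arrays;
import java.util.List;

public final class FieldTypeDispatch {

    // Hypothetical stand-in for a field base class (compare PDField); not PDFBox API.
    public abstract static class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    // Stand-in for a text field (compare PDTextField).
    public static final class TextField extends Field {
        final String text;
        public TextField(String name, String text) { super(name); this.text = text; }
    }

    // Stand-in for a check box (compare PDCheckBox in PDFBox 2.x).
    public static final class CheckBox extends Field {
        final boolean checked;
        public CheckBox(String name, boolean checked) { super(name); this.checked = checked; }
    }

    // Check the runtime type before casting, so an unexpected field type
    // produces a message instead of a ClassCastException.
    public static String describe(Field field) {
        if (field instanceof CheckBox) {
            return field.name + (((CheckBox) field).checked ? " is checked" : " is not checked");
        }
        if (field instanceof TextField) {
            return field.name + " = " + ((TextField) field).text;
        }
        return field.name + " has unsupported type " + field.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
            new TextField("firstName", "Mauro"),
            new CheckBox("fullTimeSalary", true));
        for (Field field : fields) {
            System.out.println(describe(field));
        }
    }
}
```

Running main prints "firstName = Mauro" followed by "fullTimeSalary is checked".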

As you suggest, you'll find more information on the Apache PDFBox website.




Answer 2:


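Each field returned by getFields() is a top-level field, and it may be a non-terminal field that only groups other fields. So you need to descend through its children until you reach a terminal field; then you can get the value. The code snippet below walks through all the fields recursively and prints each field's name and value.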

{
    //open the document; PDNonTerminalField and getValueAsString() below are
    //PDFBox 2.x API, where PDDocument.load replaces 1.x's loadNonSeq
    PDDocument pdDoc = PDDocument.load(myFile);
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

    //get all top-level fields in the form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");

    //inspect field values, descending into child fields
    for (PDField field : fields)
    {
        processField(field, "|--", field.getPartialName());
    }

    ...
}


private void processField(PDField field, String sLevel, String sParent) throws IOException
{
    String partialName = field.getPartialName();

    if (field instanceof PDNonTerminalField)
    {
        if (!sParent.equals(field.getPartialName()))
        {
            if (partialName != null)
            {
                sParent = sParent + "." + partialName;
            }
        }
        System.out.println(sLevel + sParent);

        for (PDField child : ((PDNonTerminalField)field).getChildren())
        {
            processField(child, "|  " + sLevel, sParent);
        }
    }
    else
    {
        //field has no child. output the value
        String fieldValue = field.getValueAsString();
        StringBuilder outputString = new StringBuilder(sLevel);
        outputString.append(sParent);
        //a top-level terminal field receives its own name as sParent; don't repeat it
        if (partialName != null && !sParent.equals(partialName))
        {
            outputString.append(".").append(partialName);
        }
        outputString.append(" = ").append(fieldValue);
        outputString.append(",  type=").append(field.getClass().getName());
        System.out.println(outputString);
    }
}
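If you need the values in code rather than printed, the same recursion can collect them into a map keyed by fully qualified name (partial names from the top-level field down, joined with dots), skipping null partial names as processField does. The sketch below uses hypothetical stand-in classes Field, Terminal and NonTerminal so it runs without PDFBox; with PDFBox 2.x you would test for PDNonTerminalField, recurse into getChildren() and read getValueAsString(), as the code above does. PDFBox 2.x also provides PDAcroForm.getFieldTree(), an Iterable over all fields including nested ones, which together with PDField.getFullyQualifiedName() may spare you the manual recursion.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class FieldFlattener {

    // Hypothetical stand-in for PDField: a node with an optional partial name.
    public abstract static class Field {
        final String partialName; // may be null, as in PDFBox
        Field(String partialName) { this.partialName = partialName; }
    }

    // Leaf field that carries a value (compare PDTerminalField).
    public static final class Terminal extends Field {
        final String value;
        public Terminal(String partialName, String value) { super(partialName); this.value = value; }
    }

    // Grouping field with children (compare PDNonTerminalField).
    public static final class NonTerminal extends Field {
        final List<Field> children;
        public NonTerminal(String partialName, Field... children) {
            super(partialName);
            this.children = Arrays.asList(children);
        }
    }

    // Walks every top-level field; returns fully qualified name -> value in document order.
    public static Map<String, String> flatten(List<Field> topLevel) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Field field : topLevel) {
            collect(field, "", values);
        }
        return values;
    }

    private static void collect(Field field, String prefix, Map<String, String> values) {
        // A null partial name adds nothing to the path.
        String name = field.partialName == null ? prefix
                : prefix.isEmpty() ? field.partialName : prefix + "." + field.partialName;
        if (field instanceof NonTerminal) {
            for (Field child : ((NonTerminal) field).children) {
                collect(child, name, values);
            }
        } else {
            values.put(name, ((Terminal) field).value);
        }
    }

    public static void main(String[] args) {
        List<Field> form = Arrays.asList(
            new Terminal("firstName", "Mauro"),
            new Terminal("lastName", "Rossi"),
            new NonTerminal("employer", new Terminal("company", "MyCompany")));
        // prints {firstName=Mauro, lastName=Rossi, employer.company=MyCompany}
        System.out.println(flatten(form));
    }
}
```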


Source: https://stackoverflow.com/questions/23497324/using-pdfbox-to-get-form-field-values
