Question
I have prepared an editable PDF form, but I am unable to convert its editable fields into text using Java.
APIs tried: pdfbox-app-2.0.0-RC2, PDFBox-0.7.3, itextpdf-5.1.0, pdfclown.
Please help me find out how to convert PDF editable fields into text in Java.
This is the Java program I used (it converts a normal PDF into text, but it does not convert the editable fields):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.awt.Desktop;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.swing.JFileChooser;

public class PdfConvertor_1 {

    public static void main(String[] args) {
        selectPDFFiles();
    }

    // allow pdf files selection for converting
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] Files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < Files.length; i++) {
                convertPDFToText(Files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    public static void convertPDFToText(String src, String desc) {
        try {
            // create file writer
            FileWriter fw = new FileWriter("D:\\POC_Pdf2.txt");
            // create buffered writer
            BufferedWriter bw = new BufferedWriter(fw);
            // create pdf reader
            PdfReader pr = new PdfReader(src);
            // get the number of pages in the document
            int pNum = pr.getNumberOfPages();
            // extract text from each page and write it to the output text file
            for (int page = 1; page <= pNum; page++) {
                String text = PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
            bw.flush();
            bw.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Please see the editable fields in the image; these are what I want to convert into text using Java.
Answer 1:
Fields are not part of the page content stream, hence "getting text from a page" won't give you the value of a field.
You need to get the form from the PDF. A form is referred to from the root dictionary of a PDF, but there's a convenience method to get an AcroFields object. This question was already answered for people who are using iTextSharp / C#: How to read PDF form data using iTextSharp?
PdfReader reader = new PdfReader(path_to_your_completed_form);
AcroFields fields = reader.getAcroFields();
String value = fields.getField(key);
In this snippet, path_to_your_completed_form is the full path you get from your JFileChooser and key is the name of one of the fields that is defined in your form.
If you don't know which fields are defined in your form, please read the answer to the question How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc? There's some code in that example that allows you to loop over the available fields and that informs you if a field is a text field, a check box, a radio button, and so on.
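As a rough illustration of looping over the fields and turning them into text, here is a minimal sketch. The class name FieldTextDump, the method format, and the sample field names are made up for this example. To keep it runnable without a PDF or the iText jar, the field names and the value lookup are passed in as parameters. With iText 5, the names would presumably come from fields.getFields().keySet() and the lookup from fields::getField, as noted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class FieldTextDump {

    // Builds one "name: value" line per form field.
    // A field with no value (null) is written as an empty string.
    static String format(Iterable<String> names, Function<String, String> lookup) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            String value = lookup.apply(name);
            sb.append(name).append(": ").append(value == null ? "" : value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data (hypothetical field names) so the sketch runs on its own.
        // With iText 5 on the classpath, the equivalent call would be roughly:
        //   AcroFields fields = new PdfReader(path).getAcroFields();
        //   String text = format(fields.getFields().keySet(), fields::getField);
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("firstName", "Jane");
        sample.put("city", "Pune");
        System.out.print(format(sample.keySet(), sample::get));
    }
}
```

The resulting string can be written with the same BufferedWriter used in the question, next to the page text returned by PdfTextExtractor.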
Source: https://stackoverflow.com/questions/34419909/convert-pdf-editable-fields-into-text-using-java-programming