Convert PDF editable fields into text using Java

Posted by 旧城冷巷雨未停 on 2019-12-12 12:57:34

Question


I have prepared an editable PDF form, but I am unable to convert its editable fields into text using Java.

APIs tried: pdfbox-app-2.0.0-RC2, PDFBox-0.7.3, itextpdf-5.1.0, pdfclown.

Please help me find out how to convert PDF editable fields into text in Java.

This is the Java program I used (it converts a normal PDF into text, but it does not convert the editable fields):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;

public class PdfConvertor_1 {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // Let the user pick one or more PDF files to convert
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        chooser.setFileFilter(new FileNameExtensionFilter("PDF", "pdf"));
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < files.length; i++) {
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    // Extract the page text of src and write it to dest.
    // This reads page content only, so form field values are not included.
    public static void convertPDFToText(String src, String dest) {
        PdfReader reader = null;
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(dest))) {
            reader = new PdfReader(src);
            int pageCount = reader.getNumberOfPages();
            // Write the text of each page, one page after another
            for (int page = 1; page <= pageCount; page++) {
                bw.write(PdfTextExtractor.getTextFromPage(reader, page));
                bw.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}

Please see the editable fields in the image; these are the fields I want to convert into text using Java.


Answer 1:


Form fields are not part of the page content stream, so "getting text from a page" won't give you the value of a field.

You need to get the form from the PDF. A form is referred to from the root dictionary of a PDF, but there's a convenience method to get an AcroFields object. This question was already answered for people who are using iTextSharp / C#: How to read PDF form data using iTextSharp?

PdfReader reader = new PdfReader(path_to_your_completed_form);
AcroFields fields = reader.getAcroFields();
String value = fields.getField(key);

In this snippet, path_to_your_completed_form is the full path you get from your JFileChooser and key is the name of one of the fields defined in your form.

If you don't know which fields are defined in your form, please read the answer to the question How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc? There's some code in that example that allows you to loop over the available fields and that informs you if a field is a text field, a check box, a radio button, and so on.
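
Once you have the field names and values, writing them out as text is plain Java. The following is only a minimal sketch: FormFieldText and render are hypothetical names of my own, not part of iText, and the sample map stands in for the values you would collect by calling fields.getField(name) for each name in fields.getFields().keySet().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of iText): turns field name/value pairs into plain text.
class FormFieldText {

    // Renders one "name: value" line per field, in the map's iteration order.
    // A null value is written as an empty string.
    static String render(Map<String, String> fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            String value = entry.getValue() == null ? "" : entry.getValue();
            sb.append(entry.getKey()).append(": ").append(value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for values read from a real form (assumed field names).
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Jane Doe");
        values.put("city", "Pune");
        System.out.print(render(values));
    }
}
```

You could append the returned string to the page text that your convertPDFToText method already writes, so the output file contains both the static content and the filled-in values.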



Source: https://stackoverflow.com/questions/34419909/convert-pdf-editable-fields-into-text-using-java-programming
