Extract PDF form field names from a PDF form

旧巷少年郎 2020-12-29 05:24

I'm using pdftk to fill in a PDF form with an XFDF file. However, for this project I do not know in advance which fields will be present, so I need to analyse the PDF itself.

6 answers
  •  粉色の甜心
    2020-12-29 06:05

    This is a very late answer, and my solution is in Ruby rather than PHP, but I hope it comes in handy for anyone looking for a Ruby solution.

    First, use pdftk to dump all the field names, then clean up the dumped text to get a readable array of names:

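    require 'shellwords'

    # Returns the unique field names reported by `pdftk <file> dump_data_fields`.
    def extract_fields(filename)
      # Escape the path so spaces or shell metacharacters cannot break the command.
      field_output = `pdftk #{Shellwords.escape(filename)} dump_data_fields 2>&1`
      @fields = field_output.split(/^---\n/).map do |field_text|
        # Field names may contain spaces, dots or brackets, so capture the whole line.
        field_text[/^FieldName: (.+)$/, 1]
      end.compact.uniq
    end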
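If more than the names is needed (for example the field type, or the allowed states of a checkbox), the same dump can be parsed into one hash per field. The following is only a sketch: `parse_field_dump` and `SAMPLE_DUMP` are hypothetical names, and the sample text is an illustrative imitation of `dump_data_fields` output rather than output from a real form.

```ruby
# Parse the text printed by `pdftk form.pdf dump_data_fields` into one hash per field.
# Keys that repeat within a record (e.g. FieldStateOption) are collected into arrays.
def parse_field_dump(dump_text)
  dump_text.split(/^---\s*$/).map do |record|
    field = {}
    record.each_line do |line|
      key, value = line.chomp.split(': ', 2)
      next if key.nil? || value.nil? # skip blank or malformed lines
      if field.key?(key)
        field[key] = Array(field[key]) << value
      else
        field[key] = value
      end
    end
    field
  end.reject(&:empty?)
end

# Illustrative sample only; a real dump would come from pdftk.
SAMPLE_DUMP = <<~DUMP
  ---
  FieldType: Text
  FieldName: applicant.full name
  FieldFlags: 0
  ---
  FieldType: Button
  FieldName: subscribe
  FieldFlags: 0
  FieldStateOption: Off
  FieldStateOption: Yes
DUMP

fields = parse_field_dump(SAMPLE_DUMP)
fields.each { |f| puts "#{f['FieldName']} (#{f['FieldType']})" }
```

Keeping the whole record makes it possible to decide later which values are valid for a field, for instance by reading `FieldStateOption` for checkboxes and radio buttons.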
    

    Second, we can now use any XML library to construct our XFDF:

    # code borrowed from `nguyen` gem [https://github.com/joneslee85/nguyen]
    require 'nokogiri'

    # generate XFDF content
    def to_xfdf(fields = {}, options = {})
      builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
        xml.xfdf('xmlns' => 'http://ns.adobe.com/xfdf/', 'xml:space' => 'preserve') {
          xml.f(:href => options[:file]) if options[:file]
          xml.ids(:original => options[:id], :modified => options[:id]) if options[:id]
          xml.fields {
            fields.each do |field, value|
              xml.field(:name => field) {
                if value.is_a? Array
                  value.each { |item| xml.value(item.to_s) }
                else
                  xml.value(value.to_s)
                end
              }
            end
          }
        }
      end
      builder.to_xml
    end
    
    # write XFDF content to path
    def save_to(path, fields = {}, options = {})
      File.write(path, to_xfdf(fields, options))
    end
    

    Voilà, that's the main logic. I highly recommend giving the nguyen gem (https://github.com/joneslee85/nguyen) a try if you are looking for a lightweight Ruby library.
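To close the loop with the question, an XFDF file written to disk can be passed back to pdftk with its `fill_form` operation. The sketch below is one possible way to do that from Ruby; `fill_form_command` and `fill_form` are hypothetical helper names, and the file names in the usage comment are placeholders.

```ruby
require 'open3'

# Build the argument list for `pdftk <template> fill_form <xfdf> output <result>`.
# Passing arguments as an array avoids shell quoting problems with odd file names.
def fill_form_command(template_pdf, xfdf_path, output_pdf, flatten: false)
  cmd = ['pdftk', template_pdf, 'fill_form', xfdf_path, 'output', output_pdf]
  cmd << 'flatten' if flatten # flatten makes the filled fields non-editable
  cmd
end

# Run pdftk and raise with its combined stdout/stderr if it exits with an error.
def fill_form(template_pdf, xfdf_path, output_pdf, flatten: false)
  cmd = fill_form_command(template_pdf, xfdf_path, output_pdf, flatten: flatten)
  output, status = Open3.capture2e(*cmd)
  raise "pdftk failed: #{output}" unless status.success?
  output_pdf
end

# Usage (requires pdftk on the PATH; file names are placeholders):
#   fill_form('form.pdf', 'data.xfdf', 'filled.pdf', flatten: true)
```

Separating the command construction from the execution keeps the argument order easy to check without having pdftk installed.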
