Extract PDF form field names from a PDF form

前端未结

关注

 6  1078

旧巷少年郎 2020-12-29 05:24

I\'m using pdftk to fill in a PDF form with an XFDF file. However, for this project I do not know in advance what fields will be present, so I need to analyse the PDF itself

6条回答

粉色の甜心 (楼主)

2020-12-29 06:05

A very late answer from me, though my solution is not PHP, but I hope it might come in handy should anyone is looking for a solution for Ruby.

First is to use pdftk to extract all fields name out then we need to cleanup the dump text, to have a good readable hash:

def extract_fields(filename)
  field_output = `pdftk #{filename} dump_data_fields 2>&1`
  @fields = field_output.split(/^---\n/).map do |field_text|
    if field_text =~ /^FieldName: (\w+)$/
      $1
    end
  end.compact.uniq
end

Second, now we can use any XML parse to construct our XFDF:

# code borrowed from `nguyen` gem [https://github.com/joneslee85/nguyen]
# generate XFDF content
def to_xfdf(fields = {}, options = {})
  builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    xml.xfdf('xmlns' => 'http://ns.adobe.com/xfdf/', 'xml:space' => 'preserve') {
      xml.f(:href => options[:file]) if options[:file]
      xml.ids(:original => options[:id], :modified => options[:id]) if options[:id]
      xml.fields {
        fields.each do |field, value|
          xml.field(:name => field) {
            if value.is_a? Array
              value.each { |item| xml.value(item.to_s) }
            else
              xml.value(value.to_s)
            end
          }
        end
      }
    }
  end
  builder.to_xml
end

# write fdf content to path
def save_to(path)
  (File.open(path, 'w') << to_xfdf).close
end

Viola, that's the main logic. I highly recommend you give nguyen (https://github.com/joneslee85/nguyen) gem a try if you are looking for a lightweight lib in Ruby.

0 讨论(0)

查看其它6个回答