How to extract PDF fields from a filled out form in Python?

北恋 2020-12-02 07:10

I'm trying to use Python to process some PDF forms that were filled out and signed using Adobe Acrobat Reader.

I've tried:

The pdfminer demo: it di

6 answers

庸人自扰 (original poster)

2020-12-02 07:28
You should be able to do this with pdfminer, but it takes some digging into pdfminer's internals and some knowledge of the PDF format: how forms are stored, and also PDF's internal structures such as "dictionaries" and "indirect objects".

This example might help you on your way. I think it will only work for simple cases, with no nested fields and so on:
```
import sys
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

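filename = sys.argv[1]

with open(filename, 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    # 'AcroForm' and its 'Fields' array may be indirect objects; resolve1() follows the reference
    fields = resolve1(resolve1(doc.catalog['AcroForm'])['Fields'])
    for i in fields:
        field = resolve1(i)
        # 'T' is the field's name, 'V' its value (None if the field has no value)
        name, value = field.get('T'), field.get('V')
        print('{0}: {1}'.format(name, value))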
```
EDIT: I forgot to mention: if you need to provide a password, pass it to doc.initialize().
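
For forms that do have nested fields, a rough sketch along the following lines might work. It is not from the original answer. It assumes pdfminer.six, where `PDFDocument` accepts a `password` argument, strings come back as `bytes`, and name objects carry a `.name` attribute. The helpers `to_text`, `walk_fields` and `read_form_fields` are made-up names for illustration, and the latin-1 fallback is only an approximation of PDFDocEncoding. Child fields hang off a parent's `Kids` array, and a child's full name is its parent's name plus its own `T`, joined with a dot.

```python
def to_text(value):
    """Turn a raw PDF value into something printable (best effort)."""
    if isinstance(value, bytes):
        # PDF text strings are UTF-16BE when they start with a byte order mark.
        if value.startswith(b"\xfe\xff"):
            return value[2:].decode("utf-16-be", errors="replace")
        # Assumption: latin-1 as a stand-in for PDFDocEncoding.
        return value.decode("latin-1")
    # Assumption: name objects (e.g. checkbox states) expose a .name attribute.
    name = getattr(value, "name", None)
    return name if name is not None else value


def walk_fields(fields, resolve=lambda obj: obj, prefix="", inherited=None):
    """Yield (full name, value) for each named field, descending into 'Kids'."""
    for ref in fields:
        field = resolve(ref)
        partial = to_text(field.get("T"))
        if partial is None:
            continue  # no 'T' entry: a widget annotation, not a named field
        name = f"{prefix}.{partial}" if prefix else partial
        # 'V' is inheritable, so fall back to the parent's value.
        value = field.get("V", inherited)
        kids = [resolve(k) for k in (resolve(field.get("Kids")) or [])]
        named_kids = [k for k in kids if "T" in k]
        if named_kids:
            yield from walk_fields(named_kids, resolve, name, value)
        else:
            yield name, to_text(resolve(value))


def read_form_fields(path, password=""):
    """Return {full field name: value} for the AcroForm in the PDF at `path`."""
    # Imported here so the helpers above can be tried without pdfminer installed.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdftypes import resolve1

    with open(path, "rb") as fp:
        doc = PDFDocument(PDFParser(fp), password=password)
        acroform = resolve1(doc.catalog.get("AcroForm"))
        if not acroform:
            return {}  # the document has no form
        fields = resolve1(acroform.get("Fields")) or []
        return dict(walk_fields(fields, resolve1))
```

Something like `read_form_fields(sys.argv[1])` would then give a plain dict to loop over. Passing `resolve` in as a parameter keeps `walk_fields` usable on ordinary dicts, which makes it easy to try out without a PDF at hand.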