Adding metadata to PDF

Submitted by 这一生的挚爱 on 2020-01-03 08:35:33

Question


I need to add metadata to a PDF that I am creating with prawn. That metadata will be extracted later, probably by pdf-reader. It will contain internal document numbers and other information needed by downstream tools.

It would be convenient to associate metadata with each page of the PDF. According to the PDF specification, per-page private data can be stored in a "page-piece dictionary". Section 14.5 states:

A page-piece dictionary (PDF 1.3) may be used to hold private conforming product data. The data may be associated with a page or form XObject by means of the optional PieceInfo entry in the page object (see Table 30) or form dictionary (see Table 95). Beginning with PDF 1.4, private data may also be associated with the PDF document by means of the PieceInfo entry in the document catalogue (see Table 28).

How can I set a "page-piece dictionary" with prawn? I'm using prawn 0.12.0.

If that's not possible, how else can I achieve my goal of storing metadata about each page, either at the page level, or at the document level?


Answer 1:


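You can look at the prawn source. Document-level metadata can be passed through the :info option, as in this commit and the snippet that follows it: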

https://github.com/prawnpdf/prawn/commit/131082af5abb71d83de0e2005ecceaa829224904

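require 'prawn'

# Entries for the document information dictionary (document-level, not per-page).
info = { :Title        => "Sample METADATA",
         :Author       => "Me",
         :Subject      => "Not Working",
         :CreationDate => Time.now }

# filename is the path to an existing PDF used as the template.
@pdf = Prawn::Document.new(:template => filename, :info => info)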



Answer 2:


One way is to do none of the above; that is, don't attach the metadata as a page-piece dictionary, and don't attach it with prawn. Instead, attach the metadata as a file attachment using the pdftk command-line tool.

To do it this way, create a file with the metadata. For example, the file metadata.yaml might contain:

---
- :document_id: '12345'
  :account_id: 10
  :page_numbers:
  - 1
  - 2
  - 3
- :document_id: '12346'
  :account_id: 24
  :page_numbers:
  - 4
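
A file in this shape can be produced with Ruby's standard YAML library. The following is a minimal sketch, assuming the records are held as an array of hashes with symbol keys; the helper name `metadata_yaml` and the variable `records` are illustrative and not part of the original answer.

```ruby
require 'yaml'

# Hypothetical helper: serialize the per-document records to YAML text.
# Symbol keys are written with a leading colon, e.g. ":document_id:".
def metadata_yaml(records)
  records.to_yaml
end

# Illustrative records: one entry per logical document, listing the PDF
# pages that belong to it.
records = [
  { document_id: '12345', account_id: 10, page_numbers: [1, 2, 3] },
  { document_id: '12346', account_id: 24, page_numbers: [4] }
]

puts metadata_yaml(records)
```

Writing the result to disk is then a single call such as `File.write('metadata.yaml', metadata_yaml(records))`.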

After creating the PDF file with prawn, use pdftk to attach the metadata file to it:

$ pdftk foo.pdf attach_files metadata.yaml output foo-with-attachment.pdf

Since pdftk will not modify a file in place, the output file must be different from the input file.
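
If the attachment step should run from the same Ruby script that generates the PDF, the pdftk call can be wrapped in a small helper. This is only a sketch, assuming pdftk is installed and on the PATH; the method names `pdftk_attach_command` and `attach_metadata` are made up for illustration.

```ruby
# Build the argument list for:
#   pdftk <input> attach_files <attachment> output <output>
# Raises if input and output resolve to the same path, since pdftk
# will not modify a file in place.
def pdftk_attach_command(input_pdf, attachment, output_pdf)
  if File.expand_path(input_pdf) == File.expand_path(output_pdf)
    raise ArgumentError, 'output file must differ from input file'
  end
  ['pdftk', input_pdf, 'attach_files', attachment, 'output', output_pdf]
end

# Run pdftk. Passing separate arguments to system bypasses the shell,
# so file names containing spaces need no quoting.
def attach_metadata(input_pdf, attachment, output_pdf)
  cmd = pdftk_attach_command(input_pdf, attachment, output_pdf)
  system(*cmd) or raise "pdftk failed: #{cmd.join(' ')}"
end
```

A call such as `attach_metadata('foo.pdf', 'metadata.yaml', 'foo-with-attachment.pdf')` would then be equivalent to the command line shown above.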

You may be able to extract the metadata file using pdf-reader, but you can certainly do it with pdftk. This command unpacks metadata.yaml into the unpacked-attachments directory.

$ pdftk foo-with-attachment.pdf unpack_files output unpacked-attachments
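
Once the attachment has been unpacked, a downstream tool can read it back and look up the record for any page. A minimal sketch, assuming the file layout from the metadata.yaml example and Ruby 2.6 or later (for the `permitted_classes:` keyword); the method name `page_index` is hypothetical.

```ruby
require 'yaml'

# Read the unpacked metadata file and map each page number to the
# record (document_id, account_id, ...) that covers that page.
def page_index(metadata_path)
  # Symbol must be permitted explicitly because the keys are symbols.
  records = YAML.safe_load(File.read(metadata_path), permitted_classes: [Symbol])
  records.each_with_object({}) do |record, index|
    record[:page_numbers].each { |page| index[page] = record }
  end
end
```

For example, `page_index('unpacked-attachments/metadata.yaml')[4][:document_id]` would give the document ID recorded for page 4.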


Source: https://stackoverflow.com/questions/18498512/adding-metadata-to-pdf
