Escaping characters for substitution into a PDF

Submitted by 北战南征 on 2019-12-21 06:36:40

Question


Can anyone tell me the set of control characters for a PDF file, and how to escape them? I have an uncompressed (non-deflated) PDF document in which I would like to edit the text, but I'm afraid of accidentally creating a control sequence with parentheses and the like.

Thanks.


Answer 1:


Okay, I think I found it. According to page 15 of the PDF 1.7 spec, the only characters I need to worry about inside a literal string are the parentheses and the backslash. The spec lists these escape sequences:

Sequence | Meaning
---------------------------------------------
\n       | LINE FEED (0Ah) (LF)
\r       | CARRIAGE RETURN (0Dh) (CR) 
\t       | HORIZONTAL TAB (09h) (HT)
\b       | BACKSPACE (08h) (BS)
\f       | FORM FEED (0Ch) (FF)
\(       | LEFT PARENTHESIS (28h)
\)       | RIGHT PARENTHESIS (29h)
\\       | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd     | Character code ddd (octal)
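The table above can be turned into a small escaping routine. The following is a minimal sketch in Python, not taken from the original answer: the function name `escape_pdf_literal` is hypothetical, and it assumes the input is already a byte string in whatever encoding the page's font expects. It escapes every parenthesis, which is always safe (the spec also permits balanced pairs to appear unescaped), and writes any other non-printable byte in the `\ddd` octal form.

```python
# Sketch only: escape bytes for use inside a PDF literal string, i.e. (...).
# The mapping mirrors the escape table from the PDF 1.7 spec quoted above.
_ESCAPES = {
    0x0A: b"\\n",   # LINE FEED
    0x0D: b"\\r",   # CARRIAGE RETURN
    0x09: b"\\t",   # HORIZONTAL TAB
    0x08: b"\\b",   # BACKSPACE
    0x0C: b"\\f",   # FORM FEED
    0x28: b"\\(",   # LEFT PARENTHESIS
    0x29: b"\\)",   # RIGHT PARENTHESIS
    0x5C: b"\\\\",  # REVERSE SOLIDUS (backslash)
}


def escape_pdf_literal(data: bytes) -> bytes:
    """Return data with PDF literal-string escapes applied (hypothetical helper)."""
    out = bytearray()
    for byte in data:
        if byte in _ESCAPES:
            out += _ESCAPES[byte]
        elif byte < 0x20 or byte > 0x7E:
            # Remaining control and high bytes: three-digit octal form \ddd.
            out += b"\\%03o" % byte
        else:
            out.append(byte)
    return bytes(out)
```

For example, `escape_pdf_literal(b"f(x)")` yields the bytes `f\(x\)`, which can then be wrapped in parentheses to form the literal string.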

Hopefully this was helpful to someone.




Answer 2:


You likely already know this, but PDF files have an index at the end (the cross-reference table) that contains byte offsets to every object in the document. If you edit the file by hand, you must ensure that the new text has exactly the same number of bytes as the original; otherwise those offsets no longer point at the right places.
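To illustrate the same-length constraint, here is a minimal Python sketch of a guarded in-place substitution. The helper name `replace_same_length` is hypothetical, and the approach assumes the text sits in an uncompressed content stream, as in the question. It refuses to change the file size and refuses ambiguous matches, so the byte offsets stored at the end of the file stay valid.

```python
def replace_same_length(pdf_bytes: bytes, old: bytes, new: bytes) -> bytes:
    """Swap old for new only if no byte offset in the file can shift.

    Hypothetical helper: works on the raw bytes of an uncompressed PDF.
    """
    if len(old) != len(new):
        raise ValueError(
            f"replacement is {len(new)} bytes but original is {len(old)} bytes"
        )
    if pdf_bytes.count(old) != 1:
        raise ValueError("original text must occur exactly once")
    return pdf_bytes.replace(old, new)
```

A typical call would read the file in binary mode, pass already-escaped literal strings such as `b"(Hello)"` and `b"(Howdy)"`, and write the result to a new file rather than overwriting the input.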

If you want to extract PDF page content and edit that, it's pretty straightforward. My CAM::PDF library lets you do it programmatically or via the command line:

 use CAM::PDF;
 my $pdf = CAM::PDF->new($filename);
 my $page_content = $pdf->getPageContent($pagenum);
 # ... edit $page_content here ...
 $pdf->setPageContent($pagenum, $page_content);
 $pdf->cleanoutput($out_filename);

or

 getpdfpage.pl in.pdf 1 > page1.txt
 setpdfpage.pl in.pdf page1.txt 1 out.pdf


Source: https://stackoverflow.com/questions/559467/escaping-characters-for-substitution-into-a-pdf
