I want to be able to detect a pattern in a PDF and somehow flag it.
For instance, in this PDF, there's the string *2. I want to be able to parse the PD
This is non-trivial. The problem is that PDF files are not designed to be edited at a granularity finer than a page. You basically have to parse the page's content stream, adjust the PostScript-like drawing operators in it, and then write the page back out. I don't think PyPDF has the support for doing what you want.
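To give a feel for what "parsing the page" involves, here is a minimal sketch using only the Python standard library. It assumes you already have the raw bytes of a page's content stream (getting them out of the file is the job of a PDF library), and it only looks at literal strings drawn with the `Tj` operator. Real files often use hex strings, `TJ` arrays with kerning, or custom font encodings, so this can miss text that is clearly visible on the page. The function name `find_in_content_stream` is hypothetical.

```python
import re
import zlib

# Matches a PDF literal string followed by the Tj operator, e.g. "(Total *2) Tj".
# Simplification: ignores hex strings (<...>), TJ arrays and unescaped nested parentheses.
TJ_LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def find_in_content_stream(stream: bytes, needle: bytes, compressed: bool = False) -> list[bytes]:
    """Return the Tj literal strings in a page content stream that contain needle."""
    if compressed:
        # Content streams are commonly stored with /Filter /FlateDecode (zlib data).
        stream = zlib.decompress(stream)
    hits = []
    for match in TJ_LITERAL.finditer(stream):
        # Undo the backslash escapes \( \) \\ used inside literal strings
        # (octal escapes such as \053 are not handled in this sketch).
        text = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
        if needle in text:
            hits.append(text)
    return hits


page = b"BT /F1 12 Tf 72 700 Td (Line one) Tj 0 -14 Td (Total \\(x\\) *2) Tj ET"
print(find_in_content_stream(page, b"*2"))
```

Finding the string is only half the job: to highlight it you also need its position on the page, which means tracking the text state set by operators such as `Td`, `Tm` and `Tf`. That is where most of the real difficulty lies.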
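If "all" you want to do is add highlighting, you can probably just add a Highlight annotation (an annotation dictionary with /Subtype /Highlight) to the page's /Annots array instead of touching the page content. See the PDF specification for the details.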
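A Highlight annotation is just a small dictionary. The sketch below, again standard-library Python, serializes one for a rectangle you already know, given in PDF user space (points, origin at the bottom-left of the page). The helper name `highlight_annotation` and the default yellow color are illustrative choices; the keys `/Type`, `/Subtype`, `/Rect`, `/QuadPoints`, `/C` and `/F` come from the PDF specification. The QuadPoints order used here (top-left, top-right, bottom-left, bottom-right) is the one Acrobat and most viewers expect, which is worth checking against your target viewer.

```python
def highlight_annotation(x0: float, y0: float, x1: float, y1: float,
                         color: tuple[float, float, float] = (1.0, 1.0, 0.0)) -> str:
    """Serialize a Highlight annotation dictionary covering the box (x0, y0)-(x1, y1)."""

    def num(v: float) -> str:
        # PDF numbers cannot use exponents; print up to 4 decimals, trimming zeros.
        return f"{v:.4f}".rstrip("0").rstrip(".")

    rect = [x0, y0, x1, y1]
    # One quadrilateral: top-left, top-right, bottom-left, bottom-right.
    quad = [x0, y1, x1, y1, x0, y0, x1, y0]
    parts = [
        "/Type /Annot",
        "/Subtype /Highlight",
        "/Rect [" + " ".join(num(v) for v in rect) + "]",
        "/QuadPoints [" + " ".join(num(v) for v in quad) + "]",
        "/C [" + " ".join(num(v) for v in color) + "]",  # RGB components in 0..1
        "/F 4",  # Print flag, so the highlight also appears when printed
    ]
    return "<< " + " ".join(parts) + " >>"


print(highlight_annotation(72, 700, 144, 712))
```

To take effect, this dictionary still has to be written into the file as an indirect object and referenced from the page's /Annots array (for example through an incremental update); that bookkeeping is what a PDF library would normally handle for you.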
You might be able to do this using PyPDF2, but I haven't looked into it closely.