Programmatically change the color of a black box in a PDF file?

眉间皱痕 submitted on 2019-12-08 06:01:17

Question


I have a PDF file generated by Microsoft Word. The user applied a black "highlight" color so that the text looks like a black box (as if it had been redacted). I'd like to change the black boxes to yellow so that the text appears highlighted instead.

Ideally, I'd like to do this in Python.

Thanks!


Answer 1:


Option 1: If a commercial library is an option, you can implement this with Amyuni PDF Creator .Net. The C# code would look like this:

using System.IO;
using Amyuni.PDFCreator;
using System.Collections;

//open a pdf document
FileStream testfile = new FileStream("test1.pdf", FileMode.Open, FileAccess.Read, FileShare.Read);
IacDocument document = new IacDocument(null);
document.Open(testfile, "");

//get the first page
IacPage page1 = document.GetPage(1);

//get all graphic objects on the page
IacAttribute attribute = page1.AttributeByName("Objects");

// listobj is an arraylist of objects
ArrayList listobj = (ArrayList)attribute.Value;

foreach (IacObject iacObj in listobj)
{
    // if the object is a rectangle (frame) and its background color is black, set it to yellow
    if ((IacObjectType)iacObj.AttributeByName("ObjectType").Value == IacObjectType.acObjectTypeFrame &&
        (int)iacObj.AttributeByName("BackColor").Value == 0)
    {
        iacObj.AttributeByName("BackColor").Value = 0x00FFFF; // yellow, assuming a 0xBBGGRR color encoding
    }
}

I suppose you could translate this to IronPython instead.
The usual disclaimer applies to this suggestion.

Option 2: If a commercial library is not an option and you are not developing a commercial closed-source application, you could try a bit of unreliable hacking on the page content using iText:

You can try decoding the page content (see the ContentByteUtils class in iText for details), inserting a color selection operator before every fill operator, and then saving the file again. For more details on these operators, see Table 4.10, "Path-painting operators", in the Adobe PDF Reference.

Operator f: Fills the path, using the nonzero winding number rule to determine the region to fill (see “Nonzero Winding Number Rule” on page 232).

Operator rg: Sets the nonstroking color space to DeviceRGB and sets the nonstroking color to the specified value.

Operator q: Saves the current graphics state.

Operator Q: Restores the saved graphics state.

So if you have a sequence of operators on your page:

0.0 0.0 0.0 rg % Set nonstroking color to black
25 175 175 -150 re % Construct rectangular path
f % Fill path

It should become:

0.0 0.0 0.0 rg % Set nonstroking color to black
25 175 175 -150 re % Construct rectangular path
q % Saves the current graphic state
1.0 1.0 0.0 rg % Set nonstroking color to yellow
f % Fill path
Q % Restores the saved graphic state
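
Since the question asks for Python, here is a minimal sketch of the same idea applied to an already-decoded content stream held as a string. It deviates from the sequence above in one respect: the PDF reference lists only path-construction operators as valid between the start of a path and its painting operator, so the sketch places `q` and the yellow `rg` before the path begins rather than directly before `f`. The function names `is_operand` and `recolor_fills` are hypothetical, the tokenizer is a naive whitespace split that assumes no comments and no string literals containing spaces, and extracting the stream from the PDF and writing it back are left to a PDF library.

```python
# Naive sketch: wrap every fill-only path in q / 1 1 0 rg ... Q.
# Assumes a decoded content stream without comments or multi-word strings.

PATH_CONSTRUCTION = {"m", "l", "c", "v", "y", "h", "re"}
FILL_ONLY = {"f", "F", "f*"}
OTHER_PAINTING = {"S", "s", "B", "B*", "b", "b*", "n"}
YELLOW_PREFIX = ["q", "1", "1", "0", "rg"]


def is_operand(token):
    """Crude operand test: numbers, names, and the start of strings/arrays/dicts."""
    if token[0] in "/([<":
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def recolor_fills(content):
    """Return the content stream with every fill-only path drawn in yellow."""
    out = []
    operand_start = 0  # index in `out` where the current operator's operands begin
    path_start = None  # index in `out` where the open path begins, if any
    for token in content.split():
        if is_operand(token):
            out.append(token)
            continue
        # The first path-construction operator marks where the path begins.
        if token in PATH_CONSTRUCTION and path_start is None:
            path_start = operand_start
        out.append(token)
        if token in FILL_ONLY and path_start is not None:
            # Save state and select yellow before the path, restore after the fill.
            out[path_start:path_start] = YELLOW_PREFIX
            out.append("Q")
            path_start = None
        elif token in FILL_ONLY or token in OTHER_PAINTING:
            # Strokes, fill-and-stroke, and no-op paints end the path unchanged.
            path_start = None
        operand_start = len(out)
    return " ".join(out)


print(recolor_fills("0 0 0 rg 25 175 175 -150 re f"))
# 0 0 0 rg q 1 1 0 rg 25 175 175 -150 re f Q
```

Stroked paths such as `10 10 m 20 20 l S` pass through unchanged, because only the fill-only operators `f`, `F` and `f*` trigger the wrapping.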

Some remarks:
- This approach will turn every filled non-text drawing yellow (including lines, curves, etc., but excluding raster images). It will also turn yellow any text that is drawn on the page using the same drawing operators as other PDF drawings.
- Form XObjects (XForms) and annotations used on the page will not be processed.
- If the documents you will process are produced by the same tool in the same way, you can test a few files and see how it goes.

Important: This is just an untested idea off the top of my head. It may work, or it may not.



Source: https://stackoverflow.com/questions/14960740/programmatically-change-the-color-of-a-black-box-in-a-pdf-file
