Unable to find location of ColorSpace objects in PDF document

Submitted by 为君一笑 on 2019-12-08 12:39:00

Question


I want to identify the ColorSpace objects in a PDF and fetch their location (coordinates, width and height of the colorspace) in the page. I tried traversing the BaseDataObject in Contents.ContentContext.Resources.ColorSpaces. I can identify the Pantone color spaces in the file (as shown in the screenshot), but I am unable to find any information about the location (x, y, w and h) of the object.

Where can I find the exact location of the visible objects (visible on opening a document) like ColorSpaces and embedded images?

I am using the PDF Clown library ('pdfclown') to extract the info about ColorSpaces from the PDF. Any input will be helpful. Thanks in advance.

// Requires "using System.Linq;" for ToList()
ContentScanner cs = new ContentScanner(page);
System.Collections.Generic.List<org.pdfclown.documents.contents.colorSpaces.ColorSpace> list = cs.Contents.ContentContext.Resources.ColorSpaces.Values.ToList();
for (int i = 0; i < list.Count; i++)
{
    org.pdfclown.objects.PdfArray array = (org.pdfclown.objects.PdfArray)list[i].BaseDataObject;
    foreach (org.pdfclown.objects.PdfObject s in array)
    {
        // print colorspace and its x,y,w,h
    }
}

PDF Document (has CMYK and Pantone Colors)

Screenshot


Answer 1:


"I want to identify the ColorSpace objects in PDF and fetch their location (coordinates, width and height of the colorspace) in the page."

I assume you mean the squares here:

Beware: these are not PDF ColorSpace objects. They are a number of simple (rectangular) paths filled with distinct colors, with some text drawn on top of them.

PDF ColorSpaces are not specific renderings of colored areas; they are abstract color specifications:

Colours may be described in any of a variety of colour systems, or colour spaces. Some colour spaces are related to device colour representation (grayscale, RGB, CMYK), others to human visual perception (CIE-based). Certain special features are also modelled as colour spaces: patterns, colour mapping, separations, and high-fidelity and multitone colour.

(ISO 32000-1, section 8.6 "Colour Spaces")

Since you are looking for something with coordinates, width and height, you are actually looking for the drawing instructions that use those abstract color spaces, not for the color spaces themselves.

"I tried traversing through the BaseDataObject in Contents.ContentContext.Resources.ColorSpaces, I can identify the Pantone Colorspaces in file (as shown in screenshot), but unable to find info regarding the location (x, y, w and h) of the object."

By looking at cs.Contents.ContentContext.Resources.ColorSpaces you get an enumeration of all special color spaces available for use in the current context, but not their actual usages. To get the actual usages, you have to traverse the ContentScanner cs, i.e. you have to inspect the instructions in the current context, e.g. like this:

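// State shared across the scan: the Separation fill color space currently in effect (if any)
// and the rectangle most recently added to the current path
SeparationColorSpace space = null;
double X = 0, Y = 0, Width = 0, Height = 0;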

void ScanForSpecialColorspaceUsage(ContentScanner cs)
{
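    // Position the scanner before the first object; MoveFirst() would already advance
    // onto it, and the MoveNext() in the loop condition would then skip it
    cs.MoveStart();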
    while (cs.MoveNext())
    {
        ContentObject content = cs.Current;
        if (content is CompositeObject)
        {
            ScanForSpecialColorspaceUsage(cs.ChildLevel);
        }
        else if (content is SetFillColorSpace _cs)
        {
            ColorSpace _space = cs.Contents.ContentContext.Resources.ColorSpaces[_cs.Name];
            space = _space as SeparationColorSpace;
        }
        else if (content is SetDeviceCMYKFillColor || content is SetDeviceGrayFillColor || content is SetDeviceRGBFillColor)
        {
            space = null;
        }
        else if (content is DrawRectangle _dr)
        {
            if (space != null)
            {
                X = _dr.X;
                Y = _dr.Y;
                Width = _dr.Width;
                Height = _dr.Height;
            }
        }
        else if (content is PaintPath _pp)
        {
            if (space != null && _pp.Filled && (X != 0 || Y != 0 || Width != 0 || Height != 0))
            {
                String name = ((PdfName)((PdfArray)space.BaseDataObject)[1]).ToString();
                Console.WriteLine("Filling rectangle at {0}, {1} with size {2}x{3} using {4}", X, Y, Width, Height, name);
            }
            X = 0;
            Y = 0;
            Width = 0;
            Height = 0;
        }
    }
}

BEWARE: This is merely a proof of concept, simplified as much as possible while still working on your PDF for the squares in the screenshot above.

For a general solution you will have to extend this considerably:

  • The code only inspects the given content scanner, i.e. only the content stream it has been initialized for, in your case a page content stream.

    From such a content stream other content streams may be referenced, e.g. a form XObject. To catch all the usages of interesting color spaces in a generic document, you have to recursively inspect such dependent content streams, too.

  • The code ignores the current transformation matrix.

    An instruction can change the current transformation matrix, after which the coordinates used by all following drawing instructions are mapped by the corresponding affine transformation. To get all coordinates and dimensions right in a generic document, you have to apply the current transformation matrix to them.

  • The code ignores save-graphics-state/restore-graphics-state instructions.

    The current graphics state (including fill color and current transformation matrix) can be stored on a stack and restored from it. To get colors, coordinates and dimensions right in a generic document, you have to keep track of saved and restored graphics states (or use data from the cs.State for color and transformation where PDF Clown does this for you).

  • The code only looks at Separation color spaces.

    If you're interested in other color spaces, too, you have to generalize this.

  • The code only understands very specific, trivial paths: only paths generated by a single instruction defining a rectangle.

    For a generic solution you have to support arbitrary paths.
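
To illustrate the two points about the current transformation matrix and the save/restore instructions, here is a minimal, library-independent sketch. The types Affine and CtmTracker are hypothetical helpers invented for this illustration and are not part of PDF Clown; as noted above, cs.State may already give you this data. The sketch assumes you feed it the operands of the cm, q, Q and re instructions yourself, and it tracks only the matrix, not the fill color.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a PDF Clown type.
// PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f
public readonly struct Affine
{
    public readonly double A, B, C, D, E, F;

    public Affine(double a, double b, double c, double d, double e, double f)
    { A = a; B = b; C = c; D = d; E = e; F = f; }

    public static readonly Affine Identity = new Affine(1, 0, 0, 1, 0, 0);

    // Effect of a "cm" instruction: the new matrix m is applied before
    // the matrix already in effect (CTM' = m x CTM).
    public Affine Prepend(Affine m) => new Affine(
        m.A * A + m.B * C, m.A * B + m.B * D,
        m.C * A + m.D * C, m.C * B + m.D * D,
        m.E * A + m.F * C + E, m.E * B + m.F * D + F);

    public (double X, double Y) Apply(double x, double y) =>
        (A * x + C * y + E, B * x + D * y + F);
}

// Hypothetical helper, not a PDF Clown type.
public sealed class CtmTracker
{
    private readonly Stack<Affine> saved = new Stack<Affine>();

    public Affine Current { get; private set; } = Affine.Identity;

    public void Save() => saved.Push(Current);                    // "q"
    public void Restore()                                         // "Q"
    {
        if (saved.Count > 0) Current = saved.Pop();
    }
    public void Concat(Affine m) => Current = Current.Prepend(m); // "cm"

    // Axis-aligned bounding box, after applying the current matrix,
    // of a rectangle given by the operands of an "re" instruction.
    public (double X, double Y, double Width, double Height) Bounds(
        double x, double y, double w, double h)
    {
        var corners = new[]
        {
            Current.Apply(x, y), Current.Apply(x + w, y),
            Current.Apply(x, y + h), Current.Apply(x + w, y + h)
        };
        double minX = double.MaxValue, minY = double.MaxValue;
        double maxX = double.MinValue, maxY = double.MinValue;
        foreach (var (cx, cy) in corners)
        {
            minX = Math.Min(minX, cx); maxX = Math.Max(maxX, cx);
            minY = Math.Min(minY, cy); maxY = Math.Max(maxY, cy);
        }
        return (minX, minY, maxX - minX, maxY - minY);
    }
}
```

For example, after Save() and Concat(new Affine(2, 0, 0, 2, 10, 20)), the rectangle "5 5 10 10 re" maps to the box at 20, 30 with size 20x20; after Restore() the same operands map to 5, 5 with size 10x10 again. A bounding box is a simplification: under rotation or skew the filled area is no longer an axis-aligned rectangle.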



Source: https://stackoverflow.com/questions/56061097/unable-find-location-of-colorspace-objects-in-pdf-document