Extract paths and shapes with iTextSharp

Posted by 不羁岁月 on 2019-12-01 10:52:26

Here is a starting point for extracting the individual operators (commands) from a page's content stream:

    using System;
    using System.Collections.Generic;
    using iTextSharp.text.pdf;

    var file = "test.pdf";
    var reader = new PdfReader(file);

    // Content stream bytes of page 2
    var streamBytes = reader.GetPageContent(2);
    var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
    var ps = new PdfContentParser(tokenizer);

    // Each Parse() call fills the list with one operation:
    // its operands first, the operator itself as the last element.
    var operands = new List<PdfObject>();
    while (ps.Parse(operands).Count > 0)
    {
        var oper = (PdfLiteral)operands[operands.Count - 1];
        var cmd = oper.ToString();

        switch (cmd)
        {
            case "q":
                Console.WriteLine("SaveGraphicsState(); //q");
                break;

            case "Q":
                Console.WriteLine("RestoreGraphicsState(); //Q");
                break;

            // good luck with the rest!
        }
    }

    reader.Close();
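For "the rest", most of what matters for shapes are the path construction operators (`m`, `l`, `c`, `re`, `h`) and the path painting operators (`S`, `f`, `B`, `n`). Below is a minimal sketch of how they could be collected. `PathCollector` is a hypothetical helper, not part of iTextSharp, and it deliberately has no iTextSharp dependency: it assumes you have already converted the numeric operands of each operation to floats (in the loop above, that would presumably mean reading `FloatValue` from every `PdfNumber` that precedes the operator). Only a subset of the operators is covered.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of iTextSharp): turns path operators and
// their numeric operands into a readable list of drawing calls.
public class PathCollector
{
    // Segments of the path currently being built.
    private readonly List<string> current = new List<string>();

    // Completed paths, one entry per painting operator encountered.
    public List<string> Paths { get; } = new List<string>();

    // op is the operator name; args are its operands in content-stream order.
    public void Handle(string op, IList<float> args)
    {
        switch (op)
        {
            case "m":  Add("MoveTo", args, 2); break;     // x y
            case "l":  Add("LineTo", args, 2); break;     // x y
            case "c":  Add("CurveTo", args, 6); break;    // x1 y1 x2 y2 x3 y3
            case "re": Add("Rectangle", args, 4); break;  // x y width height
            case "h":  Add("ClosePath", args, 0); break;
            case "S":  Finish("Stroke"); break;
            case "f":
            case "F":  Finish("Fill"); break;
            case "B":  Finish("FillStroke"); break;
            case "n":  Finish("NoPaint"); break;  // ends the path without painting
            default:   break;                     // not handled in this sketch
        }
    }

    private void Add(string name, IList<float> args, int expected)
    {
        if (args.Count != expected)
            throw new ArgumentException(name + " expects " + expected + " operands");
        var formatted = args.Select(v => v.ToString(CultureInfo.InvariantCulture));
        current.Add(name + "(" + string.Join(", ", formatted) + ")");
    }

    private void Finish(string kind)
    {
        Paths.Add(kind + ": " + string.Join(" ", current));
        current.Clear();
    }
}
```

Feeding it `m 10 20`, `l 30 20`, `h`, `S` yields a single entry in `Paths` describing a stroked, closed path. The coordinates are still in user space, so they only become page positions once the current transformation matrix is applied.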

Extracting paths and shapes is not supported in iTextSharp. The reason: parsing for text returns TextRenderInfo objects and parsing for images returns ImageRenderInfo objects, but in which form should we return a GraphicsRenderInfo? It's hard to find something generic, and painting to a graphics context is too specific.

The idea is that you write your own parser, as I did, for instance, for removing OCG layers: OCGParser. This part of iText hasn't been ported to iTextSharp yet, but maybe you can use it for inspiration.
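One thing such a parser has to handle itself is the graphics state: path coordinates are given in user space, and the `cm` operator changes the current transformation matrix (CTM) between `q` and `Q`. The following is a rough sketch of that bookkeeping, under the assumption that you only care about the CTM and not about colors, line widths, or clipping. `CtmTracker` is a hypothetical name, not an iText or iTextSharp class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tracks the current transformation matrix (CTM)
// across the q, Q and cm operators.
public class CtmTracker
{
    // Matrix stored as [a b c d e f], the same order as the cm operands.
    private float[] ctm = { 1, 0, 0, 1, 0, 0 };
    private readonly Stack<float[]> saved = new Stack<float[]>();

    // q: push a copy of the current matrix.
    public void Save() { saved.Push((float[])ctm.Clone()); }

    // Q: pop the most recently saved matrix (ignored if nothing was saved).
    public void Restore() { if (saved.Count > 0) ctm = saved.Pop(); }

    // cm: the new matrix m is premultiplied, i.e. CTM' = m x CTM.
    public void Concat(float[] m)
    {
        ctm = new[]
        {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    // Maps a user-space point through the current matrix.
    public float[] Transform(float x, float y)
    {
        return new[]
        {
            x * ctm[0] + y * ctm[2] + ctm[4],
            x * ctm[1] + y * ctm[3] + ctm[5]
        };
    }
}
```

With this in place, a `q` case would call `Save()`, a `Q` case `Restore()`, a `cm` case `Concat(...)` with its six operands, and every coordinate read from a path operator would go through `Transform` before being reported.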

Note that you're actually building PDF-to-image functionality. Aren't there other products that already support this out of the box?
