C#: Extracting text with PdfSharp returns unreadable content

Posted by 感情迁移 on 2020-01-04 09:40:05

Question


I managed to extract text from a PDF (version 1.2) using PdfSharp, following this link.

My code to extract the text:

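// Requires: using PdfSharp.Pdf.Content.Objects;
// Walks the parsed content stream recursively and appends every string
// operand of the text-showing operators (Tj, TJ) to pdfcontentstr.
private string ExtractText(CObject cObject, ref string pdfcontentstr)
{
    if (cObject is COperator cOperator)
    {
        // Only Tj and TJ carry text to show; all other operators are skipped.
        if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
            cOperator.OpCode.Name == OpCodeName.TJ.ToString())
        {
            foreach (var cOperand in cOperator.Operands)
            {
                ExtractText(cOperand, ref pdfcontentstr);
            }
        }
    }
    else if (cObject is CSequence cSequence)
    {
        // A sequence (e.g. a page's content or a TJ array) is walked element by element.
        foreach (var element in cSequence)
        {
            ExtractText(element, ref pdfcontentstr);
        }
    }
    else if (cObject is CString cString)
    {
        // Raw string value as stored in the content stream, separated by ';'.
        pdfcontentstr = pdfcontentstr + ";" + cString.Value;
    }
    return pdfcontentstr;
}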

But when I try to extract text from a PDF version 1.3 file (with the same content), the program returns unreadable content, for example:

0%0O0R0F0N00%0

The actual content in the PDF file: Block B
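
One possible reading of the sample above, which is an observation on this one string and not something the post confirms: the visible characters other than '0' sit exactly 29 code points below the expected letters ('%' + 29 = 'B', 'O' + 29 = 'l', and so on). That pattern would be consistent with the strings holding two-byte glyph codes of an embedded font rather than character codes, in which case the raw string values need to be mapped through the font (for example its ToUnicode table) before they are readable. The sketch below only checks the arithmetic; ShiftCodes and GlyphOffsetDemo are hypothetical names, treating '0' as a filler for the high byte is an assumption, and the offset 29 is specific to this sample.

```csharp
using System;
using System.Text;

public static class GlyphOffsetDemo
{
    // Hypothetical helper: skips the '0' filler characters (assumed to stand
    // for the high byte of each two-byte code) and shifts every remaining
    // character by a fixed offset.
    public static string ShiftCodes(string raw, int offset)
    {
        var sb = new StringBuilder();
        foreach (char c in raw)
        {
            if (c == '0')
            {
                continue; // assumed filler, not part of the text
            }
            sb.Append((char)(c + offset));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Garbled sample from the question; offset 29 found by comparing it with "Block B".
        Console.WriteLine(ShiftCodes("0%0O0R0F0N00%0", 29)); // prints "BlockB"
    }
}
```

The space of "Block B" does not come back because its shifted code (32 - 29 = 3) is a control character that is not visible in the pasted sample. A fixed shift like this is not a general fix: if this reading is right, the mapping depends on the font embedded in each file.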

Can anyone help? Thanks in advance.

Source: https://stackoverflow.com/questions/41497882/c-sharp-extract-text-by-using-pdfsharp-return-unreadable-content
