While extracting text from PDF file using iTextSharp, I am getting this error: “Could not find image data or EI”

Question

While extracting text from a PDF file using iTextSharp, I am getting this error: "Could not find image data or EI"

The error occurs only on particular pages that contain nothing but images.

Could the cause be that I am trying to extract text without first checking whether the page has any text content?

Answer 1:

The PDF specification does not define inline images very precisely. The image data should be contained between the ID and EI operators, but the image data itself may contain the bytes "EI". iText(Sharp) therefore reads image data until it encounters <whitespace>EI<whitespace>. However, some PDFs end inline image data with EI<whitespace> only, with no whitespace before EI. For those inline images iText(Sharp) never finds the terminator and throws this exception.

If this is the issue with your PDF, you can probably fix it by changing found == 1 to found <= 1 in InlineImageUtils.ParseInlineImageSamples() here: http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/src/core/iTextSharp/text/pdf/parser/InlineImageUtils.cs#l337
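
To make the difference between the two terminator rules concrete, here is a minimal sketch in Python. It is not the iTextSharp implementation: the function name read_inline_image, the strict flag, and the whitespace set are assumptions made for illustration. The input is assumed to be the raw bytes that follow the ID operator. With strict=True the terminator must be <whitespace>EI<whitespace>; with strict=False, EI<whitespace> is enough, which loosely corresponds to the found <= 1 change suggested above.

```python
# Hypothetical illustration only; names and structure are not taken from iTextSharp.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "  # whitespace bytes as defined by the PDF format


def read_inline_image(stream: bytes, strict: bool = True) -> bytes:
    """Return the image bytes that precede the first accepted EI terminator.

    strict=True  : accept only <whitespace>EI<whitespace>
    strict=False : also accept EI<whitespace> with no whitespace before it
    """
    start = 0
    while True:
        i = stream.find(b"EI", start)
        if i == -1:
            # Same message the question reports; no terminator matched the rule.
            raise ValueError("Could not find image data or EI")
        followed = i + 2 < len(stream) and stream[i + 2] in PDF_WHITESPACE
        preceded = i > 0 and stream[i - 1] in PDF_WHITESPACE
        if followed and (preceded or not strict):
            # Treat a whitespace byte directly before EI as a delimiter, not as data.
            end = i - 1 if preceded else i
            return stream[:end]
        start = i + 1  # this "EI" was part of the image data; keep scanning


if __name__ == "__main__":
    well_formed = b"\x01\x02 EI Q"
    no_leading_ws = b"\x01\x02EI Q"
    print(read_inline_image(well_formed))                  # strict rule is satisfied
    print(read_inline_image(no_leading_ws, strict=False))  # only the lenient rule matches
```

The lenient rule has a cost: if the image data itself happens to contain EI followed by whitespace, the lenient scan stops there and truncates the image, whereas the strict scan would have continued to the real terminator. That trade-off is presumably why the stricter check exists in the first place.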

Answer 2:

This is because the computer resolution is too high and the reprint uses a lower resolution. That's OK, but the fundamental profile still comes from the source code. That is to say, many computer resolutions are supported.

Source: https://stackoverflow.com/questions/20399947/while-extracting-text-from-pdf-file-using-itextsharp-i-am-getting-this-error

Tags

itextsharp
