Question
While extracting text from a PDF file using iTextSharp, I get this error: "Could not find image data or EI"
The error occurs on particular pages that contain only an image.
Could the reason be that I am trying to extract the text without first checking whether the page has any text content?
Answer 1:
Inline images are not specified very well in the PDF specification. The image data should be contained between the ID and EI operators, but the image data itself may contain the bytes "EI".
In iText(Sharp), image data is read until <whitespace>EI<whitespace> is encountered. However, some PDFs end the inline image data with EI<whitespace>, with no whitespace before EI. For those inline images, iText(Sharp) throws this exception.
If this is the issue with your PDF, you can probably fix it by changing found == 1 to found <= 1 in InlineImageUtils.ParseInlineImageSamples(), here:
http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/src/core/iTextSharp/text/pdf/parser/InlineImageUtils.cs#l337
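To illustrate why relaxing that check helps, the end-of-data scan described above can be sketched as a small state machine. This is a hypothetical Python re-creation, not iTextSharp's code: the names read_inline_image_data, InlineImageParseError, WHITESPACE and the lenient flag are illustrative assumptions, and the found counter only loosely mirrors the one in ParseInlineImageSamples(). The real parser also checks whether the bytes collected so far form a complete image before accepting EI; this sketch skips that, so in lenient mode an "EI" followed by whitespace inside the image data would end the scan early.

```python
# Hypothetical sketch of scanning inline image data for the EI terminator.
# NOT iTextSharp's implementation: it omits the "is the image complete?" check.

WHITESPACE = b" \t\r\n\f\x00"  # the PDF whitespace characters


class InlineImageParseError(Exception):
    pass


def read_inline_image_data(data: bytes, lenient: bool = False) -> bytes:
    """Return the bytes before the EI operator.

    `data` is assumed to start right after the whitespace that follows ID.
    Strict mode requires <whitespace>EI<whitespace>. Lenient mode (the
    "found <= 1" idea) also accepts EI<whitespace> with no whitespace before E.
    """
    image = bytearray()    # bytes confirmed to be image data
    pending = bytearray()  # bytes that may turn out to be the EI terminator
    found = 0              # 0: nothing, 1: saw whitespace, 2: saw "E", 3: saw "EI"
    for b in data:
        if found == 3 and b in WHITESPACE:
            return bytes(image)          # full terminator seen
        if b in WHITESPACE:
            image += pending             # any earlier candidate failed
            pending = bytearray([b])     # this whitespace may precede EI
            found = 1
        elif b == ord("E") and (found == 1 or lenient):
            if found != 1:
                # lenient only: E with no whitespace before it
                image += pending
                pending = bytearray()
            pending.append(b)
            found = 2
        elif b == ord("I") and found == 2:
            pending.append(b)
            found = 3
        else:
            image += pending             # not a terminator after all
            pending = bytearray()
            image.append(b)
            found = 0
    raise InlineImageParseError("Could not find image data or EI")
```

For example, read_inline_image_data(b"\x01\x02EI Q") raises the same message as in the question, while passing lenient=True returns b"\x01\x02".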
Answer 2:
This happens because the computer's resolution is too high and the reprint uses a lower resolution. That is fine, but the underlying profile still comes from the source code; that is, it should support many different computer resolutions.
Source: https://stackoverflow.com/questions/20399947/while-extracting-text-from-pdf-file-using-itextsharp-i-am-getting-this-error