.NET OCRing an Image

风流意气都作罢 posted on 2019-12-03 03:54:19

It looks as though the answer is to give MODI a bigger canvas. I was also trying to take a screenshot of a control and OCR it, and I ran into the same problem. In the end I took the image of the control, copied it into a larger bitmap, and OCRed the larger bitmap.

Another issue I found was that you must have a proper extension for your image file. In other words, .tmp doesn't cut it.

I kept the work of creating a larger source inside my OCR method, which looks something like this (I deal directly with Image objects):

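using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

// Requires a COM reference to Microsoft Office Document Imaging (MODI).
// Extension methods must live in a static class; the class name is arbitrary.
public static class OcrExtensions
{
    public static string ExtractText(this Image image)
    {
        var tmpFile = Path.GetTempFileName();
        // MODI needs a proper image extension; a bare .tmp file doesn't work.
        var bmpFile = tmpFile + ".bmp";
        string text;
        try
        {
            // Copy the image, unscaled, onto a canvas of at least 1024 x 768.
            using (var bmp = new Bitmap(Math.Max(image.Width, 1024), Math.Max(image.Height, 768)))
            {
                using (var gfxResize = Graphics.FromImage(bmp))
                {
                    gfxResize.DrawImage(image, new Rectangle(0, 0, image.Width, image.Height));
                }
                bmp.Save(bmpFile, ImageFormat.Bmp);
            }

            var doc = new MODI.Document();
            doc.Create(bmpFile);
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            var img = (MODI.Image)doc.Images[0];
            text = img.Layout.Text;
            // Release the file without saving so it can be deleted below.
            doc.Close(false);
        }
        finally
        {
            File.Delete(tmpFile);
            File.Delete(bmpFile);
        }

        return text;
    }
}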

I'm not sure exactly what the minimum size is, but it appears as though 1024 x 768 does the trick.

chris

Yes, the posts in this thread helped me get it working. Here is what I have to add.

I was trying to download (small) images and then OCR them.

  • When processing images, it seems that their dimensions must be powers of 2. I was able to OCR images of 512x512, 128x128 and 256x64; other sizes, such as 1103x334, mostly failed.

  • A transparent background also caused trouble. I got the best results by creating a new TIF with power-of-2 dimensions and a white background, pasting the downloaded image into it, and saving.

  • Scaling the image did not work for me, because the OCR then returned wrong results, especially for German characters like "ü".

  • In the end I also used: doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

  • I am using MODI from Office 2003.

Greetings

womd
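The canvas recipe described above (power-of-2 dimensions, white background, image pasted in unscaled, saved as TIF) could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (OcrCanvas, NextPowerOfTwo, SaveOnWhiteCanvas) are hypothetical and not from the thread, and the 24-bit pixel format is a guess at a reasonable way to get rid of transparency.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

// Hypothetical helper, not part of MODI or of the original posts.
public static class OcrCanvas
{
    // Smallest power of two that is >= value, for 1 <= value <= 2^30.
    public static int NextPowerOfTwo(int value)
    {
        if (value < 1 || value > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(value));
        int result = 1;
        while (result < value) result <<= 1;
        return result;
    }

    // Pastes the image, unscaled, into the top-left corner of a white canvas
    // whose sides are powers of two, then saves the canvas as a TIFF file.
    public static void SaveOnWhiteCanvas(Image image, string tifPath)
    {
        int width = NextPowerOfTwo(image.Width);
        int height = NextPowerOfTwo(image.Height);
        // Assumption: a 24-bit format has no alpha channel, so no transparency survives.
        using (var canvas = new Bitmap(width, height, PixelFormat.Format24bppRgb))
        {
            using (var gfx = Graphics.FromImage(canvas))
            {
                gfx.Clear(Color.White);
                gfx.DrawImage(image, new Rectangle(0, 0, image.Width, image.Height));
            }
            canvas.Save(tifPath, ImageFormat.Tiff);
        }
    }
}
```

With this sketch, the 1103x334 image mentioned above would end up on a 2048x512 canvas. The resulting file can then be passed to doc.Create(...) as in the earlier example.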

MODI OCR works only with TIF files for me. Try saving the image as "tif".

Sorry for my bad English.

Sulaiman
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

This means that I don't want it to detect the orientation or fix any skewing. Now the command works fine on all images, including the 2400x2496 TIFF.

But the image should be saved as .tif.

Hope this helps people facing the same problem.

I had the same problem ("OCR running problem") with some images. I rescaled the image (in my case to 50%), i.e. reduced its size, and voilà, it works!

PhoenixCoder
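Rescaling before OCR, as described above, might look something like this with System.Drawing. Treat it as a sketch under assumptions: the names OcrScaling, ScaledSize and Rescale are hypothetical, and the bicubic interpolation mode is just one plausible choice, not something the thread specifies. Other posts in this thread report that scaling reduced recognition accuracy, so it is worth comparing results with and without it.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

// Hypothetical helper, not part of MODI or of the original posts.
public static class OcrScaling
{
    // Target size for a given scale factor, never smaller than 1x1.
    public static Size ScaledSize(Size original, double factor)
    {
        if (factor <= 0) throw new ArgumentOutOfRangeException(nameof(factor));
        int width = Math.Max(1, (int)Math.Round(original.Width * factor));
        int height = Math.Max(1, (int)Math.Round(original.Height * factor));
        return new Size(width, height);
    }

    // Returns a rescaled copy of the image; the caller should dispose the result.
    public static Bitmap Rescale(Image image, double factor)
    {
        Size target = ScaledSize(image.Size, factor);
        var scaled = new Bitmap(target.Width, target.Height);
        using (var gfx = Graphics.FromImage(scaled))
        {
            // Assumption: bicubic interpolation keeps glyph edges reasonably smooth.
            gfx.InterpolationMode = InterpolationMode.HighQualityBicubic;
            gfx.DrawImage(image, new Rectangle(0, 0, target.Width, target.Height));
        }
        return scaled;
    }
}
```

For example, OcrScaling.Rescale(image, 0.5) would turn a 2400x2496 scan into a 1200x1248 bitmap, which can then be saved and handed to MODI.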

I had the same issue while using

doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

on a TIFF file that was 2400x2496. Resizing it to 50% (reducing the size) fixed the problem and the method no longer threw the exception. However, it then recognized the text incorrectly, for example detecting "relerence" instead of "reference" or "712017" instead of "712517". I kept trying different image sizes, but they all had the same issue until I changed the command to

doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

which means that I don't want it to detect the orientation or fix any skewing. Now the command works fine on all images, including the 2400x2496 TIFF.

Hope this helps people facing the same problem.

What solved my situation was opening the image in a photo editor (Paint.NET) and applying the sharpen effect at maximum.

I also used: doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
