Learning Tesseract OCR Image Recognition, Part 1

Posted by 喜欢而已 on 2019-11-28 16:15:11

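My manager asked me to build an invoice recognition service. Until now I have only written CRUD code and have never touched anything as fancy as image recognition, so I am writing down what I learn.

Create a new project and add the tess4j dependency: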


        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.4.0</version>
        </dependency>

Write a test class:

package com.example.cor1.test;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.File;

public class Test1 {

    public static void main(String[] args) throws TesseractException {

        File test1 = new File("C:\\Users\\xxx\\Desktop\\tesseract\\test1.png");
        Tesseract tesseract = new Tesseract();
        tesseract.setLanguage("chi_sim");
        String s = tesseract.doOCR(test1);
        System.out.println(s);
    }
}

It failed as soon as I ran it:

Exception in thread "main" java.lang.NoSuchMethodError: com.sun.jna.Native.load(Ljava/lang/String;Ljava/lang/Class;)Lcom/sun/jna/Library;
	at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
	at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
	at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:223)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)
	at com.example.cor1.test.Test1.main(Test1.java:15)

The error says that the Native.load method called here does not exist:

    public static TessAPI getTessAPIInstance() {
        return (TessAPI)Native.load(getTesseractLibName(), TessAPI.class);
    }

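Clicking through to Native in IDEA showed that it came from a 4.x JNA jar, even though the pom.xml of tess4j 4.4.0 depends on JNA 5.3.1. Native.load was only introduced in JNA 5.0, which is why a 4.x jar produces NoSuchMethodError. IDEA's Maven dependency view did not show any conflict either, and I still don't know why the older jar was picked, so I changed the pom.xml to declare JNA 5.3.1 explicitly: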

        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
            <version>5.3.1</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.4.0</version>
            <exclusions>
                <exclusion>
                    <artifactId>commons-io</artifactId>
                    <groupId>commons-io</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-logging</artifactId>
                    <groupId>commons-logging</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jna</artifactId>
                    <groupId>net.java.dev.jna</groupId>
                </exclusion>
            </exclusions>
        </dependency>

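I excluded the two commons artifacts while I was at it, because I saw version conflicts for them.

After that, the error above no longer appeared, but a new one did: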
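To confirm which JNA jar actually ends up on the runtime classpath, one option is to ask the JVM where a class was loaded from. The snippet below is a minimal sketch using only the standard library; the class name ClassOrigin and the method describe are made up for illustration and are not part of tess4j or JNA. A common cause of this kind of mismatch (an assumption here, not verified for this project) is a parent POM whose dependencyManagement pins an older JNA version.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the location (usually a jar URL) the named class is loaded from,
     * or a short message if the class is missing or has no known source.
     * Hypothetical helper for illustration only.
     */
    public static String describe(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source
                return "bootstrap or unknown source";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // The printed jar file name should reveal the JNA version in use
        System.out.println(describe("com.sun.jna.Native"));
    }
}
```

If the printed path still points at a 4.x jar after editing the pom.xml, the override has not taken effect and the NoSuchMethodError above would be expected to come back.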

Error opening data file ./chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Warning: Invalid resolution 0 dpi. Using 70 instead.
Exception in thread "main" java.lang.Error: Invalid memory access
	at com.sun.jna.Native.invokePointer(Native Method)
	at com.sun.jna.Function.invokePointer(Function.java:497)
	at com.sun.jna.Function.invoke(Function.java:441)
	at com.sun.jna.Function.invoke(Function.java:361)
	at com.sun.jna.Library$Handler.invoke(Library.java:265)
	at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
	at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:517)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:359)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:228)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)
	at com.example.cor1.test.Test1.main(Test1.java:15)

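The message says the Chinese trained data cannot be found. I checked the tessdata folder inside the jar and it indeed contains no Chinese data, but the real cause of this error is that no trained data path was specified.

Download the trained data from https://github.com/tesseract-ocr/tessdata and put it in the project root, so that chi_sim.traineddata sits inside a tessdata folder there.

The path also has to be set in the code: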

    public static void main(String[] args) throws TesseractException {

        File test1 = new File("C:\\Users\\xxx\\Desktop\\tesseract\\test1.png");
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("tessdata");
        tesseract.setLanguage("chi_sim");
        String s = tesseract.doOCR(test1);
        System.out.println(s);
    }
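Because a missing data file ends in an "Invalid memory access" error rather than a clear exception, it may help to check for the file before calling doOCR. The following is a small sketch using only java.nio.file; the class TessdataCheck and its methods are hypothetical helpers, and the file name pattern (language code plus ".traineddata") is taken from the error log above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /** Builds the expected path of the trained data file, e.g. tessdata/chi_sim.traineddata. */
    public static Path trainedDataFile(Path dataDir, String language) {
        return dataDir.resolve(language + ".traineddata");
    }

    /** Returns true only if the trained data file exists and is a regular file. */
    public static boolean hasLanguage(Path dataDir, String language) {
        return Files.isRegularFile(trainedDataFile(dataDir, language));
    }

    public static void main(String[] args) {
        // Same values as passed to setDatapath and setLanguage in the test class
        Path dataDir = Paths.get("tessdata");
        String language = "chi_sim";

        Path expected = trainedDataFile(dataDir, language).toAbsolutePath();
        if (hasLanguage(dataDir, language)) {
            System.out.println("Found " + expected);
        } else {
            System.out.println("Missing " + expected + ", download it before running OCR");
        }
    }
}
```

Printing the absolute path also shows which working directory the relative "tessdata" path is resolved against, which is easy to get wrong when running from an IDE.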

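Running it again, the text is finally recognized.

Original image:

Recognized text:

As you can see, the result is not very accurate, so there is still work to do!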

 

 
