tesseract

Converting image content to text with Java code

て烟熏妆下的殇ゞ submitted on 2019-12-06 10:10:57
Preface: Today's phones can already turn a photo into text. As a programmer, I have to implement this in Java code, even if it may not be much use!!! Add the dependency to pom.xml: <dependency> <groupId>net.sourceforge.tess4j</groupId> <artifactId>tess4j</artifactId> <version>3.2.1</version> </dependency> This dependency is pretty hefty: 32 MB. test.java: public static void main(String[] args) { System.out.println("---------------------start--------------------------"); Tesseract tesseract = new Tesseract(); tesseract.setDatapath("D://DataScience//tessdata"); // tesseract.setLanguage("chi_sim"); try { System.out.println(tesseract.doOCR(new File("C:\\Users\\caofei\\Desktop\\2.png"))); } catch (TesseractException e

Error setting up the tesseract OCR in gem in rails

Deadly submitted on 2019-12-06 09:21:10
Question: I'm trying to set up the tesseract-ocr gem in my Rails environment. I ran brew install tesseract and then bundle install on the app, and both ran without errors. However, when starting the app ( rails s ), the following error is thrown: /Users/xxxx/.rvm/gems/ruby-1.9.2-p290@xxxx/gems/ffi-inline-0.0.4.3/lib/ffi/inline/compilers/gcc.rb:35:in `compile': compile error: see logs at /var/folders/66/pm_j0lp94gvcj0qnlcnsx9pw0000gn/T/.ffi-inline-501/4239dac38f2a721e0dc5b3750d71ce2e6fa4acb6

Tesseract: Simple Optical Character Recognition in Java

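谁说我不能喝 submitted on 2019-12-06 09:16:05
1.1 Introduction Developing symbols that carry meaning is a trait unique to humans. Recognizing these symbols and understanding the text in an image comes naturally to people. Unlike computers, which have to extract the text, we read it purely by visual instinct. Computers, on the other hand, need concrete, organized content to work with. They need a digital representation, not a graphical one. Sometimes that is not possible. Sometimes we want to automate the task of retyping text from an image by hand. For these tasks, Optical Character Recognition (OCR) was devised as a way for computers to "read" graphical content as text, much as humans do. Although these systems are relatively accurate, they can still be considerably off. Even so, fixing a system's mistakes is far easier and faster than doing everything by hand from scratch. Like all such systems, which are essentially similar, OCR software is trained on prepared datasets that provide enough data to help it learn the differences between characters. How this software learns is also an important topic if we want more accurate results, but that is the subject of another article. Rather than reinventing the wheel or coming up with a very complex (but useful) solution, let's first sit down and look at what already exists. 1.2 Tesseract The tech giant Google has been developing an OCR engine, Tesseract, which has a history spanning decades since its original inception. It offers an API for many languages, though we will focus on the Tesseract Java API. It is very easy to use Tesseract to implement a simple function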

Recognizing text in images with the Tesseract OCR Engine

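你。 submitted on 2019-12-06 09:15:54
Many OCR tools and libraries now offer fairly accurate recognition of PDFs and images. In web-scraping applications, you often need to recognize CAPTCHAs, or the target site uses images in place of plain text for data-protection reasons. Besides standalone software and libraries, there are also online tools that do the recognition directly; searching Google for "free online ocr" turns up the following: http://www.onlineocr.net/ http://www.free-ocr.com/ http://www.ocrconvert.com/ https://www.newocr.com/ A wiki page compares the many available tools in some detail; see Comparison_of_optical_character_recognition_software. Among all this software, Google's Tesseract has a good reputation; some consider it the most accurate of all OCR software, even more accurate than some commercial products. Google's paper gives the following accuracy figures: Tesseract is a library written in C/C++, but many languages have wrappers for it; see Tesseract's GitHub for details. Below we use the Java wrapper tess4j as an example. First, add the Maven dependency: <dependency> <groupId>net.sourceforge.tess4j</groupId> <artifactId>tess4j<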

Tesseract MacOS Error opening data file ./tessdata/eng.traineddata

时光怂恿深爱的人放手 submitted on 2019-12-06 07:58:29
I installed Tesseract to do some OCR testing with Selenium WebDriver (Java). This is my Maven dependency for Tess4J: <dependency> <groupId>net.sourceforge.tess4j</groupId> <artifactId>tess4j</artifactId> <version>2.0.0</version> <scope>test</scope> </dependency> I installed Tesseract 3.03.00 via brew and set TESSDATA_PREFIX to the path /usr/local/Cellar/tesseract/3.04.00/share/tessdata But when I ran the following command: sudo find / -name tessdata I found the tessdata folder in 4 different locations. /Users/<username>/Downloads/Tess4J/tessdata /Users/<username>
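With several tessdata folders on disk, one way to narrow down the "Error opening data file ./tessdata/eng.traineddata" message is to check which candidate directory actually contains eng.traineddata. The following is a minimal Python sketch using only the standard library. The helper name find_tessdata_dirs and the candidate list are hypothetical, and how Tesseract combines TESSDATA_PREFIX with the tessdata folder name differs between versions, so treat the output as a hint rather than a diagnosis.

```python
import os
from pathlib import Path


def find_tessdata_dirs(candidates, lang="eng"):
    """Return the candidate directories that contain <lang>.traineddata."""
    wanted = f"{lang}.traineddata"
    return [str(Path(c)) for c in candidates if (Path(c) / wanted).is_file()]


if __name__ == "__main__":
    # Hypothetical candidates: the local ./tessdata folder named in the error
    # message, plus the TESSDATA_PREFIX value (if set) with and without a
    # trailing "tessdata" component.
    candidates = ["./tessdata"]
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        candidates += [prefix, os.path.join(prefix, "tessdata")]
    for directory in find_tessdata_dirs(candidates):
        print("found eng.traineddata in", directory)
```

If none of the candidates is reported, the language file is probably missing from every location the engine is being pointed at, which would be consistent with the error in the title.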

How to use Tesseract-android-Tools

家住魔仙堡 submitted on 2019-12-06 07:41:29
I have tesseract-android-tools 1.00; please help me use the TessBaseAPI interface. I just want to pass one .jpg image that contains some text to an Android application, and then use the Tesseract engine to extract that text into an editable format. Please help me create this application on Android... Volker Did you ever search the internet for a manual? There are a lot of hints. Recently someone wrote a small tutorial . It is for Ubuntu, but I think it gives you a clue how to proceed. If not, your operating system is needed. I tried compiling the

How to fill the gaps in letters after Canny edge detection

折月煮酒 submitted on 2019-12-06 06:34:50
Question: I'm trying to do Arabic OCR using Tesseract, but the OCR doesn't work unless the letters are filled with black. How do I fill the gaps after Canny edge detection? Here is a sample image and sample code: import tesserocr from PIL import Image import pytesseract import matplotlib as plt import cv2 import imutils import numpy as np image = cv2.imread(r'c:\ahmed\test3.png') gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) gray = cv2.bilateralFilter(gray,30,40,40) #gray = cv2.GaussianBlur
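The excerpt above is cut off before any answer, so the following is not the thread's solution. It is only a sketch of one common idea for turning outlines into solid shapes: flood-fill the background from the image border, then treat every pixel the fill could not reach as part of a letter. It is written in plain Python on a list of 0/1 rows so that it runs without OpenCV; the function name fill_enclosed is hypothetical. In an OpenCV pipeline the same idea is usually expressed with calls such as cv2.floodFill, or cv2.drawContours with thickness=cv2.FILLED.

```python
from collections import deque


def fill_enclosed(edges):
    """Fill every region that is fully enclosed by edge pixels.

    `edges` is a list of rows of 0/1 values, where 1 marks an edge pixel.
    Background pixels reachable from the image border stay 0; edge pixels
    and enclosed interiors become 1.
    """
    h, w = len(edges), len(edges[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every non-edge pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and edges[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # 4-connected breadth-first fill of the outside background.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and not outside[ny][nx] and edges[ny][nx] == 0):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


outline = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed(outline)
```

Two caveats apply. A contour with even a one-pixel break lets the fill leak inside, so broken Canny edges usually need to be closed first (for example with cv2.morphologyEx and cv2.MORPH_CLOSE). This approach also fills genuine holes inside letters, which may or may not be acceptable for a given script.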

Cannot get the original colored bitmap after tesseract processing - android

不问归期 submitted on 2019-12-06 05:33:03
I use the tesseract library for Android to capture certain text from an image. I know that the captured image is not saved anywhere; it gets recycled. I need to find the original colored bitmap. I have been trying to locate it, but all I could find was a grayscale bitmap: Bitmap bitmap = activity.getCameraManager().buildLuminanceSource(data, width, height).renderCroppedGreyscaleBitmap(); When I save this bitmap to the sdcard, I get a grayscale image. The renderCroppedGreyscaleBitmap() method is as follows: public Bitmap renderCroppedGreyscaleBitmap() { int width = getWidth(

c# OCR can't recognize digits (tesseract 2)

倾然丶 夕夏残阳落幕 submitted on 2019-12-06 05:31:13
Question: I'm trying to extract digits from the following: It fails; I get a ~ in return. I'm using Google's tesseract 2 from C# (via an open source C# wrapper), and now I'm wondering: is this image too poor to be used for OCR? Because imho the digits are perfectly clear. Do you have any other OCR engine in mind that would nail this? EDIT I've also tried Asprise OCR (http://asprise.com/product/ocr/selector.php) but it fails to parse the image too... Answer 1: I suggest resizing. I zoomed this page to

Pytesser set character whitelist

半腔热情 submitted on 2019-12-06 05:10:26
Question: Does anyone know how to set the character whitelist for Pytesseract? I want it to only output A-z and 0-9. Is this possible? I have the following: img = Image.open('test.jpg') result = pytesseract.image_to_string(img, config='-psm 6') I'm getting other characters, like / for a 1, so I would like to limit the set of possible characters. Answer 1: You can accomplish that with the line below. Or you can set up the config file for tesseract to do the same thing. Limit characters tesseract is looking
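The answer above is cut off, so the exact line it recommends is not shown here. As a hedged sketch of the general idea: Tesseract has a configuration variable named tessedit_char_whitelist that can be set on the command line with -c, and pytesseract passes its config string through to the engine. The helper whitelist_config below is hypothetical and only builds that string, so it runs without Tesseract installed.

```python
import string


def whitelist_config(chars, psm=6, psm_flag="-psm"):
    """Build a Tesseract config string that restricts output to `chars`.

    `psm_flag` defaults to the single-dash spelling used in the question;
    newer Tesseract releases spell it "--psm".
    """
    return f"{psm_flag} {psm} -c tessedit_char_whitelist={chars}"


# Letters and digits only, matching the "A-z and 0-9" request.
ALNUM = string.ascii_letters + string.digits
config = whitelist_config(ALNUM)
print(config)
```

The resulting string would be passed as config=config to pytesseract.image_to_string(img, config=config). Whether the whitelist is honored can depend on the Tesseract version and engine mode, so it is worth checking the output on a known image. Whitelists that contain spaces or quote characters would need extra quoting, which this sketch does not handle.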