tesseract

Tess-two OCR not working

女生的网名这么多〃 submitted on 2019-12-20 04:47:56
Question: I'm trying to get text from an image using tess-two on Android, but it gives me a really bad result. The log shows "01-16 12:00:25.339: I/Tesseract(native)(29038): Initialized Tesseract API with language=spa", and about 30 seconds later it returns this result string: {ga ., r¿ y“: A r M í :3 ' ‘Ev’.-:.. -: A 7 » w- ?" _ Á.» ¿"A ¿rw-V r mjÏfn 'n’n . Y ' "\'ZA".‘.¡ A‘ :‘ïvAv- « ‘ :"Éf‘Ï'" -Ï«l :‘,.v:...»- . ' RFI' .. ’ g)" 3;:- 1-;4', = * ¿,arifgggk mw; .1. , ' "53» "J 't‘ ‘ ¿Las ;.‘».L',-‘» ' ' 'N‘“ "“=: - '.
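
Output like this can mean Tesseract was handed an image it could not segment, for example a raw camera photo with uneven lighting. One common preprocessing step is global binarization before OCR. The sketch below is Otsu's method in plain Python, assuming the image is already available as a flat list of 8-bit grayscale values; otsu_threshold and binarize are hypothetical helper names, and on Android the same idea would be applied to the Bitmap pixels before they are passed to tess-two.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return Otsu's threshold for a flat list of 8-bit grayscale values.

    Pixels <= threshold form one class, pixels > threshold the other; the
    threshold maximizes the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # pixel count at or below the candidate threshold
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

For dark text on a light background, binarize(pixels, otsu_threshold(pixels)) yields black text on white, which is the polarity Tesseract generally handles best. A single global threshold can still fail on strongly uneven lighting, where an adaptive method would be the next thing to try.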

Tessnet2 Init method crashes with a certain tessdata path

ε祈祈猫儿з submitted on 2019-12-20 04:22:31
Question: I'm using the Tessnet2 assembly (which uses Tesseract) to do OCR. Unfortunately, the program crashes without any exception after I call the Init method:
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.Init(@"D:\Test\Tessdata\german", "deu", false);
The german folder contains the following Tesseract 2 word data: deu.DangAmgigs, deu.freq-dawg, deu.inttemp, deu.normproto, deu.pffmtable, deu.unicharset, deu.user-words, deu.word-dawg. If I use null for the path it works fine because I installed
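
Since a null path works but a custom path crashes, a cheap first check is whether the folder really contains every file Tesseract 2 expects for the language. The sketch below is a hypothetical Python helper (not part of Tessnet2). The expected suffixes are the Tesseract 2.x set listed in the question, using the standard spelling DangAmbigs, so it would flag a file literally named deu.DangAmgigs if that spelling is real rather than a typo in the question. It does not settle which directory Tessnet2's Init expects, which depends on the wrapper.

```python
from pathlib import Path

# Tesseract 2.x language data suffixes, as listed in the question
# (with the standard "DangAmbigs" spelling).
TESS2_SUFFIXES = (
    "DangAmbigs", "freq-dawg", "inttemp", "normproto",
    "pffmtable", "unicharset", "user-words", "word-dawg",
)


def missing_tessdata_files(folder: str, lang: str) -> list[str]:
    """Return the expected '<lang>.<suffix>' file names that are absent from folder."""
    base = Path(folder)
    return [
        f"{lang}.{suffix}"
        for suffix in TESS2_SUFFIXES
        if not (base / f"{lang}.{suffix}").is_file()
    ]
```

For example, missing_tessdata_files(r"D:\Test\Tessdata\german", "deu") returns an empty list only if all eight files are present under their standard names.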

Is it normal that tesseract does not recognize this word in this image?

独自空忆成欢 submitted on 2019-12-20 04:21:11
Question: I need to extract words from small images like this: I am using Tesseract from the command line with the Spanish language option, like this: tesseract category.png -l spa -psm 7 category.txt I think this text should be easy for the OCR to parse, but the word is not recognized. I use -l spa for Spanish and -psm 7 because the image contains only one line (and if I don't pass the -psm parameter, the result is the same). This is the result: s…"… I am using this build with the lang package:
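
Two details of that command may matter. Tesseract appends .txt to the output base name itself, so passing category.txt as the output base produces category.txt.txt; and from Tesseract 4 onward the page segmentation flag is spelled --psm, while the single-dash -psm is the older 3.x spelling. The hypothetical helper below builds the argument list in the documented order (image, output base, then options), ready to pass to subprocess.run; switch the flag back to -psm if you are on 3.x.

```python
def build_tesseract_cmd(image: str, outbase: str, lang: str = "spa", psm: int = 7) -> list[str]:
    """Build a tesseract CLI argument list: image, output base, then options."""
    # tesseract appends ".txt" to the output base, so strip one if the caller included it.
    if outbase.endswith(".txt"):
        outbase = outbase[: -len(".txt")]
    # "--psm" is the Tesseract 4+ spelling; 3.x used "-psm".
    return ["tesseract", image, outbase, "-l", lang, "--psm", str(psm)]
```

Running subprocess.run(build_tesseract_cmd("category.png", "category.txt"), check=True) would then write the text to category.txt. For very small images, upscaling before OCR is also worth trying, since Tesseract tends to do poorly when characters are only a few pixels high.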

How to save DPI info in py-opencv?

蓝咒 submitted on 2019-12-20 03:52:10
Question:
import cv2

def clear(img):
    back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
    img = cv2.bitwise_xor(img, back)
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    return img

def threshold(img):
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
    return img

def fomatImage(img):
    img = threshold(img)
    img = clear(img)
    return img

img = fomatImage(cv2.imread("1566135246468
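
As far as I know, cv2.imwrite has no option for PNG resolution metadata, so one approach is to add it after encoding. In a PNG file, DPI is stored in the pHYs chunk: pixels per unit for X and Y, plus a unit byte where 1 means metre. The sketch below, a hypothetical standard-library helper, inserts that chunk right after IHDR in PNG bytes such as cv2.imencode(".png", img)[1].tobytes(); it assumes the input does not already contain a pHYs chunk. Pillow's Image.save(..., dpi=(300, 300)) is an alternative if that dependency is acceptable.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def set_png_dpi(png: bytes, dpi: float) -> bytes:
    """Insert a pHYs chunk after IHDR; assumes no pHYs chunk is present yet."""
    if not png.startswith(PNG_SIGNATURE) or png[12:16] != b"IHDR":
        raise ValueError("input does not start with a PNG signature and IHDR chunk")
    ihdr_len = struct.unpack(">I", png[8:12])[0]
    ihdr_end = 8 + 4 + 4 + ihdr_len + 4  # signature + length + type + data + CRC
    ppm = round(dpi / 0.0254)            # pHYs stores pixels per metre
    phys = png_chunk(b"pHYs", struct.pack(">IIB", ppm, ppm, 1))  # unit 1 = metre
    return png[:ihdr_end] + phys + png[ihdr_end:]
```

With cv2, the result of set_png_dpi(buf.tobytes(), 300), where buf comes from cv2.imencode(".png", img), can be written to disk with an ordinary binary file write. For TIFF output, recent OpenCV versions also accept IMWRITE_TIFF_XDPI and IMWRITE_TIFF_YDPI in imwrite.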

Reading text from images in Java

ⅰ亾dé卋堺 submitted on 2019-12-19 13:00:37
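Automatically reading text from images in Java (utility class). The software used here for text recognition in Java is Tesseract OCR (version 3.02; only versions 3 and later support Chinese). It must be installed on the local computer, accepting all the defaults during installation (so that Java can call it directly). For the complete program, send me a private message or download it directly from https://download.csdn.net/download/qq_35571894/12040072 ; after downloading the package, import it and it is ready to run and test. If you want to read text from PDF files, follow this link: https://download.csdn.net/download/qq_35571894/12038360 (spireOCR can recognize text in PDFs).
import java.util.Locale;

public class ImageIOHelper {
    // Recognition language
    private Locale locale = Locale.CHINESE;

    // Constructor with a custom language
    public ImageIOHelper(Locale locale) {
        this.locale = locale;
    }

    // Default constructor: Locale.CHINESE
    public ImageIOHelper() {
    }

    /**
     * Creates a temporary image file so that the original file is not damaged
     * @param imageFile
     * @param imageFormat like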
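
The helper's stated idea is to run OCR on a temporary copy of the image so the original file cannot be damaged. A rough Python equivalent of that idea (not the author's Java code, which is truncated here) copies the file into a temporary directory, passes the copy to a caller-supplied OCR function, and lets the directory be removed afterwards; ocr_on_copy and the ocr callable are hypothetical names. With Chinese language data installed, the callable could run tesseract with a language such as chi_sim.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ocr_on_copy(image_path: str, ocr: Callable[[str], str]) -> str:
    """Run `ocr` on a temporary copy of image_path, leaving the original untouched."""
    src = Path(image_path)
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / src.name
        shutil.copy2(src, copy)  # work on the copy, never on the original
        return ocr(str(copy))    # the temp directory and the copy are deleted on exit
```

Passing the OCR step in as a function keeps the copy-and-clean-up logic independent of whichever engine or wrapper does the recognition.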

PyTesseract calls very slow when used with multiprocessing

独自空忆成欢 submitted on 2019-12-19 10:45:22
Question: I have a function that takes a list of images, applies OCR to each image, and returns the output in a list. Another function controls the input to this function using multiprocessing. With a single list (i.e. no multiprocessing), each image took ~1 s, but when I increased the number of lists processed in parallel to 4, each image took an astounding 13 s. To understand where the problem really is, I tried to create a minimal working example
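
One commonly reported cause of this kind of slowdown is that Tesseract 4 parallelizes internally with OpenMP, so several processes each running a multithreaded tesseract oversubscribe the CPU. Setting the environment variable OMP_THREAD_LIMIT=1 is the usual workaround. The sketch below sets it in a pool initializer so that any tesseract subprocess pytesseract starts from a worker inherits it; ocr_batch and ocr_one are hypothetical names, and whether this is the cause in your case would need measuring.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable


def limit_tesseract_threads() -> None:
    """Pool initializer: cap OpenMP threads for tesseract processes started by this worker."""
    os.environ["OMP_THREAD_LIMIT"] = "1"


def ocr_batch(paths: Iterable[str], ocr_one: Callable[[str], str], workers: int = 4) -> list[str]:
    """OCR images in parallel; ocr_one must be a picklable, top-level function."""
    with ProcessPoolExecutor(max_workers=workers, initializer=limit_tesseract_threads) as pool:
        return list(pool.map(ocr_one, paths))
```

Here ocr_one could be a top-level function returning pytesseract.image_to_string(Image.open(path)), and ocr_batch should be called under an if __name__ == "__main__": guard so worker processes can be started safely on every platform.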

Java OpenCV + Tesseract OCR “code” recognition

喜你入骨 submitted on 2019-12-19 04:05:26
Question: I'm trying to automate a process in which someone manually converts a code into a digital one, so I started reading about OCR. I installed Tesseract OCR and tried it on some images, but it doesn't detect anything even close to the code. After reading some questions on Stack Overflow, I figured that the images need some preprocessing, such as deskewing the image so the text is horizontal, which can be done with OpenCV, for example. Now my questions are: What kind of preprocessing or other methods should be
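
Besides preprocessing the image, when the target is a code with a known format, the OCR output can be post-processed: common letter/digit confusions are mapped back and the result is checked against the expected pattern. The sketch below assumes, purely for illustration, a code made only of digits; the confusion table and normalize_code are hypothetical and would need tuning to the real code format and font.

```python
import re
from typing import Optional

# Hypothetical confusion table for a digits-only code: characters that
# OCR may emit in place of visually similar digits.
DIGIT_LOOKALIKES = str.maketrans({
    "O": "0", "o": "0", "D": "0",
    "I": "1", "l": "1", "|": "1",
    "Z": "2", "S": "5", "B": "8",
})


def normalize_code(raw: str, length: int) -> Optional[str]:
    """Map look-alike characters to digits; return None if the result is not a valid code."""
    cleaned = re.sub(r"\s+", "", raw).translate(DIGIT_LOOKALIKES)
    return cleaned if re.fullmatch(rf"\d{{{length}}}", cleaned) else None
```

Returning None rather than a best guess lets the caller retry with different preprocessing or flag the image for manual entry.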

How do I improve the accuracy of the OCR text from Tesseract?

守給你的承諾、 submitted on 2019-12-18 17:54:19
Question: I created a basic app for recognizing text using the Tesseract API from Google and integrated it with my camera app. It works, but the only problem is accuracy: sometimes the text is recognized as a random set of characters, and I'd estimate the accuracy at about 50 percent. Further, when it tries to scan more than four words in an image, the app crashes. String ocrText = baseApi.getUTF8Text(); baseApi.end(); where baseApi is the object of the Tesseract API class. Do I need to use a
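
One way to keep random characters out of the result is to use Tesseract's per-word confidence instead of the plain text. pytesseract.image_to_data(..., output_type=Output.DICT) returns parallel lists that include "text" and "conf", with -1 for boxes that are not words; the sketch below filters such a dict using a hypothetical cut-off of 60 that would need tuning. On Android, tess-two's TessBaseAPI offers meanConfidence() and wordConfidences() as far as I know, so the same idea should carry over with different calls.

```python
from typing import Mapping, Sequence


def confident_words(data: Mapping[str, Sequence], min_conf: float = 60.0) -> list[str]:
    """Keep non-empty words whose confidence is at least min_conf.

    `data` is shaped like pytesseract.image_to_data(..., output_type=Output.DICT):
    parallel lists under "text" and "conf"; non-word boxes have conf -1.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be a str (older pytesseract) or a number (newer versions)
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words
```

Joining the returned list with spaces gives a cleaner string than the raw text, at the cost of dropping genuine words the engine was unsure about.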

Implementing Tesseract OCR on iPhone

风格不统一 submitted on 2019-12-18 17:32:15
Question: I want to implement handwriting recognition in my project: for example, when the user writes A on the screen, the screen should display A. I searched on Google and so far I have found Tesseract OCR, but I don't understand what Tesseract OCR is or how to implement it in my project. Can someone give a demo or tutorial for Tesseract OCR? I also don't know whether Tesseract OCR is free or paid. Can someone give me an idea about Tesseract OCR? Answer 1: Try this one. http://tinsuke.wordpress.com/2011/11/01

Including Tess4J in a Java project as a library in Eclipse

余生颓废 submitted on 2019-12-18 15:49:46
Question: So far I have an empty, clean Eclipse Java project. What do I have to do to use Tess4J as a library for the web service I want to develop? Is it even possible to use it as a library for an Android project? (That would be a big shortcut.) I came across an issue regarding .tif files on Android. Tess4J is a wrapper for native code, because tesseract-ocr is written in C/C++; that much I've understood. But how do I include this wrapper in my project? I've googled a lot until I have decided