ocr

How to make a dictionary that can hold more than one value per key?

耗尽温柔 submitted on 2020-01-16 05:38:09
Question: I've been trying to modify the program so that it can accept more than one sample for a single character, for example the letter "A". There is a ContainsKey check that lets each key entered from the keyboard hold only one piece of data. How can I make it hold more than one? To be clear, this is an online OCR program using an unsupervised neural network. When a user draws a character in the drawing space, they have the option to add the character into the
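A minimal sketch of the usual fix, assuming the samples can be kept in memory: instead of mapping each character to a single value (and refusing to add once ContainsKey is true), map each character to a list of values and append to it. The question doesn't show its language; ContainsKey suggests a .NET Dictionary, where the equivalent would be a Dictionary<char, List<...>>. The sketch is in Python, and the class name SampleStore and the feature-vector samples are hypothetical.

```python
from collections import defaultdict


class SampleStore:
    """Keep every training sample drawn for a character, not just the first one.

    Hypothetical sketch: a sample can be any object (for example a list of
    pixel features produced by the drawing pad).
    """

    def __init__(self):
        # A missing key starts out as an empty list, so no ContainsKey-style
        # check is needed before adding.
        self._samples = defaultdict(list)

    def add(self, label, sample):
        """Append another sample for `label` (e.g. a second drawing of "A")."""
        self._samples[label].append(sample)

    def samples_for(self, label):
        """Return a copy of all samples stored for `label` ([] if none)."""
        return list(self._samples.get(label, []))

    def count(self, label):
        """Number of samples stored for `label`."""
        return len(self._samples.get(label, []))
```

Because each key now holds a list, adding a second drawing of "A" simply appends to the existing list instead of being rejected.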

PyTesseract - recognize digits in a simple image

风格不统一 submitted on 2020-01-16 05:19:12
Question: I'm trying to use pytesseract to recognize two digits in an image. I have tried --psm 6 through 10, and I have tried -c tessedit_char_whitelist=0123456789, but none of these returns the number 49. The closest I got returned 4 without the 9. Do you have any tips on how to make Tesseract recognize it?
Answer 1: Try --psm 13 --oem 3 (--oem 1 or 2 should also work):
    import pytesseract
    from PIL import Image
    import requests
    import io
    response = requests.get('https://i.stack.imgur.com/oAAXR.png')
    text = pytesseract
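Since the question tried --psm 6 to 10 one at a time, a small helper that builds one config string per mode makes it easy to sweep all of them, including the --psm 13 suggested in the answer, in a single run. This is a sketch: candidate_configs is a hypothetical name, and the strings are simply passed to pytesseract through the config argument of image_to_string.

```python
def candidate_configs(psm_modes=range(6, 14), oem=3, whitelist="0123456789"):
    """Build one Tesseract config string per page segmentation mode.

    Hypothetical helper: each string can be passed as the `config`
    argument of pytesseract.image_to_string.
    """
    configs = []
    for psm in psm_modes:
        config = "--psm %d --oem %d" % (psm, oem)
        if whitelist:
            # Restrict recognition to the given characters.
            config += " -c tessedit_char_whitelist=%s" % whitelist
        configs.append(config)
    return configs
```

For example, `for cfg in candidate_configs(): print(cfg, pytesseract.image_to_string(img, config=cfg).strip())` prints each mode next to what it recognized, so the best mode for this image can be picked by inspection.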

How to use ctypes.util.find_library to import .so libraries in AWS lambda (python)?

社会主义新天地 submitted on 2020-01-15 10:27:08
Question: What I'm trying: a Python package I'm using on Lambda (OCRmyPDF) needs the Leptonica library liblept.so.5. After isolating the import code, I found the issue is with find_library('lept'): printing the result shows None.
    from ctypes.util import find_library
    def lambda_handler(event, context):
        liblept = find_library('lept')
        print("liblept:%s" % liblept)
The Python package I'm using needs many natively compiled dependencies, which I'm trying to provide through Lambda layers. Layer structure: /opt/ /opt/bin
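On Linux, ctypes.util.find_library does not scan directories itself: the Python documentation says it runs external programs (/sbin/ldconfig, gcc, objdump and ld), and the LD_LIBRARY_PATH fallback added in Python 3.6 also goes through ld. On a minimal image such as Lambda's it can therefore return None even when the .so file is present. Below is a sketch of a fallback, assuming the layer puts the library in /opt/lib (Lambda layers are extracted under /opt): if find_library returns None, scan a list of directories for lib<name>.so*. find_library_in and locate_library are hypothetical helpers, and this only helps code you control; OCRmyPDF's own find_library call would still need the library to be discoverable by the tools listed above.

```python
import glob
import os
from ctypes.util import find_library


def find_library_in(name, search_dirs):
    """Return the first lib<name>.so* file found in search_dirs, or None.

    Hypothetical fallback for environments where find_library cannot run
    ldconfig/gcc/ld.
    """
    for directory in search_dirs:
        matches = sorted(glob.glob(os.path.join(directory, "lib%s.so*" % name)))
        if matches:
            return matches[0]
    return None


def locate_library(name, extra_dirs=("/opt/lib",)):
    """Try ctypes' own lookup first, then scan extra_dirs and LD_LIBRARY_PATH."""
    path = find_library(name)
    if path is None:
        env_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
        path = find_library_in(name, list(extra_dirs) + env_dirs)
    return path
```

The result (a full path from the fallback, or the bare soname find_library normally returns) can be passed to ctypes.CDLL, e.g. `ctypes.CDLL(locate_library("lept"))`.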

No module named tesseract

北战南征 submitted on 2020-01-15 05:28:06
Question: I'm working on an OCR project. I can import pytesseract and use image_to_string, but I want to use this:
    api = tesseract.TessBaseAPI()
    api.SetVariable("tessedit_char_whitelist", "0123456789")
    api.Init('.', 'eng', tesseract.OEM_DEFAULT)
    api.SetPageSegMode(tesseract.PSM_AUTO)
This is to make Tesseract detect only digits or only letters. When I run my code I get this error: ImportError: No module named tesseract. I have tesseract-ocr installed, and pytesseract as well, yet I keep getting this error. Answer 1: I
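As far as I know, TessBaseAPI comes from a separate, older binding (python-tesseract) whose module is named tesseract. pytesseract wraps the tesseract command-line program and has no TessBaseAPI, so installing it does not provide that module. One option is to express the same settings as command-line flags that pytesseract passes through its config argument: SetVariable becomes -c name=value, PSM_AUTO corresponds to --psm 3 and OEM_DEFAULT to --oem 3. The helper name api_to_cli_config is hypothetical. If the C++-style API is specifically needed, the tesserocr package exposes a PyTessBaseAPI class instead.

```python
# Numeric values of the two enum members used in the question, as listed by
# `tesseract --help-psm` and `tesseract --help-oem` (Tesseract 4+).
PSM_AUTO = 3     # fully automatic page segmentation, but no OSD (the default)
OEM_DEFAULT = 3  # default, based on what is available


def api_to_cli_config(variables, psm=PSM_AUTO, oem=OEM_DEFAULT):
    """Translate TessBaseAPI-style settings into a pytesseract config string.

    Hypothetical helper: each SetVariable(name, value) call becomes
    "-c name=value"; the modes become --psm / --oem flags.
    """
    parts = ["--oem %d" % oem, "--psm %d" % psm]
    for name, value in variables.items():
        parts.append("-c %s=%s" % (name, value))
    return " ".join(parts)
```

Usage would look like `pytesseract.image_to_string(img, config=api_to_cli_config({"tessedit_char_whitelist": "0123456789"}))`.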

How to extract text from table in image?

半世苍凉 submitted on 2020-01-15 04:53:07
Question: I have data in an image of a structured table. The data looks like this: I tried to extract the text from this image using this code:
    import pytesseract
    from PIL import Image
    value = Image.open("data/pic_table3.png")
    text = pytesseract.image_to_string(value, lang="eng")
    print(text)
Here is the output: EA Domains Traditional role Future role Technology e Closed platforms ¢ Open platforms e Physical e Virtualized Applicationsand |e Proprietary e Inter-organizational Integration e Siloed
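image_to_string flattens the table into a single stream of text. One heuristic, sketched here under the assumption that cells are separated by clear horizontal whitespace, is to request word boxes with pytesseract.image_to_data(value, output_type=pytesseract.Output.DICT) and regroup them: words whose top edges are within row_tol pixels form a row, and a horizontal gap larger than col_gap pixels starts a new cell. The function name and both thresholds are hypothetical and would need tuning per image; for ruled tables, detecting the grid lines first (for example with OpenCV) may be more robust.

```python
def rows_from_ocr_data(data, row_tol=10, col_gap=40):
    """Group Tesseract word boxes into table rows and cells.

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    # Keep only real words: (top, left, width, text).
    words = [
        (data["top"][i], data["left"][i], data["width"][i], data["text"][i].strip())
        for i in range(len(data["text"]))
        if data["text"][i].strip()
    ]
    words.sort()  # top-to-bottom, then left-to-right

    # Words whose top edge is close to the row's first word share a row.
    rows = []
    for top, left, width, text in words:
        if rows and abs(rows[-1]["top"] - top) <= row_tol:
            rows[-1]["words"].append((left, width, text))
        else:
            rows.append({"top": top, "words": [(left, width, text)]})

    # Within a row, a large horizontal gap starts a new cell.
    table = []
    for row in rows:
        cells, prev_right = [], None
        for left, width, text in sorted(row["words"]):
            if prev_right is not None and left - prev_right <= col_gap:
                cells[-1] += " " + text
            else:
                cells.append(text)
            prev_right = left + width
        table.append(cells)
    return table
```

A cell that wraps onto several lines in the image (such as "Applications and Integration") comes out as several rows with this heuristic, so the result may still need merging.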

Tesseract didn't pick up the small labels

坚强是说给别人听的谎言 submitted on 2020-01-14 07:06:30
Question: I've installed Tesseract on my Linux environment. It works when I run something like:
    # tesseract myPic.jpg /output
But my picture has some small labels, and Tesseract doesn't see them. Is there an option to set a pitch or something like that? Example of text labels: With this picture, Tesseract doesn't recognize any value... But with this picture I get the following output: J8 J7A-J7B P7 \ 2 40 50 0 180 190 200 P1 P2 7 110 110 \ l For example, in this case, the 90 (on top left) is not seen
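Two common things to try for small, scattered labels, sketched here as assumptions rather than a known fix for this image: upscale the image so the labels are taller in pixels before OCR, and use page segmentation mode 11 ("sparse text"), which looks for as much text as possible in no particular order. The 30 px target height and the 4x cap are assumed rules of thumb, and upscale_factor is a hypothetical helper.

```python
import math


def upscale_factor(text_height_px, target_height_px=30, max_factor=4):
    """Integer scale factor that brings small text close to a target height.

    target_height_px=30 is an assumed rule of thumb, not a Tesseract constant.
    """
    if text_height_px <= 0:
        raise ValueError("text height must be positive")
    factor = math.ceil(target_height_px / text_height_px)
    # Never shrink, and cap the factor to keep the image size manageable.
    return max(1, min(factor, max_factor))


# Page segmentation mode 11 ("sparse text") finds as much text as possible
# in no particular order, which suits labels scattered over a drawing.
SPARSE_TEXT_CONFIG = "--psm 11"
```

With Pillow, the factor could be applied as `img.resize((img.width * f, img.height * f), Image.LANCZOS)` before calling `pytesseract.image_to_string(img, config=SPARSE_TEXT_CONFIG)`. On the command line the equivalent flag is --psm 11 (Tesseract 4 and later; older 3.x releases used -psm).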

Searching an image for specified text

≯℡__Kan透↙ submitted on 2020-01-14 05:59:08
Question: I think I'm going to ask a very stupid question here. In my current project I want to provide a search feature. I have a big tutorial image with a lot of information on a topic, and I want to search within the image. Suppose the user types "Apple": it should show how many times "Apple" occurs in the image, and after clicking one of the results, the image should scroll to the position where that "Apple" occurs. Thanks for reading my stupid question, but if it is possible let me know and put some sample
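Assuming an OCR engine can be run over the tutorial image once in advance, the search itself becomes a lookup over word boxes. The sketch takes data shaped like the dictionary returned by pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT) and returns the bounding box of every word matching the query, ignoring case and surrounding punctuation. find_word_boxes is a hypothetical name, and multi-word queries are not handled.

```python
def find_word_boxes(data, query):
    """Return (left, top, width, height) for every OCR'd word matching query.

    `data` is assumed to have the shape of
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    Matching ignores case and surrounding punctuation.
    """
    target = query.casefold()
    boxes = []
    for i, word in enumerate(data["text"]):
        cleaned = word.strip().strip(".,;:!?\"'()").casefold()
        if cleaned and cleaned == target:
            boxes.append((data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return boxes
```

The number of occurrences is `len(boxes)`, and each box's top coordinate is the position the image view would scroll to when that result is clicked, in whatever UI toolkit the project uses.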

Generate font from an image of text

狂风中的少年 submitted on 2020-01-14 04:19:07
Question: Is it possible to generate a specific font from the image given below? My idea is to generate a font for the image of text below by manually selecting portions of the image and mapping them to a set of letters, generate the font from this, and then use that font to make the text readable for an OCR engine. Is font generation possible with any open-source implementation? Also, please suggest any good OCR engines. Answer 1: ABBYY FineReader 10 gets better than expected results but predictably

Python Selenium: Change Text Size (Zoom? Setting? …)

♀尐吖头ヾ submitted on 2020-01-14 03:13:29
Question: I have a webpage that I need to screenshot first, then use OCR to parse out the text inside. OCR performance could be dramatically improved if I zoom in (Mac: Command + '='), so I'm wondering how I could zoom in/out using Selenium in Python. There is a similar post with the same goal as mine, but it only has implementations in Java and C#. Zooming in/out in Selenium is just one of my ideas for improving performance. I know there might be several ways to
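One approach, sketched under the assumption that the browser honors the CSS zoom property, is to scale the page from JavaScript with driver.execute_script before taking the screenshot. zoom_script is a hypothetical helper; Firefox only added support for zoom in recent versions, so results may differ by browser.

```python
def zoom_script(percent):
    """JavaScript that scales the rendered page via the CSS `zoom` property.

    Hypothetical helper for driver.execute_script; 100 means no zoom.
    """
    if percent <= 0:
        raise ValueError("percent must be positive")
    return "document.body.style.zoom = '%d%%';" % percent
```

For example, `driver.execute_script(zoom_script(200))` followed by `driver.save_screenshot("page.png")`. With Chrome, another option is starting the browser with `options.add_argument("--force-device-scale-factor=2")`, which renders at a higher pixel density instead of changing the page layout.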