ocr

ocr images from list of urls and store the results in spreadsheet

拥有回忆 submitted on 2020-05-18 05:15:21
Question: Hello, I have a list of image URLs that contain numbers, and I want to OCR them and store the results in a Google spreadsheet. I've found these Google scripts to OCR images:
1- https://gist.github.com/tagplus5/07dde5ca61fe8f42045d
2- https://ctrlq.org/code/20128-extract-text-from-image-ocr
But I didn't know how to create a request variable, so I've replaced the request variable with a URL variable like this: function doGet(url) { if (url != undefined && url != "") { var imageBlob = UrlFetchApp.fetch(url)
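The batch flow described in the question (loop over the URLs, OCR each image, keep one result row per URL) can be sketched as follows. This is a Python sketch, not the Apps Script from the linked gists: the names `ocr_urls_to_rows` and `write_rows_csv` are hypothetical, the download and OCR steps are passed in as callables because the source does not show them, and the rows go to CSV (which a spreadsheet can import) rather than directly into Google Sheets.

```python
import csv
import io


def ocr_urls_to_rows(urls, fetch, ocr):
    """Return one (url, text) row per non-empty URL.

    fetch(url) -> bytes and ocr(bytes) -> str are supplied by the caller;
    both are placeholders for whatever download and OCR backend is used.
    """
    rows = []
    for url in urls:
        if not url:
            # Skip blank entries instead of sending them to the OCR step.
            continue
        try:
            text = ocr(fetch(url)).strip()
        except Exception as exc:
            # Keep the row so one failing image does not stop the batch.
            text = f"ERROR: {exc}"
        rows.append((url, text))
    return rows


def write_rows_csv(rows, out):
    """Write the rows with a header line to a text file-like object."""
    writer = csv.writer(out)
    writer.writerow(["url", "ocr_text"])
    writer.writerows(rows)
```

Passing the two steps in as functions keeps the loop independent of any particular OCR service, so it can be tried with stub functions first.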

Recognize specific numbers from table image with Pytesseract OCR

試著忘記壹切 submitted on 2020-05-15 05:13:12
Question: I want to read a column of numbers from an attached image (PNG file). My code is: import cv2 import pytesseract import os img = cv2.imread(os.path.join(image_path, image_name), 0) config= "-c tessedit_char_whitelist=01234567890.:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" pytesseract.image_to_string(img, config=config) This code gives me the output string: 'n113\nun\n1.08'. As we can see, there are two problems: It fails to recognize the decimal point in 1.13 (see attached picture). It
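One thing that could be tried here is narrowing the whitelist to digits and the decimal point and filtering the output afterwards. The sketch below assumes that page segmentation mode 6 (a single uniform block of text) suits the column layout; `DIGIT_CONFIG`, `extract_numbers` and `read_number_column` are hypothetical names. Whether this recovers the missing decimal point depends on the image and the Tesseract version; the filter only drops stray letters and cannot restore a point that OCR missed.

```python
import re

# Assumption: --psm 6 fits a single column of numbers.
DIGIT_CONFIG = "--psm 6 -c tessedit_char_whitelist=0123456789."


def extract_numbers(ocr_text):
    """Return every integer or decimal token found in the OCR output."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", ocr_text)]


def read_number_column(image_path):
    """OCR one image with the digit-only config and return its numbers."""
    # Imported here so extract_numbers can be used without OpenCV/Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(image_path)
    text = pytesseract.image_to_string(img, config=DIGIT_CONFIG)
    return extract_numbers(text)
```

Applied to the output string quoted in the question, `extract_numbers('n113\nun\n1.08')` returns `[113.0, 1.08]`, which shows the stray letters being dropped while the missing decimal point stays missing.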

converting pdf to image but after zooming in

老子叫甜甜 submitted on 2020-05-14 20:48:07
Question: This link shows how PDFs can be converted to images. Is there a way to zoom my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom into the PDFs and then save parts as PNGs, OCR gives much better results. So is there a way to zoom PDFs before converting them to PNGs?
Answer 1: I think that raising the quality (resolution) of your image is a better solution than zooming into the PDF. using
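The answer's suggestion of raising the resolution can be sketched as below. The answer text is cut off before it names a library, so the use of `pdf2image` (which needs Poppler installed) is an assumption, and `zoom_to_dpi` and `pdf_to_pngs` are hypothetical names. The sketch relies on PDF page sizes being expressed in points of 1/72 inch, so a zoom factor maps directly to a rendering DPI.

```python
PDF_POINTS_PER_INCH = 72  # PDF page dimensions are given in 1/72-inch points


def zoom_to_dpi(zoom):
    """Translate an on-screen zoom factor into a rendering resolution."""
    return round(zoom * PDF_POINTS_PER_INCH)


def pdf_to_pngs(pdf_path, out_prefix, zoom=4.0):
    """Render each page at the zoom-equivalent DPI and save it as a PNG."""
    # Assumption: pdf2image is available; it is not named in the source.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=zoom_to_dpi(zoom))
    paths = []
    for number, page in enumerate(pages, start=1):
        path = f"{out_prefix}-{number}.png"
        page.save(path, "PNG")
        paths.append(path)
    return paths
```

Rendering at a higher DPI gives the OCR step more pixels per glyph, which is the same effect the question observed when zooming in before saving.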

WinError 5:Access denied PyTesseract

ぃ、小莉子 submitted on 2020-05-14 17:46:25
Question: I know this question has already been answered on this site; however, none of the solutions I looked up on the internet seemed to work. Here's what I tried:
- Giving all permissions to my Python file
- Changing the PATH variable to point to my tesseract folder
- Running IDLE as administrator and then executing the file from there
This error is really bothering me now and I can't advance any further because of it. Here's my code, if that helps: import pytesseract import sys import argparse try:
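The code in the question is cut off, so the actual cause cannot be read from it. One cause worth ruling out is `pytesseract.pytesseract.tesseract_cmd` pointing at the Tesseract folder rather than at the executable inside it, since Windows refuses to execute a directory. The sketch below checks for that; `resolve_tesseract_cmd` and `configure_pytesseract` are hypothetical helper names, and the default executable name `tesseract.exe` is an assumption about the install.

```python
import os


def resolve_tesseract_cmd(path, exe_name="tesseract.exe"):
    """Return the path of the executable, accepting a folder or a file path."""
    # If a folder was given, look for the executable inside it.
    candidate = os.path.join(path, exe_name) if os.path.isdir(path) else path
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"No Tesseract executable at {candidate!r}")
    return candidate


def configure_pytesseract(path):
    """Point pytesseract at the resolved executable."""
    # Imported here so the path check can run without pytesseract installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(path)
```

If the check passes and the error persists, the folder-versus-file explanation does not apply and the cause lies elsewhere.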

Character confidence for Tesseract 3.02 using config file

荒凉一梦 submitted on 2020-05-14 12:45:30
Question: How would I get the % confidence per character detected? By searching around, I found that you should set save_blob_choices to T. So I added that as a line in the hocr config file in tessdata/configs and called tesseract with it. This is all I'm getting in the generated HTML file: <span class='ocr_line' id='line_1' title="bbox 0 0 50 17"><span class='ocrx_word' id='word_1' title="bbox 3 2 45 15"><strong>31,835</strong></span> As you can see, there aren't any confidence annotations, not even
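To check whether a generated hOCR file carries any confidence values, the `title` attributes of the word spans can be parsed. The sketch below looks for `x_wconf`, the word-level confidence property defined by the hOCR format; it does not provide per-character confidence, and nothing in the source confirms that Tesseract 3.02 emits the property. `parse_title` and `HocrWords` are hypothetical names.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title such as 'bbox 3 2 45 15; x_wconf 87' into a dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class HocrWords(HTMLParser):
    """Collect bbox, confidence (None when absent) and text per ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_word":
            props = parse_title(attrs.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(conf[0]) if conf else None,
                "text": "",
            }
            self.words.append(self._current)

    def handle_data(self, data):
        # Text inside nested tags such as <strong> still belongs to the word.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None
```

Feeding the snippet from the question through `HocrWords().feed(...)` gives a single word with the bounding box `(3, 2, 45, 15)`, the text `31,835`, and a confidence of `None`, matching the observation that no confidence annotation was written.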

How to get the letter coordinate retrieved by Tesseract ocr

拜拜、爱过 submitted on 2020-05-13 17:58:38
Question: I'm trying to use tesseract in Python to do a simple job: - open a picture - run OCR - get the string - get the character coordinates. The last one is my pain! Here is my first code: import tesseract import glob import cv2 api = tesseract.TessBaseAPI() api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZéèô%") api.SetPageSegMode(tesseract.PSM_AUTO) imagepath = "C:\\Project\\Bob\\" imagePathList = glob.glob(imagepath + "*.jpg") for
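A possible route to per-character coordinates is the box output that `pytesseract.image_to_boxes` returns. Note that this swaps in the `pytesseract` wrapper, not the `tesseract` binding used in the question. Each output line has the form `char left bottom right top page`, with the origin at the bottom-left corner of the image, so the sketch below flips the y values to the top-left origin that OpenCV uses. `CharBox`, `parse_boxes` and `char_boxes` are hypothetical names.

```python
from typing import NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_boxes(box_text, image_height):
    """Convert Tesseract box lines into top-left-origin character boxes."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        x1, y1, x2, y2 = (int(v) for v in parts[1:5])
        # Box files measure y from the bottom edge; flip to measure from the top.
        boxes.append(CharBox(parts[0], x1, image_height - y2, x2, image_height - y1))
    return boxes


def char_boxes(image_path):
    """Run OCR on one image and return its character boxes."""
    # Imported here so parse_boxes can be used without OpenCV/Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    return parse_boxes(pytesseract.image_to_boxes(img), img.shape[0])
```

For a 100-pixel-high image, the box line `a 10 20 30 40 0` becomes `CharBox(char='a', left=10, top=60, right=30, bottom=80)`.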
