ocr

Background image cleaning for OCR

和自甴很熟 submitted on 2020-02-12 01:55:52
Question: Using tesseract-OCR, I am trying to extract text from the following images with a red background. I have problems extracting the text in boxes B and D because there are vertical lines. How can I clean the background like this: input: output: Any ideas? The image without boxes: Answer 1: Here are two methods to clean the image using Python OpenCV. Method #1: Numpy thresholding. Since the vertical lines, horizontal lines, and the background are in red, we can take advantage of this and use Numpy
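The answer excerpt is cut off before its code, so the following is only a minimal sketch of what Numpy thresholding on a red background could look like, not the answerer's actual code. It assumes the image is a BGR array (the channel order `cv2.imread` returns), that the text is dark, and that a red-channel cutoff of 105 separates text from the red lines and background. The function name `whiten_red_pixels` and the cutoff value are hypothetical.

```python
import numpy as np


def whiten_red_pixels(image_bgr, red_min=105):
    """Return a copy with every strongly red pixel painted white.

    red_min is an assumed cutoff; tune it for the actual scans.
    """
    cleaned = image_bgr.copy()
    red_channel = cleaned[:, :, 2]  # BGR order: index 2 is red
    # Background and ruling lines have a high red value; dark text does not.
    cleaned[red_channel > red_min] = (255, 255, 255)
    return cleaned


# Synthetic 3x3 image: red background with one dark "text" pixel in the middle.
demo = np.full((3, 3, 3), (40, 40, 200), dtype=np.uint8)
demo[1, 1] = (20, 20, 20)
result = whiten_red_pixels(demo)
```

On a real scan, the cleaned array would then be passed to the OCR engine; the cutoff would probably need adjusting per image set.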

Extract individual field from table image to excel with OCR

倾然丶 夕夏残阳落幕 submitted on 2020-02-11 19:40:31
Question: I have scanned images that contain tables, as shown in this image: I am trying to extract each box separately and perform OCR, but when I detect the horizontal and vertical lines and then detect the boxes, it returns the following image: And when I apply other transformations to detect the text (erode and dilate), some remains of the lines still come along with the text, like below: I cannot detect the text alone to perform OCR, and proper bounding boxes aren't being generated, like below: I cannot
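One way to keep line remnants out of the text is to delete long straight runs of foreground pixels before looking for text. The sketch below is a plain Numpy run-length filter, not the erode/dilate pipeline the question mentions, although it is similar in effect to a morphological opening with a line-shaped kernel. It assumes a boolean image where `True` is ink, and that table lines are longer than any stroke of text; `mark_long_runs`, `remove_table_lines`, and `min_len` are hypothetical names.

```python
import numpy as np


def mark_long_runs(binary, min_len):
    """Mark pixels lying in horizontal runs of at least min_len ink pixels."""
    mask = np.zeros(binary.shape, dtype=bool)
    for r, row in enumerate(binary):
        start = None
        # A trailing False closes any run that reaches the right edge.
        for c, on in enumerate(list(row) + [False]):
            if on and start is None:
                start = c
            elif not on and start is not None:
                if c - start >= min_len:
                    mask[r, start:c] = True
                start = None
    return mask


def remove_table_lines(binary, min_len):
    """Clear long horizontal and vertical runs, keeping short strokes."""
    horizontal = mark_long_runs(binary, min_len)
    # Transposing lets the same row scan find vertical runs.
    vertical = mark_long_runs(binary.T, min_len).T
    return binary & ~(horizontal | vertical)


# Synthetic 10x10 cell: a top border, a left border, and a 2x2 "text" blob.
grid = np.zeros((10, 10), dtype=bool)
grid[0, :] = True
grid[:, 0] = True
grid[4:6, 4:6] = True
cleaned = remove_table_lines(grid, min_len=5)
```

The choice of `min_len` is the main design decision: too small and long strokes of text are erased, too large and broken line fragments survive.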

JVM randomly crashes EXCEPTION_ACCESS_VIOLATION during Leadtools OCR process

a 夏天 submitted on 2020-02-07 01:28:46
Question: We are developing a multi-threaded Java application using the licensed Leadtools SDK 20 to convert large PDF documents to searchable PDF documents. During the OCR process, the Java Virtual Machine randomly crashes with EXCEPTION_ACCESS_VIOLATION (Problematic frame: C [Ltocrx.dll+0x2af3a]). I've noticed that it may happen if the user tries to cancel the OCR process. public class OcrProgressCallback implements OcrProgressListener { @Override public void onProgress(OcrProgressData ocrProgressData) { if (ocrProgressData

Tess4j - Pdf to Tiff to tesseract - “Warning: Invalid resolution 0 dpi. Using 70 instead.”

梦想与她 submitted on 2020-02-06 07:25:51
Question: I am using tess4j (net.sourceforge.tess4j:tess4j:4.4.0) and trying to run OCR on PDF files. As I understand it, I first have to convert the PDF to TIFF or PNG (is either one recommended?), which I did like this: tesseract.doOCR(PdfUtilities.convertPdf2Tiff(inputPdfFile)); and I get the following warning: Warning: Invalid resolution 0 dpi. Using 70 instead. Question: Does it have any influence on my scan results? (If not, OK, I can switch off the warning.) Is there a way to set the DPI by hand or should convertPdf

From image to numbers [closed]

丶灬走出姿态 submitted on 2020-02-03 19:07:26
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago. I have some images that contain perfectly written numbers. These numbers can be from one to four characters long. Is there a way to recognize these numbers and convert them to text with PHP or JavaScript? Thank you. Regards.
