Every now and then my mom has to sift through these types of photos to extract the number from the image and rename the file to that number.
I have been having another look at this, and had a couple of inspirations along the way....
Tesseract can accept custom dictionaries, and if you dig a little deeper, it appears that from v3.0 onwards it accepts the command-line config digits to make it recognise digits only - which seems a useful idea for your needs.
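If you want to try that part on its own first, here is a minimal Python sketch - the digits config file ships with Tesseract under tessdata/configs, and tag.png is just a placeholder filename, not one of your images:

import subprocess as sp

# Run Tesseract with its bundled "digits" config so it only considers 0-9;
# "stdout" makes it print the result instead of writing a .txt file
out = sp.check_output(["tesseract", "tag.png", "stdout", "digits"])
print(out.decode().strip())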
It may not be necessary to find the boards with the digits on them - it may be easier to run Tesseract multiple times with various slices of the image and let it have a try itself, as that is what it is supposed to do.
So, I decided to preprocess the image by changing everything that is within 25% of black to pure black, and everything else to pure white. That gives pre-processed images like this:
Next, I generate a series of images and pass them, one at a time, to Tesseract. I decided to assume that the digits are probably between 10% and 40% of the image height, so I made a loop over strips 40, 30, 20 and 10% of the image height. I then slide the strip down the image from top to bottom in 20 steps, passing each strip to Tesseract, until the strip is essentially across the bottom of the image.
Here are the 40% strips - each frame of the animation is passed to Tesseract:
Here are the 20% strips - each frame of the animation is passed to Tesseract:
Having got the strips, I resize them nicely for Tesseract's sweet spot and clean up the noise etc. Then, I pass them into Tesseract and assess the quality of the recognition, somewhat crudely, by counting the number of digits it found. Finally, I sort the output by the number of digits - on the theory that more digits is probably better...
There are some rough edges and bits that you could dink around with, but it is a start!
#!/bin/bash
image=${1:-c1.jpg}

# Make everything that is nearly black go fully black, everything else white.
# Median filter to suppress noise.
# convert -delay 500 c1.jpg c2.jpg c3.jpg -normalize -fuzz 25% -fill black -opaque black -fuzz 0 -fill white +opaque black -median 9 out.gif
convert "${image}" -normalize \
    -fuzz 25% -fill black -opaque black \
    -fuzz 0 -fill white +opaque black \
    -median 9 tmp_$$.png

# Get height of image - h
h=$(identify -format "%h" "${image}")

# Generate strips that are 40%, 30%, 20% and 10% of image height
for pc in 40 30 20 10; do
    # Calculate height of this strip in pixels - sh
    ((sh=(h*pc)/100))
    # Calculate offset from top of picture to top of bottom strip - omax
    ((omax=h-sh))
    # Calculate step size, there will be 20 steps
    ((step=omax/20))

    # Cut strips sh pixels high from the picture, starting at the top and working down in 20 steps
    for ((off=0; off<omax; off+=step)); do
        t=$(printf "%05d" $off)
        # Extract strip and resize to 80 pixels tall for tesseract
        convert tmp_$$.png -crop x${sh}+0+${off} \
            -resize x80 -median 3 -median 3 -median 3 \
            -threshold 90% +repage slice_${pc}_${t}.png
        # Run slice through tesseract, seeking only digits
        tesseract slice_${pc}_${t}.png temp digits quiet
        # Assess quality of output :-) ... by counting the number of digits found
        digits=$(tr -cd '0-9' < temp.txt)
        ndigits=${#digits}
        [ $ndigits -gt 0 ] && [ $ndigits -lt 6 ] && echo $ndigits:$digits
    done
done | sort -n
Output for Cow 618 (first number is the number of digits found)
2:11
2:11
3:573
5:33613 <--- not bad
Output for Cow 2755 (first number is the number of digits found)
2:51
3:071
3:191
3:517
4:2155 <--- pretty close
4:2755 <--- nailed that puppy :-)
4:2755 <--- nailed that puppy :-)
4:5212
5:12755 <--- pretty close
Output for Cow 3174 (first number is the number of digits found)
3:554
3:734
5:12732
5:31741 <--- pretty close
Cool question - thank you!
I have been looking at this a little and thinking about how I might tackle it. I prefer the free ImageMagick software, which is installed on most Linux distros and available for OSX and Windows.
My first reaction was to apply a Sobel edge detector to the images, then threshold that and remove noise and outliers with a median filter.
I can do all that with a single command at the command line, like this:
convert c1.jpg \
    -define convolve:scale='50%!' -bias 50% -morphology Convolve Sobel \
    -solarize 50% -level 50,0% -auto-level -threshold 50% -median 3 result.jpg
where c1.jpg is your first cow, and likewise for the other cows.
I end up with this:
which is a pretty reasonable starting point for working out where the numbers are in the image. I am thinking of dividing the image into tiles next, or trying other techniques, and then looking at the tiles/areas containing the most white. That way I will start to get a handle on where I should point tesseract to look... however, it is bedtime - apparently. Maybe someone clever like @rayryeng will take a look overnight :-)
I came up with a pretty simple solution with the help of OpenCV.
Resize the image in order to prune outliers by contours (this makes the areas easier to measure)
std::string const img_name = "cow_00";
Mat input = imread("../forum_quest/data/" + img_name + ".jpg");
cout << input.size() << endl;
if (input.empty()) {
    cerr << "cannot open image\n";
    return;
}
//shrink very wide images to 1000 pixels wide, preserving aspect ratio
if (input.cols > 1000) {
    cv::resize(input, input, {1000, (int)(1000.0/input.cols * input.rows)});
}
Crop the top 1/3 of the image
//Assume the text always lies in the top 1/3 of the image
Mat crop_region;
input(Rect(0, 0, input.cols, input.rows/3)).copyTo(crop_region);
Extract foreground
cv::Mat fore_ground_extract(cv::Mat const &input)
{
    vector<Mat> bgr;
    split(input, bgr);

    //process on the blue channel as Andrew suggests, because it is
    //easier to get rid of the vegetation
    Mat text_region = bgr[0];
    medianBlur(text_region, text_region, 5);
    threshold(text_region, text_region, 0, 255, cv::THRESH_OTSU);

    //further remove small noise and unwanted borders
    Mat const erode_kernel = getStructuringElement(MORPH_ELLIPSE, {11, 11});
    erode(text_region, text_region, erode_kernel);
    Mat const dilate_kernel = getStructuringElement(MORPH_ELLIPSE, {7, 7});
    dilate(text_region, text_region, dilate_kernel);

    //change the text from black to white, easier to extract as contours
    bitwise_not(text_region, text_region);

    return text_region;
}
Extract contours; you can use ERFilter to extract the text instead if the accuracy of the contours is low
//cpoints is shorthand for a single contour
using cpoints = std::vector<cv::Point>;

std::vector<cpoints> get_text_contours(cv::Mat const &input)
{
    //Find the contours of candidate text and remove outliers using
    //some contour properties.
    //Try the ERFilter of OpenCV if the accuracy of this solution is low.
    vector<cpoints> contours;
    findContours(input, contours, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);

    //a contour is an outlier if it is wider than tall and its area is
    //implausibly small or large
    auto outlier = [](cpoints const &cp)
    {
        auto const rect = cv::boundingRect(cp);
        return rect.width > rect.height && (rect.area() < 900 || rect.area() >= 10000);
    };
    auto it = std::remove_if(std::begin(contours), std::end(contours), outlier);
    contours.erase(it, std::end(contours));

    //sort the remaining contours from left to right
    std::sort(std::begin(contours), std::end(contours), [](cpoints const &lhs, cpoints const &rhs)
    {
        return cv::boundingRect(lhs).x < cv::boundingRect(rhs).x;
    });

    return contours;
}
Create a character classifier and loop through the text candidates
//crop_region and text_contours come from the previous steps;
//text_mask is a blank canvas of the same size, used to visualise the contours
string const vocabulary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // must have the same order as the classifier output classes
Ptr<text::OCRHMMDecoder::ClassifierCallback> ocr = text::loadOCRHMMClassifierCNN("OCRBeamSearch_CNN_model_data.xml.gz");
vector<int> out_classes;
vector<double> out_confidences;
RNG rng; //random colours for the visualisation
for (size_t i = 0; i < text_contours.size(); ++i) {
    Scalar const color = Scalar(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));
    drawContours(text_mask, text_contours, static_cast<int>(i), color, 2);
    auto const text_loc = boundingRect(text_contours[i]);
    //evaluating on crop_region gives the highest accuracy, since the classifier is trained on scene images
    rectangle(crop_region, text_loc, color, 2);
    ocr->eval(crop_region(text_loc), out_classes, out_confidences);
    cout << "OCR output = \"" << vocabulary[out_classes[0]]
         << "\" with confidence "
         << out_confidences[0] << std::endl;
    putText(crop_region, string(1, vocabulary[out_classes[0]]), Point(text_loc.x, text_loc.y - 5),
            FONT_HERSHEY_SIMPLEX, 2, Scalar(255, 0, 0), 2);

    imshow("text_mask", text_mask);
    imshow("crop_region", crop_region(text_loc));
    waitKey();
}
Results:
The complete source code is on GitHub.
Using PIL (the Python Imaging Library) you can easily load and process images. Using grayscale conversion, you can convert RGB to grayscale, which should make level detection easier. If you want to threshold the image (to detect the white boards), there is a point() function which lets you map the colors.
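For example, here is a minimal sketch of that idea - the filename cow.jpg and the cutoff of 200 are assumptions you would tune, not values from your question:

from PIL import Image

img = Image.open("cow.jpg")            # placeholder filename
gray = img.convert("L")                # RGB -> grayscale
# point() maps every pixel value; keep near-white pixels (the boards)
# and turn everything else black
mask = gray.point(lambda p: 255 if p > 200 else 0)
mask.save("board_mask.png")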
On the other hand, you could write a simple program which lets you view each image, select the area containing the number with the mouse, run Tesseract on just that area, and rename the file to the result.
That should facilitate the process a lot! Writing this should be relatively easy using TkInter, PyGTK, PyQt or some other windowing toolkit.
EDIT: I needed a similar program to categorize images here myself - though not to OCR them. So I finally decided this was as good a time as any and made a first try (with OCR!). Make a backup of your images before trying it out! Quick manual: pick a folder with the chooser at the top left, click a file in the list to display it, drag a rectangle around the number with the left mouse button, then check or edit the recognized number in the dialog - pressing Yes renames the file to that number.
Here's the pre-alpha program:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# test_pil.py
#
# Copyright 2015 John Coppens <john@jcoppens.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
import pygtk
pygtk.require('2.0')
import gtk
import glob
import os.path as osp
from os import rename
import re
import subprocess as sp
temp_image = "/tmp/test_pil.png"
image_re = """\.(?:jpe?g|png|gif)$"""
class RecognizeDigits():
def __init__(self):
pass
def process(self, img, x0, y0, x1, y1):
""" Receive the gtk.Image, and the limits of the selected area (in
window coordinates!)
Call Tesseract on the area, and give the possibility to edit the
result.
Returns None if NO is pressed, and the OCR'd (and edited) text if OK
"""
pb = img.get_pixbuf().subpixbuf(x0, y0, x1-x0, y1-y0)
pb.save(temp_image, "png")
        out = sp.check_output(("tesseract", temp_image, "stdout", "-psm", "7", "digits"))
out = out.replace(" ", "").strip()
dlg = gtk.MessageDialog(type = gtk.MESSAGE_QUESTION,
flags = gtk.DIALOG_MODAL,
buttons = gtk.BUTTONS_YES_NO,
message_format = "The number read is:")
entry = gtk.Entry()
entry.set_text(out)
dlg.get_message_area().pack_start(entry)
entry.show()
response = dlg.run()
nr = entry.get_text()
dlg.destroy()
if response == gtk.RESPONSE_YES:
return nr
else:
return None
class FileSelector(gtk.VBox):
""" Provides a folder selector (at the top) and a list of files in the
selected folder. On selecting a file, the FileSelector calls the
function provided to the constructor (image_viewer)
"""
def __init__(self, image_viewer):
gtk.VBox.__init__(self)
self.image_viewer = image_viewer
fc = gtk.FileChooserButton('Select a folder')
fc.set_action(gtk.FILE_CHOOSER_ACTION_SELECT_FOLDER)
fc.connect("selection-changed", self.on_file_set)
self.pack_start(fc, expand = False, fill = True)
self.tstore = gtk.ListStore(str)
self.tview = gtk.TreeView(self.tstore)
self.tsel = self.tview.get_selection()
self.tsel.connect("changed", self.on_selection_changed)
renderer = gtk.CellRendererText()
col = gtk.TreeViewColumn(None, renderer, text = 0)
self.tview.append_column(col)
scrw = gtk.ScrolledWindow()
scrw.add(self.tview)
self.pack_start(scrw, expand = True, fill = True)
def on_file_set(self, fcb):
self.tstore.clear()
self.imgdir = fcb.get_filename()
for f in glob.glob(self.imgdir + "/*"):
if re.search(image_re, f):
self.tstore.append([osp.basename(f)])
def on_selection_changed(self, sel):
model, itr = sel.get_selected()
if itr != None:
base = model.get(itr, 0)
fname = self.imgdir + "/" + base[0]
self.image_viewer(fname)
class Status(gtk.Table):
""" Small status window which shows the coordinates for of the area
selected in the image
"""
def __init__(self):
gtk.Table.__init__(self)
self.attach(gtk.Label("X"), 1, 2, 0, 1, yoptions = gtk.FILL)
self.attach(gtk.Label("Y"), 2, 3, 0, 1, yoptions = gtk.FILL)
self.attach(gtk.Label("Top left:"), 0, 1, 1, 2, yoptions = gtk.FILL)
self.attach(gtk.Label("Bottom right:"), 0, 1, 2, 3, yoptions = gtk.FILL)
self.entries = {}
for coord in (("x0", 1, 2, 1, 2), ("y0", 2, 3, 1, 2),
("x1", 1, 2, 2, 3), ("y1", 2, 3, 2, 3)):
self.entries[coord[0]] = gtk.Entry()
self.entries[coord[0]].set_width_chars(6)
self.attach(self.entries[coord[0]],
coord[1], coord[2], coord[3], coord[4],
yoptions = gtk.FILL)
def set_top_left(self, x0, y0):
self.x0 = x0
self.y0 = y0
self.entries["x0"].set_text(str(x0))
self.entries["y0"].set_text(str(y0))
def set_bottom_right(self, x1, y1):
self.x1 = x1
self.y1 = y1
self.entries["x1"].set_text(str(x1))
self.entries["y1"].set_text(str(y1))
class ImageViewer(gtk.ScrolledWindow):
""" Provides a scrollwindow to move the image around. It also detects
button press and release events (left button), will call status
to update the coordinates, and will call task on button release
"""
def __init__(self, status, task = None):
gtk.ScrolledWindow.__init__(self)
self.task = task
self.status = status
self.drawing = False
self.prev_rect = None
self.viewport = gtk.Viewport()
self.viewport.connect("button-press-event", self.on_button_pressed)
self.viewport.connect("button-release-event", self.on_button_released)
self.viewport.set_events(gtk.gdk.BUTTON_PRESS_MASK | \
gtk.gdk.BUTTON_RELEASE_MASK)
self.img = gtk.Image()
self.viewport.add(self.img)
self.add(self.viewport)
def set_image(self, fname):
self.imagename = fname
self.img.set_from_file(fname)
def on_button_pressed(self, viewport, event):
if event.button == 1: # Left button: Select rectangle start
#self.x0, self.y0 = self.translate_coordinates(self.img, int(event.x), int(event.y))
self.x0, self.y0 = int(event.x), int(event.y)
self.status.set_top_left(self.x0, self.y0)
self.drawing = True
def on_button_released(self, viewport, event):
        if event.button == 1: # Left button released: select rectangle end
#self.x1, self.y1 = self.translate_coordinates(self.img, int(event.x), int(event.y))
self.x1, self.y1 = int(event.x), int(event.y)
self.status.set_bottom_right(self.x1, self.y1)
if self.task != None:
res = self.task().process(self.img, self.x0, self.y0, self.x1, self.y1)
if res == None: return
newname = osp.split(self.imagename)[0] + '/' + res + ".jpeg"
rename(self.imagename, newname)
print "Renaming ", self.imagename, newname
class MainWindow(gtk.Window):
def __init__(self):
gtk.Window.__init__(self)
self.connect("delete-event", self.on_delete_event)
self.set_size_request(600, 300)
grid = gtk.Table()
# Image selector
files = FileSelector(self.update_image)
grid.attach(files, 0, 1, 0, 1,
yoptions = gtk.FILL | gtk.EXPAND, xoptions = gtk.FILL)
# Some status information
self.status = Status()
grid.attach(self.status, 0, 1, 1, 2,
yoptions = gtk.FILL, xoptions = gtk.FILL)
# The image viewer
self.viewer = ImageViewer(self.status, RecognizeDigits)
grid.attach(self.viewer, 1, 2, 0, 2)
self.add(grid)
self.show_all()
def update_image(self, fname):
self.viewer.set_image(fname)
def on_delete_event(self, wdg, data):
gtk.main_quit()
def run(self):
        gtk.main()
def main():
mw = MainWindow()
mw.run()
return 0
if __name__ == '__main__':
main()
I really liked this problem, but I am unfamiliar with OpenCV and Python, so I present a partial solution in Matlab. The idea is the important part, and the code is just for reference. I think using my image processing, possibly augmented with Mark's windowing idea, could give you favorable results.
These images have a ton of vegetation in them, and vegetation is typically high in greens and reds. I process only the blue channel, which removes a lot of the vegetation and still leaves the white signs easily identifiable. I then use Otsu's method for thresholding - something like this in OpenCV: cv::threshold(im_gray, img_bw, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU); - but then I take 1.5 times the threshold it gives. So the threshold is still image-specific thanks to Otsu, but also very selective. At this point you have a pretty good image.
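If you would rather do the same trick in OpenCV, here is a rough Python sketch - cv2.threshold returns the Otsu threshold it computed, so you can reapply it scaled by 1.5; the filename is a placeholder:

import cv2

im = cv2.imread("cow.jpg")             # placeholder filename
blue = im[:, :, 0]                     # OpenCV loads images as BGR
# the first call only computes Otsu's threshold t; the second reapplies 1.5*t
t, _ = cv2.threshold(blue, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
_, bw = cv2.threshold(blue, min(1.5 * t, 255), 255, cv2.THRESH_BINARY)
cv2.imwrite("cow_bw.png", bw)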
Then we just clean the image with some erosion and dilation. Also note that my dilation element is slightly larger than my erosion one. You then have images with fairly clean numbers. Maybe Tesseract could even process the image directly at this point, or you can try it with the windowing.
I know OpenCV has these same functions, but as I said, I just did what I was familiar with. I hope it helps you. Here are my results:
and the code:
% im, im1 and im2 are the three cow images, read in beforehand with imread
% 382 is 255*1.5, so basically we are taking the auto threshold and raising it
% by 50 percent. graythresh performs Otsu thresholding (it returns a level in [0,1])
bim = im(:,:,3) > graythresh(im(:,:,3))*382;
bim1 = im1(:,:,3) > graythresh(im1(:,:,3))*382;
bim2 = im2(:,:,3) > graythresh(im2(:,:,3))*382;
% se and se2 are what OpenCV would call getStructuringElement with MORPH_ELLIPSE
se = strel('disk',3);
eim = imerode(bim,se);
eim1 = imerode(bim1,se);
eim2 = imerode(bim2,se);
% se2 is deliberately slightly larger than se, the erosion element
se2 = strel('disk',5);
dim = imdilate(eim,se2);
dim1 = imdilate(eim1,se2);
dim2 = imdilate(eim2,se2);
subplot(3,3,1);imshow(bim);title('blue thresholded');
subplot(3,3,2);imshow(bim1);title('');
subplot(3,3,3);imshow(bim2);title('');
subplot(3,3,4);imshow(eim);title('after erosion');
subplot(3,3,5);imshow(eim1);title('');
subplot(3,3,6);imshow(eim2);title('');
subplot(3,3,7);imshow(dim);title('after dilation');
subplot(3,3,8);imshow(dim1);title('');
subplot(3,3,9);imshow(dim2);title('');