I'm using OpenCV to extract a subimage of a scanned document and would like to use Tesseract to perform OCR on this subimage.
I found out that I can use two meth
First, make a deep copy of your subImage, so that it is stored in a contiguous memory block:
cv::Mat subImage = image(cv::Rect(50, 200, 300, 100)).clone();
Then, initialize a PIX header (I don't know how) with the correct parameters.
// ???? Put your own constructor here.
PIX* pix = new PIX_HEADER(width, height, channels, depth);
OR, create it manually:
PIX pix;
pix.w = subImage.cols;  // cv::Mat exposes cols/rows, not width/height; the PIX field is named w
...
Then set the pix data pointer to the subImage data pointer:
pix.data = subImage.data;
Finally, make sure your subImage object does not go out of scope before you finish your work with pix.
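To illustrate why the deep copy matters, here is a minimal sketch in plain C++ (no OpenCV) of what clone() conceptually does for a sub-rectangle: the rows of a view are still separated by the parent's row stride, and copying packs them back to back. The helper name copyRoiContiguous and its parameters are hypothetical, and the sketch assumes interleaved pixels with one byte per channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a w x h region whose top-left corner is (x, y)
// out of a parent image whose rows are parentStride bytes apart, into a
// buffer where the rows follow each other with no gap.
std::vector<std::uint8_t> copyRoiContiguous(const std::uint8_t* parent,
                                            std::size_t parentStride,
                                            std::size_t bytesPerPixel,
                                            std::size_t x, std::size_t y,
                                            std::size_t w, std::size_t h) {
    assert(parent != nullptr);
    const std::size_t rowBytes = w * bytesPerPixel;
    std::vector<std::uint8_t> out(rowBytes * h);
    if (rowBytes == 0) {
        return out;  // nothing to copy for an empty region
    }
    for (std::size_t row = 0; row < h; ++row) {
        // Source rows are parentStride apart; destination rows are rowBytes apart.
        const std::uint8_t* src =
            parent + (y + row) * parentStride + x * bytesPerPixel;
        std::memcpy(out.data() + row * rowBytes, src, rowBytes);
    }
    return out;
}
```

After the copy, the row stride equals w * bytesPerPixel, which is what code expecting a single contiguous block assumes.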
cv::Mat image = cv::imread(argv[1]);
cv::Mat gray;
cv::cvtColor(image, gray, CV_BGR2GRAY);
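// Create an 8 bpp PIX and copy the grayscale pixels into it one at a time.
PIX *pixS = pixCreate(gray.size().width, gray.size().height, 8);
for (int i = 0; i < gray.rows; i++)
    for (int j = 0; j < gray.cols; j++)
        pixSetPixel(pixS, j, i, (l_uint32) gray.at<uchar>(i, j));  // pixSetPixel takes (x, y), i.e. (column, row)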
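The per-pixel loop is needed because Leptonica does not keep 8-bit pixels as a plain byte array. As far as I understand its layout, pixels are packed into 32-bit words with the leftmost pixel in the most significant byte, and every row is padded to a whole number of words. Below is a sketch of that packing in plain C++. It is not Leptonica's actual code, and the names wordsPerLine8 and packGray8 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 32-bit words per row for an 8 bpp image (rows padded to 32 bits).
std::size_t wordsPerLine8(std::size_t width) {
    return (width * 8 + 31) / 32;
}

// Hypothetical helper: pack a row-major 8-bit grayscale buffer, whose rows
// are stride bytes apart, into 32-bit words with the leftmost pixel of each
// group of four in the most significant byte.
std::vector<std::uint32_t> packGray8(const std::uint8_t* gray,
                                     std::size_t stride,
                                     std::size_t width,
                                     std::size_t height) {
    assert(gray != nullptr);
    const std::size_t wpl = wordsPerLine8(width);
    std::vector<std::uint32_t> words(wpl * height, 0u);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint32_t value = gray[y * stride + x];
            // Pixel 0 of a word goes to bits 31..24, pixel 3 to bits 7..0.
            const unsigned shift = 8u * (3u - static_cast<unsigned>(x % 4));
            words[y * wpl + x / 4] |= value << shift;
        }
    }
    return words;
}
```

On a little-endian machine the bytes of such a word sit in memory in reverse pixel order, which is why simply pointing a PIX at OpenCV's byte buffer would scramble the image.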
For anybody using the JavaCPP presets of OpenCV/Tesseract, here is what works:
Mat img = imread("file.jpg");
Mat gray = new Mat();
cvtColor(img, gray, CV_BGR2GRAY);
// api is a Tesseract client which is initialised
api.SetImage(gray.data().asBuffer(), gray.size().width(), gray.size().height(), gray.channels(), gray.size1());
tesseract::TessBaseAPI tess;
cv::Mat sub = image(cv::Rect(50, 200, 300, 100));
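// sub is only a view into image, so pass the parent's row stride (step1) as bytes_per_line.
tess.SetImage((uchar*)sub.data, sub.size().width, sub.size().height, sub.channels(), sub.step1());
tess.Recognize(0);
const char* out = tess.GetUTF8Text();  // the caller owns this buffer; release it with delete[] out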
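This version needs no clone() because SetImage receives the row stride (bytes_per_line) separately from the width, so a pointer into the parent image plus the parent's stride is enough to describe the sub-rectangle. A minimal sketch of that addressing in plain C++ follows. ImageView, subView and sampleAt are hypothetical names used only for illustration, and one byte per channel is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of an image region, mirroring the five SetImage
// arguments: data pointer, width, height, bytes per pixel, bytes per line.
struct ImageView {
    const std::uint8_t* data;
    int width;
    int height;
    int bytesPerPixel;
    int bytesPerLine;
};

// Build a view of a sub-rectangle without copying: only the start pointer
// and the dimensions change, while the row stride stays that of the parent.
ImageView subView(const ImageView& parent, int x, int y, int w, int h) {
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    assert(x + w <= parent.width && y + h <= parent.height);
    const std::size_t offset =
        static_cast<std::size_t>(y) * parent.bytesPerLine +
        static_cast<std::size_t>(x) * parent.bytesPerPixel;
    return ImageView{parent.data + offset, w, h,
                     parent.bytesPerPixel, parent.bytesPerLine};
}

// Read one channel of the pixel at (x, y) using the stride, the way a
// consumer of (data, bytesPerPixel, bytesPerLine) would.
std::uint8_t sampleAt(const ImageView& v, int x, int y, int channel) {
    assert(x >= 0 && x < v.width && y >= 0 && y < v.height);
    assert(channel >= 0 && channel < v.bytesPerPixel);
    const std::size_t offset =
        static_cast<std::size_t>(y) * v.bytesPerLine +
        static_cast<std::size_t>(x) * v.bytesPerPixel +
        static_cast<std::size_t>(channel);
    return v.data[offset];
}
```

The view stays valid only as long as the parent buffer does, so the parent image must outlive the recognition call.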