Tesseract 3 is able to perform page layout analysis. However, I couldn't find any sample code or documentation on how to use the library for such purposes. I hope someone h
Tesseract can be given a page segmentation mode parameter (-psm), which can have the following values:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

Example:
tesseract image.tif image.txt -l eng -psm 0
However, I am not sure whether the layout analysis can be used in standalone mode.
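
If you want to try several of the modes listed above from a script, a small wrapper can check the mode number before building the command. This is only a sketch: psm_command is a hypothetical helper name of my own, not part of Tesseract, and it only prints the command rather than running it. It assumes the Tesseract 3 style -psm flag and file names without spaces. The output base is given without an extension here, since tesseract appends .txt itself.

```shell
# Hypothetical helper (not part of Tesseract): print the tesseract command
# for a given page segmentation mode, rejecting values outside 0..10.
# Usage: psm_command IMAGE OUTPUT_BASE MODE
psm_command() {
    image=$1
    outbase=$2
    mode=$3
    case "$mode" in
        [0-9]|10) ;;   # the modes listed above: 0 through 10
        *) echo "invalid page segmentation mode: $mode" >&2; return 1 ;;
    esac
    printf 'tesseract %s %s -l eng -psm %s\n' "$image" "$outbase" "$mode"
}

# Print the command for OSD only (mode 0).
psm_command image.tif image 0
```

To actually run a mode, pipe the printed line to sh, for example: psm_command image.tif image 1 | sh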