I am using Tesseract OCR to extract text from an image. Preserving the structure of the document is very important to me, but Tesseract currently does not preserve it in its output.
Tesseract's code compresses runs of spaces in its output, so you will need to change the code to preserve them. See the post "Tesseract - ambiguity in space and tab".
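If patching Tesseract itself is not an option, one possible workaround is to rebuild the spacing afterwards from word positions. This is not the change described above, only a rough sketch. It assumes you have Tesseract's TSV output, which has `left`, `page_num`, `block_num`, `par_num`, `line_num` and `text` columns. The function name `layout_from_tsv` and the `char_width` value are made up for illustration, and `char_width` would need tuning for each image.

```python
import csv
import io
from collections import defaultdict


def layout_from_tsv(tsv_text, char_width=10):
    """Rebuild spaced text lines from Tesseract-style TSV word boxes.

    char_width is an assumed average glyph width in pixels (hypothetical
    default, not something Tesseract reports).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = defaultdict(list)
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # page/block/line rows carry no word text
        key = (
            int(row["page_num"]),
            int(row["block_num"]),
            int(row["par_num"]),
            int(row["line_num"]),
        )
        lines[key].append((int(row["left"]), text))

    out = []
    for key in sorted(lines):
        line = ""
        for left, text in sorted(lines[key]):
            col = left // char_width  # target column for this word
            # Pad up to the target column; keep at least one space
            # between neighbouring words on the same line.
            pad = max(col - len(line), 1) if line else max(col, 0)
            line += " " * pad + text
        out.append(line)
    return "\n".join(out)
```

Words are grouped by the line ids Tesseract assigns, so columns that end up in separate blocks will come out as separate lines. Grouping by the `top` coordinate instead might work better for tables.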