Why is the text extracted from a PDF by Java text extractors such as PDFBox and iText scattered and unstructured?
Question

I extracted text from a PDF using both Apache PDFBox and iText, but in both cases the extracted text is completely unstructured and messy. The extracted text is:

111111 1111111111111111111111111111111111111111111111111111111111111 US008631488B2 (12) United States Patent (10) Patent No.: US 8,631,488 B2 Oz et al. (45) Date of Patent: Jan. 14,2014 6,813,682 B2 1112004 Bress et al. (54) SYSTEMS AND METHODS FOR PROVIDING 7,065,644 B2 Daniell et al. 6/2006 SECURITY SERVICES DURING POWER Todd