Heading and sub-heading extraction from PDF
Question

I am currently working on extracting text from PDF files. My current issue is distinguishing headings and sub-headings from the rest of the extracted text. I am working with iTextSharp and using the bold-text information to detect headings. The font size cannot always be trusted. I have also tried PDFBox.

1) Is there any method to identify headings and sub-headings in a PDF?
2) Do Adobe or PDF-XChange Editor provide an API for this?

For example: I need to extract "Tourism