I need to extract text from PDFs written in Romanian. The characters ȚțȘșĂăÎîÂâ are not extracted correctly with PDFBox or Snowtide.
Here is a sample file that
How about iText? http://itextpdf.com/
"iText® is an open source library that allows you to create and manipulate PDF documents. It enables developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation."
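Whichever library you end up using, it may help to check which code points actually come out of the extractor. The sketch below is only an assumption about one possible cause, not a confirmed diagnosis of your file: some PDFs encode Ș/ș/Ț/ț with the legacy cedilla forms (U+015E, U+015F, U+0162, U+0163) instead of the comma-below forms (U+0218, U+0219, U+021A, U+021B). The class and method names (`RomanianDiacritics`, `dumpCodePoints`, `toCommaBelow`) are hypothetical helpers, not part of PDFBox, Snowtide, or iText, and the sample string stands in for text returned by an extractor.

```java
import java.util.Map;

// Hypothetical helper: inspects and normalizes text returned by any PDF extractor.
final class RomanianDiacritics {

    // Legacy cedilla forms mapped to the comma-below forms used in Romanian.
    private static final Map<Character, Character> CEDILLA_TO_COMMA = Map.of(
        '\u015E', '\u0218',  // S with cedilla (capital) -> S with comma below
        '\u015F', '\u0219',  // s with cedilla (small)   -> s with comma below
        '\u0162', '\u021A',  // T with cedilla (capital) -> T with comma below
        '\u0163', '\u021B'   // t with cedilla (small)   -> t with comma below
    );

    // Replaces cedilla variants with comma-below variants; other characters pass through.
    public static String toCommaBelow(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(CEDILLA_TO_COMMA.getOrDefault(c, c));
        }
        return out.toString();
    }

    // Lists every code point as U+XXXX so wrong or missing characters are easy to spot.
    public static String dumpCodePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed sample: "stiinta" with cedilla forms, as an extractor might return it.
        String extracted = "\u015Ftiin\u0163\u0103";
        System.out.println(dumpCodePoints(extracted));
        System.out.println(dumpCodePoints(toCommaBelow(extracted)));
    }
}
```

If the dump shows replacement characters or unrelated code points rather than cedilla forms, the cause is more likely the font's encoding inside the PDF, and this kind of post-processing will not help.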