Because the PDF format is so complicated, the best way I know to mine data from PDF files is to open them in Adobe Illustrator.
Then save the PDF as an SVG file and write your own extraction code on top of an SVG parser library; expect this part to take some tricky, hand-written code.
One capable SVG library is Apache Batik (Java).
(On Linux, converting PDF to SVG is somewhat more involved; see pdf2svg:
calcmaster.net/personal_projects/pdf2svg/)
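
If you go the pdf2svg route, the conversion can be scripted. Below is a rough sketch in Python, assuming pdf2svg is installed and on your PATH and that it is called as `pdf2svg input.pdf output.svg [page]`. The function names and the file names are made up for illustration.

```python
import subprocess


def build_pdf2svg_command(pdf_path, svg_path, page=None):
    """Build the argument list for a pdf2svg call (assumed to be on PATH)."""
    cmd = ["pdf2svg", pdf_path, svg_path]
    if page is not None:
        # The optional third argument selects which page to convert.
        cmd.append(str(page))
    return cmd


def convert_pdf_page(pdf_path, svg_path, page=1):
    """Run pdf2svg; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(build_pdf2svg_command(pdf_path, svg_path, page), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    print(build_pdf2svg_command("report.pdf", "report-page1.svg", 1))
```

Building the command separately from running it makes it easy to check what will be executed before pointing it at real files.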
PS
I've spent a long time trying to find a solution to the second part of your question,
but from books such as "Visualizing Data" (Ben Fry, O'Reilly) I gathered
that PDF, especially Adobe PDF, is too complex to parse directly, so use an SVG parser library instead.
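
To give an idea of what the SVG-parsing step can look like, here is a minimal sketch. It does not use Batik; it uses Python's standard XML parser instead, just to keep the example self-contained. It assumes the data you want ends up in SVG `<text>` elements, which depends on how the file was exported (text converted to outlines will not show up this way). The sample SVG and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_text(svg_source):
    """Return (x, y, text) for every non-empty <text> element in an SVG string."""
    root = ET.fromstring(svg_source)
    items = []
    for node in root.iter(SVG_NS + "text"):
        # itertext() also collects text nested inside <tspan> children.
        content = "".join(node.itertext()).strip()
        if content:
            items.append((node.get("x"), node.get("y"), content))
    return items


# Invented sample standing in for an exported file.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <text x="10" y="20">Revenue</text>
  <text x="10" y="40"><tspan>2011:</tspan> <tspan>42</tspan></text>
</svg>"""

if __name__ == "__main__":
    for x, y, content in extract_text(SAMPLE_SVG):
        print(x, y, content)
```

The coordinates are returned as strings exactly as they appear in the file; real exports often position text with `transform` attributes instead, so you will likely need extra handling for that.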