I will put this question in simple terms.
I have this PDF:
_____
|abcd |
| |
| |
|_____|
And this one:
__
Yes... it's just Very Hard, even for a PDF Expert. And by asking the question, you've shown that you aren't one... at least not yet. Pull this off and you'll be well on your way... But:
There's no easy way to determine a bounding box that surrounds all the content on a given page. com.itextpdf.text.pdf.parser (or its C# equivalent) has several classes that might help you along the way, but the bottom line is that PDF isn't designed to be parsable like this.
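To make the idea concrete, here is a minimal sketch of the one part that is easy: once you have a rectangle for each piece of content, the page's bounding box is just their union. The class name `ContentBounds` and the rectangle values are made up for illustration, and the sketch deliberately uses only the JDK rather than iText itself. Getting those per-chunk rectangles out of the PDF is the hard part. As far as I know, iText 5's parser package includes a `TextMarginFinder` listener that does this accumulation for text only, so images and vector graphics would still be missed.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of iText: merges per-chunk rectangles into one box.
final class ContentBounds {

    /** Returns the smallest rectangle enclosing every chunk, or empty if there are none. */
    static Optional<Rectangle2D> union(List<Rectangle2D> chunks) {
        Rectangle2D box = null;
        for (Rectangle2D chunk : chunks) {
            if (box == null) {
                // Copy the first chunk so the caller's rectangle is not modified.
                box = (Rectangle2D) chunk.clone();
            } else {
                // Rectangle2D.add grows the box to the union of both rectangles.
                box.add(chunk);
            }
        }
        return Optional.ofNullable(box);
    }

    public static void main(String[] args) {
        // Made-up rectangles (x, y, width, height) in PDF user space:
        // by default the origin is the bottom-left corner and units are points.
        List<Rectangle2D> chunks = List.of(
            new Rectangle2D.Double(72, 700, 40, 12),
            new Rectangle2D.Double(72, 680, 120, 12));

        union(chunks).ifPresent(b -> System.out.println(
            "llx=" + b.getMinX() + " lly=" + b.getMinY()
            + " urx=" + b.getMaxX() + " ury=" + b.getMaxY()));
    }
}
```

In a real program the list would be filled by whatever listener you hook into the parser, and that is exactly where the trouble starts: every kind of content (text, images, paths, nested forms) has to be found and measured before the union means anything.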
I strongly recommend you try some other approach. Anything that involves the phrase "and then we get the information out of the PDF" needs an overhaul. Oh, it's possible, but there is Almost Always a better way to do it.