How can I determine the number of pages in a given PDF file, using a free/open source Java API?
If you generate the PDF with FOP, you can use Apache FOP itself: http://xmlgraphics.apache.org/fop/
You can count the pages with the help of FOP tags.
If it is just a PDF file from an external source, check the iText API instead.
You should be able to do this with iText. See this thread for how to solve the problem. Here is chapter 2, which is incorrectly linked in the thread:
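// PdfReader comes from iText; the package to import depends on the iText version.
PdfReader reader = new PdfReader("SimpleRegistrationForm.pdf");
int pages = reader.getNumberOfPages();
reader.close(); // release the underlying file once the count has been read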
If you want more information about the PDF than the page count, use the code below, which relies on the Apache PDFBox library. If the document does not contain a given piece of information, the corresponding getter returns null.
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class DocumentService {
    public void showDocumentInfo() throws IOException {
        // try-with-resources closes the document even if reading fails
        try (PDDocument document = PDDocument.load(new File("file.pdf"))) {
            PDDocumentInformation info = document.getDocumentInformation();
            System.out.println("Page Count=" + document.getNumberOfPages());
            // Each metadata getter returns null when the field is absent
            System.out.println("Title=" + info.getTitle());
            System.out.println("Author=" + info.getAuthor());
            System.out.println("Subject=" + info.getSubject());
            System.out.println("Keywords=" + info.getKeywords());
            System.out.println("Creator=" + info.getCreator());
            System.out.println("Producer=" + info.getProducer());
            System.out.println("Creation Date=" + info.getCreationDate());
            System.out.println("Modification Date=" + info.getModificationDate());
            System.out.println("Trapped=" + info.getTrapped());
        }
    }
}
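You can use Apache PDFBox to load a PDF document and then call the getNumberOfPages method to get the page count. The snippet below uses the PDFBox 2.x API; in PDFBox 3.x, PDDocument.load was replaced by Loader.loadPDF.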
PDDocument doc = PDDocument.load(new File("file.pdf"));
int count = doc.getNumberOfPages();
doc.close(); // release the file once the count has been read
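If adding a library is not an option, a rough estimate is possible with the standard library alone. The sketch below is not how iText or PDFBox work: it is a plain heuristic that scans the raw bytes for page objects (`/Type /Page`), and the class name `NaivePdfPageCounter` is made up for this example. It assumes the page objects are stored uncompressed. PDFs that pack objects into compressed object streams (common from PDF 1.5 onward) will be undercounted, and files with incremental updates may be overcounted, so prefer one of the libraries above whenever the number has to be right.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of iText or PDFBox.
final class NaivePdfPageCounter {

    // Matches "/Type /Page" and "/Type/Page", but not "/Type /Pages"
    // (the page tree node), thanks to the negative lookahead.
    private static final Pattern PAGE_OBJECT =
            Pattern.compile("/Type\\s*/Page(?![A-Za-z])");

    // Heuristic: counts page dictionaries that appear uncompressed in the file.
    static int countPages(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // in the file cannot break the decoding step.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = PAGE_OBJECT.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java NaivePdfPageCounter <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Page Count (estimate)=" + countPages(bytes));
    }
}
```

A result of 0 for a file that clearly has pages is the usual sign that the page objects are compressed and the heuristic does not apply.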