Many hours have I searched for a fast and easy, but mostly a
You can use qpdf as shown below. If the file file_name.pdf has 100 pages, the output is:
$ qpdf --show-npages file_name.pdf
100
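If the count is needed inside a script rather than at the prompt, the same qpdf call can be wrapped. This is only a minimal Python sketch, assuming qpdf is installed and on the PATH; the function names `parse_npages` and `qpdf_page_count` are made up for illustration.

```python
import subprocess


def parse_npages(stdout: str) -> int:
    """Turn the text printed by `qpdf --show-npages` into an integer."""
    return int(stdout.strip())


def qpdf_page_count(path: str) -> int:
    """Run qpdf on the given file and return its page count.

    Raises subprocess.CalledProcessError if qpdf exits with an error.
    """
    result = subprocess.run(
        ["qpdf", "--show-npages", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_npages(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.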
If you have shell access, the simplest approach (though it does not work on 100% of PDFs) is to use grep.
This should return just the number of pages:
grep -m 1 -aoP '(?<=\/N )\d+(?=\/)' file.pdf
Example: https://regex101.com/r/BrUTKn/1
Switches description:
-m 1 is necessary because some files contain more than one match of the regex pattern (volunteer needed to replace this with a regex that matches only the first occurrence)
-a is necessary to treat the binary file as text
-o shows only the match
-P enables Perl regular expressions
Regex explanation:
(?<=\/N ) lookbehind for /N followed by a space
\d+ one or more digits
(?=\/) lookahead for /
Nota bene: if no match is found, it's safe to assume the file has only 1 page.
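For anyone without grep -P, the same pattern can be applied from Python. This is a rough sketch of the regex above run against the raw bytes of the file, with the fallback to 1 page from the note; the function names are hypothetical, and it has the same limitation as the grep version (it will not work on every PDF).

```python
import re

# Same pattern as the grep call: digits preceded by "/N " and followed by "/".
# A bytes pattern is used because a PDF is a binary file.
PAGE_HINT = re.compile(rb"(?<=/N )\d+(?=/)")


def page_count_from_bytes(data: bytes) -> int:
    """Return the first match of the pattern, or 1 if nothing matches."""
    match = PAGE_HINT.search(data)
    return int(match.group()) if match else 1


def page_count_from_file(path: str) -> int:
    """Read the whole file in binary mode and apply the pattern."""
    with open(path, "rb") as handle:
        return page_count_from_bytes(handle.read())
```

Using `search` returns only the first match, which plays the role of `-m 1` in the grep command.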
The function pdf_info() from the R package pdftools
provides information on the number of pages in a PDF.
library(pdftools)
pdf_file <- file.path(R.home("doc"), "NEWS.pdf")
info <- pdf_info(pdf_file)
nbpages <- info[2]
nbpages
$pages
[1] 65
I created a wrapper class for pdfinfo in case it's useful to anyone, based on Richard's answer:
/**
* Wrapper for pdfinfo program, part of xpdf bundle
* http://www.xpdfreader.com/about.html
*
* this will put all pdfinfo output into keyed array, then make them accessible via getValue
*/
class PDFInfoWrapper {
const PDFINFO_CMD = 'pdfinfo';
/**
* keyed array to hold all the info
*/
protected $info = array();
/**
* raw output in case we need it
*/
public $raw = "";
/**
* Constructor
* @param string $filePath - path to file
*/
public function __construct($filePath) {
//escape the path so spaces and shell metacharacters are handled safely
exec(self::PDFINFO_CMD . ' ' . escapeshellarg($filePath), $output);
//loop each line and split into key and value
foreach($output as $line) {
$colon = strpos($line, ':');
if($colon) {
$key = trim(substr($line, 0, $colon));
$val = trim(substr($line, $colon + 1));
//use strtolower to make case insensitive
$this->info[strtolower($key)] = $val;
}
}
//store the raw output
$this->raw = implode("\n", $output);
}
/**
* get a value
* @param string $key - key name, case insensitive
* @returns string value
*/
public function getValue($key) {
return @$this->info[strtolower($key)];
}
/**
* list all the keys
* @returns array of key names
*/
public function getAllKeys() {
return array_keys($this->info);
}
}
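The key/value splitting that the class does can also be sketched in Python, in case PHP is not available. This assumes pdfinfo is on the PATH and prints one "Key: value" pair per line, as the wrapper above expects; `parse_pdfinfo` and `pdfinfo` are hypothetical helper names, not part of any library.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Split pdfinfo output into a dict with lower-cased keys."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip().lower()] = value.strip()
    return info


def pdfinfo(path: str) -> dict:
    """Run pdfinfo on a file and return the parsed key/value pairs."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

With this, the page count would be `int(pdfinfo("file.pdf")["pages"])`, provided the output contains a Pages line.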
Here is a Windows command script using Ghostscript that reports the number of pages in a PDF file:
@echo off
echo.
rem
rem this file: getlastpagenumber.cmd
rem version 0.1 from commander 2015-11-03
rem need Ghostscript e.g. download and install from http://www.ghostscript.com/download/
rem Install path "C:\prg\ghostscript" for using the script without changes \\ and have less problems with UAC
rem
:vars
set __gs__="C:\prg\ghostscript\bin\gswin64c.exe"
set __lastpagenumber__=1
set __pdffile__="%~1"
set __pdffilename__="%~n1"
set __datetime__=%date%%time%
set __datetime__=%__datetime__:.=%
set __datetime__=%__datetime__::=%
set __datetime__=%__datetime__:,=%
set __datetime__=%__datetime__:/=%
set __datetime__=%__datetime__: =%
set __tmpfile__="%tmp%\%~n0_%__datetime__%.tmp"
:check
if %__pdffile__%=="" goto error1
if not exist %__pdffile__% goto error2
if not exist %__gs__% goto error3
:main
%__gs__% -dBATCH -dFirstPage=9999999 -dQUIET -dNODISPLAY -dNOPAUSE -sstdout=%__tmpfile__% %__pdffile__%
FOR /F " tokens=2,3* usebackq delims=:" %%A IN (`findstr /i "number" %__tmpfile__%`) DO set __lastpagenumber__=%%A
set __lastpagenumber__=%__lastpagenumber__: =%
if exist %__tmpfile__% del %__tmpfile__%
:output
echo The PDF-File: %__pdffilename__% contains %__lastpagenumber__% pages
goto end
:error1
echo no pdf file selected
echo usage: %~n0 PDFFILE
goto end
:error2
echo no pdf file found
echo usage: %~n0 PDFFILE
goto end
:error3
echo.can not find the ghostscript bin file
echo. %__gs__%
echo.please download it from:
echo. http://www.ghostscript.com/download/
echo.and install to "C:\prg\ghostscript"
goto end
:end
exit /b
If you can't install any additional packages, you can use this simple one-liner:
foundPages=$(strings < "$PDF_FILE" | sed -n 's|.*Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
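The same idea (collect every number that follows "Count " and keep the largest) can be sketched in Python, which may help on systems without strings or sed. This is only an approximation of the one-liner: it looks at every match rather than the last match per line, and the function name `max_count` is made up. Like the shell version, it finds nothing when the page tree is stored inside compressed object streams.

```python
import re
from typing import Optional

# "Count ", an optional minus sign, then the digits we want to capture.
COUNT = re.compile(rb"Count -?(\d+)")


def max_count(data: bytes) -> Optional[int]:
    """Return the largest number following "Count ", or None if absent."""
    values = [int(number) for number in COUNT.findall(data)]
    return max(values) if values else None


def max_count_in_file(path: str) -> Optional[int]:
    """Read the file in binary mode and scan it for Count entries."""
    with open(path, "rb") as handle:
        return max_count(handle.read())
```

Taking the maximum matters because nested page tree nodes each carry their own Count, and the root node holds the total.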