Get the number of pages in a PDF document

This question is meant for reference and comparison. The solution is the accepted answer below.

Many hours have I searched for a fast and easy, but mostly a

12 Answers

孤街浪徒

2020-12-07 09:19

You can use qpdf as shown below. If the file file_name.pdf has 100 pages, the command prints 100:

$ qpdf --show-npages file_name.pdf
100
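To use the count inside a script rather than at the prompt, the command can be wrapped. The Python sketch below assumes the qpdf binary is on PATH; `parse_npages` and `count_pages_qpdf` are hypothetical helper names, not part of qpdf. It relies on qpdf printing only the number on stdout, and on qpdf's documented exit statuses (0 for success, 2 for errors, 3 when only warnings were issued).

```python
import subprocess
import sys


def parse_npages(stdout: str) -> int:
    """Convert the text printed by `qpdf --show-npages` to an int.

    qpdf prints just the page count and a newline, e.g. "100\n".
    """
    return int(stdout.strip())


def count_pages_qpdf(path: str, qpdf: str = "qpdf") -> int:
    """Run qpdf on `path` and return the page count (hypothetical helper)."""
    # A list of arguments bypasses the shell, so spaces or quotes in
    # the file name need no escaping.
    result = subprocess.run(
        [qpdf, "--show-npages", path],
        capture_output=True,
        text=True,
    )
    # qpdf exits with 0 on success and 3 when it only printed warnings;
    # anything else (e.g. 2) means it could not process the file.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed on {path}: {result.stderr.strip()}")
    return parse_npages(result.stdout)


if __name__ == "__main__":
    # Example usage: python count_pages.py file1.pdf file2.pdf ...
    for pdf in sys.argv[1:]:
        print(pdf, count_pages_qpdf(pdf))
```

Treating exit status 3 as success keeps slightly damaged but readable files from being rejected; whether that is acceptable depends on how strict your pipeline needs to be.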

再見小時候

2020-12-07 09:20

If you have access to a shell, the simplest approach (though it does not work on 100% of PDFs) is to use grep.

This should return just the number of pages:

grep -m 1 -aoP '(?<=\/N )\d+(?=\/)' file.pdf

Example: https://regex101.com/r/BrUTKn/1

Switch descriptions:

-m 1 is necessary because some files contain more than one match of the regex pattern (a volunteer is welcome to replace this with a regex that matches only the first occurrence)

-a is necessary to treat the binary file as text

-o to show only the match

-P to use Perl-compatible regular expressions

Regex explanation:

starting "delimiter": (?<=\/N ) lookbehind for /N followed by a space (the trailing space is easy to miss)

actual result: \d+ one or more digits

ending "delimiter": (?=\/) lookahead for /

Nota bene: if no match is found, it's safe to assume the file has only 1 page.
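For readers who want the same heuristic without grep (for example on Windows), here is a minimal Python port of the pattern. It is a sketch, not part of the original answer; `count_pages_from_n` and `count_pages_in_file` are hypothetical names. Some context helps when judging the result: in the PDF specification, `/N` in the linearization parameter dictionary holds the number of pages, and that dictionary sits near the start of a linearized ("Fast Web View") file, which is why taking the first match usually works. `/N` is also used elsewhere (for example in ICC profile streams and object streams), and a non-linearized file has no linearization dictionary at all, so treat both the match and the 1-page fallback as a heuristic.

```python
import re
from pathlib import Path

# Same pattern as the grep command: digits preceded by "/N " and
# followed by "/". The lookbehind is allowed because it has a fixed width.
N_PATTERN = re.compile(rb"(?<=/N )\d+(?=/)")


def count_pages_from_n(data: bytes) -> int:
    """Return the first /N value found in the raw PDF bytes.

    Only the first match is used (the goal of grep's -m 1); if there
    is no match, fall back to 1 as the answer suggests.
    """
    match = N_PATTERN.search(data)
    return int(match.group()) if match else 1


def count_pages_in_file(path: str) -> int:
    # Read the file as bytes, since a PDF is binary (what grep's -a handles).
    return count_pages_from_n(Path(path).read_bytes())
```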

一向

2020-12-07 09:21

The R package pdftools and its function pdf_info() provide information on the number of pages in a PDF.

> library(pdftools)
> pdf_file <- file.path(R.home("doc"), "NEWS.pdf")
> info <- pdf_info(pdf_file)
> nbpages <- info[2]
> nbpages
$pages
[1] 65

小鲜肉

2020-12-07 09:22

I created a wrapper class for pdfinfo, based on Richard's answer, in case it's useful to anyone:

<?php
/**
 * Wrapper for the pdfinfo program, part of the xpdf bundle
 * http://www.xpdfreader.com/about.html
 *
 * This puts all pdfinfo output into a keyed array, then makes it accessible via getValue()
 */
class PDFInfoWrapper
{
    const PDFINFO_CMD = 'pdfinfo';

    /**
     * keyed array to hold all the info
     */
    protected $info = array();

    /**
     * raw output in case we need it
     */
    public $raw = "";

    /**
     * Constructor
     * @param string $filePath - path to file
     */
    public function __construct($filePath)
    {
        // escapeshellarg() quotes the path safely, so file names containing
        // quotes or shell metacharacters cannot break or inject into the command
        exec(self::PDFINFO_CMD . ' ' . escapeshellarg($filePath), $output);

        // loop over each line and split it into key and value
        foreach ($output as $line) {
            $colon = strpos($line, ':');
            if ($colon) {
                $key = trim(substr($line, 0, $colon));
                $val = trim(substr($line, $colon + 1));
                // use strtolower to make keys case insensitive
                $this->info[strtolower($key)] = $val;
            }
        }

        // store the raw output
        $this->raw = implode("\n", $output);
    }

    /**
     * get a value
     * @param string $key - key name, case insensitive
     * @return string|null value, or null if the key does not exist
     */
    public function getValue($key)
    {
        $key = strtolower($key);
        return isset($this->info[$key]) ? $this->info[$key] : null;
    }

    /**
     * list all the keys
     * @return array of key names
     */
    public function getAllKeys()
    {
        return array_keys($this->info);
    }
}
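A rough Python counterpart of the wrapper, for comparison. It is a sketch, not the answerer's code: `parse_pdfinfo` and `pdfinfo` are hypothetical names, and it assumes a `pdfinfo` binary (from xpdf or Poppler) is on PATH. Like the PHP class, it splits each line at the first colon and lower-cases the key, so the page count ends up under the `pages` key as a string.

```python
from __future__ import annotations

import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Split pdfinfo's "Key: value" lines into a dict with lower-cased keys."""
    info: dict[str, str] = {}
    for line in output.splitlines():
        # partition() splits at the first colon only, like strpos() in
        # the PHP class, so values such as times keep their own colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key:
            # Lower-casing the key makes lookups case-insensitive.
            info[key.lower()] = value.strip()
    return info


def pdfinfo(path: str) -> dict[str, str]:
    """Run pdfinfo on `path` and parse its output (hypothetical helper)."""
    # Passing a list avoids the shell, so the path needs no quoting.
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

For example, `int(pdfinfo("file.pdf")["pages"])` would give the page count as an integer.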

借酒劲吻你

2020-12-07 09:24

Here is a Windows command script using Ghostscript that reports the number of pages in a PDF file:

@echo off
echo.
rem
rem this file: getlastpagenumber.cmd
rem version 0.1 from commander 2015-11-03
rem needs Ghostscript, e.g. download and install it from http://www.ghostscript.com/download/
rem install to "C:\prg\ghostscript" to use the script without changes and to have fewer problems with UAC
rem

:vars
set __gs__="C:\prg\ghostscript\bin\gswin64c.exe"
set __lastpagenumber__=1
set __pdffile__="%~1"
set __pdffilename__="%~n1"
set __datetime__=%date%%time%
set __datetime__=%__datetime__:.=%
set __datetime__=%__datetime__::=%
set __datetime__=%__datetime__:,=%
set __datetime__=%__datetime__:/=%
set __datetime__=%__datetime__: =%
set __tmpfile__="%tmp%\%~n0_%__datetime__%.tmp"

:check
if %__pdffile__%=="" goto error1
if not exist %__pdffile__% goto error2
if not exist %__gs__% goto error3

:main
%__gs__% -dBATCH -dFirstPage=9999999 -dQUIET -dNODISPLAY -dNOPAUSE -sstdout=%__tmpfile__% %__pdffile__%
FOR /F " tokens=2,3* usebackq delims=:" %%A IN (`findstr /i "number" %__tmpfile__%`) DO set __lastpagenumber__=%%A
set __lastpagenumber__=%__lastpagenumber__: =%
if exist %__tmpfile__% del %__tmpfile__%

:output
echo The PDF-File: %__pdffilename__% contains %__lastpagenumber__% pages
goto end

:error1
echo no pdf file selected
echo usage: %~n0 PDFFILE
goto end

:error2
echo no pdf file found
echo usage: %~n0 PDFFILE
goto end

:error3
echo.can not find the ghostscript bin file
echo.   %__gs__%
echo.please download it from:
echo.   http://www.ghostscript.com/download/
echo.and install to "C:\prg\ghostscript"
goto end

:end
exit /b

暗喜

2020-12-07 09:31

If you can't install any additional packages, you can use this simple one-liner:

foundPages=$(strings < "$PDF_FILE" | sed -n 's|.*Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
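The same idea in Python, as a sketch that follows the sed pattern rather than a verified port of `strings`; `max_count` and `count_pages_by_count` are hypothetical names. The pipeline works because each page tree node records in `/Count` how many pages lie below it, so the root node carries the total and the numeric sort puts it first. Two caveats follow from the PDF format: outline (bookmark) dictionaries also use `/Count`, where a negative value marks a closed item (presumably why the pattern allows an optional minus sign), so a file with many bookmarks can produce a number larger than the page count; and neither `strings` nor a byte regex can see inside compressed object streams, where newer PDFs often store the page tree.

```python
import re
from pathlib import Path
from typing import Optional

# Based on the sed expression: "Count", a space, an optional minus
# sign, then the captured digits.
COUNT_PATTERN = re.compile(rb"Count -?(\d+)")


def max_count(data: bytes) -> Optional[int]:
    """Return the largest /Count value in the raw bytes, or None.

    Like `sort -rn | head -n 1`, only the largest number is kept.
    """
    # findall() returns the captured digit groups as bytes; int()
    # accepts bytes directly.
    values = [int(digits) for digits in COUNT_PATTERN.findall(data)]
    return max(values) if values else None


def count_pages_by_count(path: str) -> Optional[int]:
    # Hypothetical helper: read the whole file as bytes and scan it.
    return max_count(Path(path).read_bytes())
```

A `None` result corresponds to the shell version leaving `foundPages` empty.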
