Get the number of pages in a PDF document

你的背包 2020-12-07 09:00

This question is for referencing and comparing. The solution is the accepted answer below.

Many hours have I searched for a fast and easy, but mostly a

12 answers
  • 2020-12-07 09:19

    You can use qpdf as shown below. For example, if file_name.pdf has 100 pages:

    $ qpdf --show-npages file_name.pdf
    100
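
If you need the number inside a script rather than on the terminal, the same command can be wrapped like this. This is only a minimal sketch: it assumes qpdf is installed and on the PATH, and the function name qpdf_page_count and its run parameter are my own (hypothetical), not part of qpdf.

```python
import subprocess


def qpdf_page_count(path, run=subprocess.run):
    """Return the page count printed by `qpdf --show-npages PATH`.

    `run` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised without qpdf being installed.
    """
    result = run(
        ["qpdf", "--show-npages", path],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
        check=True,           # raise CalledProcessError if qpdf fails
    )
    return int(result.stdout.strip())


# Usage (requires qpdf on the PATH):
#   print(qpdf_page_count("file_name.pdf"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.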
    
  • 2020-12-07 09:20

    If you have shell access, the simplest approach (though it does not work on every PDF) is to use grep.

    This should return just the number of pages:

    grep -m 1 -aoP '(?<=\/N )\d+(?=\/)' file.pdf
    

    Example: https://regex101.com/r/BrUTKn/1

    Switches description:

    • -m 1 is necessary because some files contain more than one match for the regex pattern (volunteer needed to replace this with a regex that matches only the first occurrence)
    • -a is necessary to treat the binary file as text
    • -o to show only the match
    • -P to use Perl regular expression

    Regex explanation:

    • starting "delimiter": (?<=\/N ) lookbehind of /N (nb. space character not seen here)
    • actual result: \d+ one or more digits
    • ending "delimiter": (?=\/) lookahead of /

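    Nota bene: if no match is found, this answer assumes the PDF has only 1 page. Treat that as a heuristic: the /N key read here belongs to the linearization dictionary, so a non-linearized PDF with several pages also produces no match.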
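
To see what the lookbehind and lookahead actually capture, here is the same pattern applied in Python. It is a sketch only: the sample bytes are a made-up fragment shaped like the header of a linearized PDF (not taken from a real file), and the function name is hypothetical.

```python
import re

# Same pattern as in the grep command, compiled for bytes because PDFs are binary.
LINEARIZED_PAGES = re.compile(rb"(?<=/N )\d+(?=/)")


def pages_from_linearization(data: bytes):
    """Return the first number that sits between '/N ' and '/', or None."""
    match = LINEARIZED_PAGES.search(data)
    return int(match.group()) if match else None


# Hypothetical header of a linearized PDF, written by hand for illustration.
sample = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<</Linearized 1/L 7945/O 3/E 3524/N 12/T 7520/H [ 451 137]>>\n"
    b"endobj\n"
)

print(pages_from_linearization(sample))
```

Only the digits are returned, since neither the lookbehind nor the lookahead is part of the match. A file without such a dictionary yields None, which is the "no match" case mentioned above.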

  • 2020-12-07 09:21

    The pdf_info() function from the R package pdftools reports, among other things, the number of pages in a PDF.

    library(pdftools)
    pdf_file <- file.path(R.home("doc"), "NEWS.pdf")
    info <- pdf_info(pdf_file)
    nbpages <- info[2]
    nbpages
    
    $pages
    [1] 65
    
  • 2020-12-07 09:22

    I created a wrapper class for pdfinfo, based on Richard's answer, in case it's useful to anyone:

    /**
     * Wrapper for pdfinfo program, part of xpdf bundle
     * http://www.xpdfreader.com/about.html
     * 
     * this will put all pdfinfo output into keyed array, then make them accessible via getValue
     */
    class PDFInfoWrapper {
    
        const PDFINFO_CMD = 'pdfinfo';
    
        /**
         * keyed array to hold all the info
         */
        protected $info = array();
    
        /**
         * raw output in case we need it
         */
        public $raw = "";
    
        /**
         * Constructor
         * @param string $filePath - path to file
         */
        public function __construct($filePath) {
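            // escapeshellarg() quotes the path so that quotes or spaces in it cannot break the command
            exec(self::PDFINFO_CMD . ' ' . escapeshellarg($filePath), $output);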
    
            //loop each line and split into key and value
            foreach($output as $line) {
                $colon = strpos($line, ':');
                if($colon) {
                    $key = trim(substr($line, 0, $colon));
                    $val = trim(substr($line, $colon + 1));
    
                    //use strtolower to make case insensitive
                    $this->info[strtolower($key)] = $val;
                }
            }
    
            //store the raw output
            $this->raw = implode("\n", $output);
    
        }
    
        /**
         * get a value
         * @param string $key - key name, case insensitive
         * @returns string value
         */
        public function getValue($key) {
            return @$this->info[strtolower($key)];
        }
    
        /**
         * list all the keys
         * @returns array of key names
         */
        public function getAllKeys() {
            return array_keys($this->info);
        }
    
    }
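
For comparison, the key/value splitting that the constructor performs can be sketched in a few lines of Python. This is not a port of the class, just an illustration of the parsing step; the sample text is hypothetical output in the "Key:   value" layout that pdfinfo prints, and parse_pdfinfo is a name I made up.

```python
def parse_pdfinfo(output: str) -> dict:
    """Split 'Key:   value' lines into a dict with lowercase keys."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep theirs.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip().lower()] = value.strip()
    return info


# Hypothetical pdfinfo output, written by hand for illustration.
sample_output = """Title:          Example
CreationDate:   Mon Dec  7 09:00:00 2020
Pages:          65
Page size:      612 x 792 pts (letter)
"""

info = parse_pdfinfo(sample_output)
print(int(info["pages"]))
```

With the PHP class above, the equivalent lookup would be getValue('pages') on a PDFInfoWrapper instance.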
    
  • 2020-12-07 09:24

    Here is a Windows command script that uses Ghostscript to report the number of pages in a PDF file:

    @echo off
    echo.
    rem
    rem this file: getlastpagenumber.cmd
    rem version 0.1 from commander 2015-11-03
    rem need Ghostscript e.g. download and install from http://www.ghostscript.com/download/
    rem Install path "C:\prg\ghostscript" for using the script without changes \\ and have less problems with UAC
    rem
    
    :vars
      set __gs__="C:\prg\ghostscript\bin\gswin64c.exe"
      set __lastpagenumber__=1
      set __pdffile__="%~1"
      set __pdffilename__="%~n1"
      set __datetime__=%date%%time%
      set __datetime__=%__datetime__:.=%
      set __datetime__=%__datetime__::=%
      set __datetime__=%__datetime__:,=%
      set __datetime__=%__datetime__:/=%
      set __datetime__=%__datetime__: =%
      set __tmpfile__="%tmp%\%~n0_%__datetime__%.tmp"
    
    :check
      if %__pdffile__%=="" goto error1
      if not exist %__pdffile__% goto error2
      if not exist %__gs__% goto error3
    
    :main
      %__gs__% -dBATCH -dFirstPage=9999999 -dQUIET -dNODISPLAY -dNOPAUSE  -sstdout=%__tmpfile__%  %__pdffile__%
      FOR /F " tokens=2,3* usebackq delims=:" %%A IN (`findstr /i "number" %__tmpfile__%`) DO set __lastpagenumber__=%%A 
      set __lastpagenumber__=%__lastpagenumber__: =%
      if exist %__tmpfile__% del %__tmpfile__%
    
    :output
      echo The PDF-File: %__pdffilename__% contains %__lastpagenumber__% pages
      goto end
    
    :error1
      echo no pdf file selected
      echo usage: %~n0 PDFFILE
      goto end
    
    :error2
      echo no pdf file found
      echo usage: %~n0 PDFFILE
      goto end
    
    :error3
      echo.can not find the ghostscript bin file
      echo.   %__gs__%
      echo.please download it from:
      echo.   http://www.ghostscript.com/download/
      echo.and install to "C:\prg\ghostscript"
      goto end
    
    :end
      exit /b
    
  • 2020-12-07 09:31

    If you can't install any additional packages, you can use this simple one-liner:

    foundPages=$(strings < "$PDF_FILE" | sed -n 's|.*Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
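
The idea behind the one-liner is to collect every number that follows "Count " and keep the largest, on the assumption that the root of the page tree carries the biggest /Count. A rough Python sketch of the same heuristic follows; the sample bytes are invented for illustration and max_count is a hypothetical name. Like the shell version, it cannot see /Count entries stored inside compressed object streams, and an unrelated /Count (outlines use one too) can skew the result.

```python
import re

# Roughly mirrors the sed expression: "Count ", an optional minus sign, then digits.
COUNT = re.compile(rb"Count -?(\d+)")


def max_count(data: bytes):
    """Return the largest number found after 'Count ', or None if there is none."""
    values = [int(v) for v in COUNT.findall(data)]
    return max(values) if values else None


# Hypothetical page-tree fragments: a root node with two child nodes.
sample = (
    b"<< /Type /Pages /Kids [3 0 R 8 0 R] /Count 7 >>\n"
    b"<< /Type /Pages /Parent 2 0 R /Count 4 >>\n"
    b"<< /Type /Pages /Parent 2 0 R /Count 3 >>\n"
)

print(max_count(sample))
```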
    