Get the number of pages in a PDF document

你的背包 2020-12-07 09:00

This question is meant for reference and comparison. The solution is the accepted answer below.

Many hours have I searched for a fast and easy, but mostly a

12 answers
  • 2020-12-07 09:31

    Here is an R function that reports the number of pages in a PDF file using the pdfinfo command.

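    pdf.file.page.number <- function(fname) {
        # shQuote() protects file names with spaces; '^Pages:' matches only pdfinfo's page-count line
        a <- pipe(paste("pdfinfo", shQuote(fname), "| grep '^Pages:' | cut -d: -f2"))
        page.number <- as.numeric(readLines(a))
        close(a)
        page.number
    }
    if (FALSE) {
        # Example usage (not run)
        pdf.file.page.number("a.pdf")
    }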
    
  • 2020-12-07 09:35

    The simplest option is ImageMagick, through PHP's Imagick extension.

    Here is some sample code:

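    // Requires the Imagick extension; ImageMagick reads PDFs through Ghostscript
    $image = new Imagick();
    $image->pingImage('myPdfFile.pdf');   // fetch basic attributes rather than full pixel data
    echo $image->getNumberImages();       // one image per PDF page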
    

    Alternatively, you can use a PHP PDF library such as mPDF or TCPDF.

  • 2020-12-07 09:38

    Since you're OK with using command-line utilities, you can use cpdf (available for Windows, Linux and Mac OS X). To get the number of pages in a PDF:

    cpdf.exe -pages "my file.pdf"
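
    The same command can also be called from a program. Below is a minimal Python sketch (Python is used here only for illustration; the thread's answers are in R, PHP and shell) that runs a page-counting CLI with an argument list instead of a shell string, so file names containing spaces need no extra quoting. The function name page_count_from_command is hypothetical, and it assumes the tool writes only the page count to standard output, as cpdf -pages is described to do above.

```python
import subprocess
from typing import Sequence


def page_count_from_command(args: Sequence[str]) -> int:
    """Run a command that prints a page count and return it as an int.

    The arguments are passed as a list, so no shell is involved and
    file names with spaces are passed through unchanged. Assumes the
    command writes only the number to stdout (e.g. cpdf -pages).
    """
    # check=True raises CalledProcessError if the tool exits non-zero
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

    Usage would look like page_count_from_command(["cpdf", "-pages", "my file.pdf"]), assuming cpdf is on the PATH.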
    
  • 2020-12-07 09:38

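    Here is a simple example of getting the number of pages in a PDF with PHP. It counts /Page objects in the raw file contents, so it is a heuristic and can miscount some files.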

    <?php
    
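    // Heuristic: count "/Page" tokens in the raw file. \W requires a non-word
    // character after "Page", so "/Pages" (the page-tree node) is not counted.
    function count_pdf_pages($pdfname) {
      $pdftext = file_get_contents($pdfname);
      if ($pdftext === false) {
        return 0;  // file could not be read
      }
      $num = preg_match_all("/\/Page\W/", $pdftext, $dummy);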
    
      return $num;
    }
    
    $pdfname = 'example.pdf'; // Put your PDF path
    $pages = count_pdf_pages($pdfname);
    
    echo $pages;
    
    ?>
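
    For comparison, here is the same heuristic as a Python sketch (Python is used only for illustration; the function names are hypothetical). It reads the file as bytes and counts /Page followed by a non-word byte. Like the PHP version, it misses page objects stored inside compressed object streams (allowed since PDF 1.5) and can over-count files whose incremental updates leave older copies of page objects in the file.

```python
import re

# "/Page" followed by a non-word byte: skips "/Pages", "/PageLayout", etc.
PAGE_PATTERN = re.compile(rb"/Page\W")


def count_pdf_pages_regex(data: bytes) -> int:
    """Count /Page markers in raw PDF bytes (heuristic only)."""
    return len(PAGE_PATTERN.findall(data))


def count_pdf_pages_regex_file(path: str) -> int:
    """Read a PDF file in binary mode and apply the heuristic."""
    with open(path, "rb") as f:
        return count_pdf_pages_regex(f.read())
```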
    
  • 2020-12-07 09:40

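    This seems to work pretty well without parsing the command output: ImageMagick's identify command prints one line per page of a PDF, so counting the lines gives the page count. It does require ImageMagick (and Ghostscript for PDF input).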

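    <?php

    // identify (ImageMagick) prints one line per page; count the lines
    $target_pdf = "multi-page-test.pdf";
    // escapeshellarg() quotes the file name so spaces and shell characters are safe
    $cmd = sprintf("identify %s", escapeshellarg($target_pdf));
    exec($cmd, $output);
    $pages = count($output);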
    
  • 2020-12-07 09:42

    A simple command-line executable: pdfinfo.

    It is available for Linux and Windows as part of the Xpdf command-line tools (Poppler's poppler-utils package also ships a pdfinfo). You download a compressed file containing several small PDF-related programs; extract it somewhere.

    One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:

    Title:          test1.pdf
    Author:         John Smith
    Creator:        PScript5.dll Version 5.2.2
    Producer:       Acrobat Distiller 9.2.0 (Windows)
    CreationDate:   01/09/13 19:46:57
    ModDate:        01/09/13 19:46:57
    Tagged:         yes
    Form:           none
    Pages:          13    <-- This is what we need
    Encrypted:      no
    Page size:      2384 x 3370 pts (A0)
    File size:      17569259 bytes
    Optimized:      yes
    PDF version:    1.6
    

    I haven't seen a PDF document where it returned a wrong page count (yet). It is also really fast: even with big documents of 200+ MB, the response time is just a few seconds or less.

    Extracting the page count from the output is easy; here it is in PHP:

    // Make a function for convenience
    function getPDFPages($document)
    {
        // Path to pdfinfo for the current OS (PHP_OS_FAMILY needs PHP 7.2+)
        $cmd = (PHP_OS_FAMILY === 'Windows')
            ? "C:\\path\\to\\pdfinfo.exe"    // Windows
            : "/path/to/pdfinfo";            // Linux
        
        // Capture the entire output, one line per array element.
        // escapeshellarg() quotes the file name, so spaces are handled safely
        exec($cmd . " " . escapeshellarg($document), $output);
    
        // Iterate through lines
        $pagecount = 0;
        foreach($output as $op)
        {
            // Extract the number from the line starting with "Pages:"
            if(preg_match("/^Pages:\s*(\d+)/", $op, $matches) === 1)
            {
                $pagecount = intval($matches[1]);
                break;
            }
        }
        
        return $pagecount;
    }
    
    // Use the function
    echo getPDFPages("test 1.pdf");  // Output: 13
    

    Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.
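
    For example, here is a Python sketch of the same approach (Python is used only for illustration). parse_pdfinfo treats each output line as a "Key: value" pair, splitting on the first colon so values such as dates keep theirs, and pdfinfo_pages runs the tool with an argument list so file names with spaces need no quoting. Both function names are hypothetical, and pdfinfo is assumed to be on the PATH unless another path is passed in.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Parse pdfinfo's 'Key:   value' lines into a dict of strings."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only; "CreationDate: 01/09/13 19:46:57" keeps its time
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def pdfinfo_pages(path: str, pdfinfo: str = "pdfinfo") -> int:
    """Run pdfinfo on path and return the Pages field, or 0 if absent."""
    result = subprocess.run(
        [pdfinfo, path],  # list form: no shell, spaces in names are fine
        capture_output=True, text=True, errors="replace", check=True,
    )
    return int(parse_pdfinfo(result.stdout).get("Pages", "0"))
```

    Returning 0 when no Pages line is found mirrors the PHP function above; raising an error instead would also be a reasonable choice.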

    I know it's not pure PHP, but external programs are much better at handling PDFs (as seen in the question).

    I hope this helps others. I spent a lot of time looking for a solution and read many questions about PDF page counts without finding the answer I was looking for, which is why I asked this question and answered it myself.
