Find out page numbers of PDF, Docx, Doc, Ppt, Pptx files with PHP [closed]

只愿长相守 submitted on 2019-12-02 23:07:54

Question


I want this functionality in my PHP application:

When a user uploads a document (with a PDF, DOCX, DOC, PPT, or PPTX extension), the user should get the total number of pages in the document once the upload finishes.

This has to work without using the exec() function.


Answer 1:


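Some formats can be handled directly in PHP. DOCX and PPTX are easy, because both are ZIP archives that keep a page or slide count in docProps/app.xml (the value recorded by the application that last saved the file):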

For Word files:

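function PageCount_DOCX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    // A .docx file is a ZIP archive; open() returns true on success.
    if ($zip->open($file) === true) {
        // docProps/app.xml holds the extended properties, including <Pages>.
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // Cast so the caller gets an int, not a SimpleXMLElement.
            $pageCount = (int) $xml->Pages;
        }
        // Close exactly once, after all reads from the archive are done.
        $zip->close();
    }

    return $pageCount;
}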

and for PowerPoint files:

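function PageCount_PPTX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    // A .pptx file is a ZIP archive; open() returns true on success.
    if ($zip->open($file) === true) {
        // docProps/app.xml holds the extended properties, including <Slides>.
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // Cast so the caller gets an int, not a SimpleXMLElement.
            $pageCount = (int) $xml->Slides;
        }
        // Close exactly once, after all reads from the archive are done.
        $zip->close();
    }

    return $pageCount;
}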
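
The two functions above differ only in the element they read from docProps/app.xml, so they could be folded into one helper. The following is a minimal sketch of that idea, not part of the original answer: the function name ooxmlAppProperty and its parameters are hypothetical, and it assumes the zip and SimpleXML extensions are available.

```php
<?php
// Hypothetical helper (name and signature are illustrative only):
// reads one numeric property such as "Pages" or "Slides" from the
// docProps/app.xml part of an OOXML file. Returns 0 if anything is missing.
function ooxmlAppProperty(string $file, string $property): int
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return 0; // not a readable ZIP container
    }

    // getFromName() returns false when the entry does not exist.
    $data = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($data === false) {
        return 0;
    }

    // Collect XML parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml->{$property})) {
        return 0;
    }

    return (int) $xml->{$property};
}
```

With this sketch, ooxmlAppProperty($file, 'Pages') would cover the Word case and ooxmlAppProperty($file, 'Slides') the PowerPoint case.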

Older Office documents are a different story. You'll find some discussion about doing that here: How to get the number of pages in a Word Document on linux?

As for PDF files, I prefer to use FPDI, even though it requires a license to parse newer PDF file formats. You can do it simply like this:

function PageCount_PDF($file) {
    $pageCount = 0;
    if (file_exists($file)) {
        require_once('fpdf/fpdf.php');
        require_once('fpdi/fpdi.php');
        $pdf = new FPDI();                              // initiate FPDI
        $pageCount = $pdf->setSourceFile($file);        // get the page count
    }
    return $pageCount;
}



Answer 2:


Unfortunately, you cannot get the page count of Office files without paginating them first, and that cannot easily be done without the help of another application such as MS Office or OpenOffice. Even worse, a 10-page document created in MS Word can open as an 11-page document in OpenOffice because the two paginate differently. In practice, the most reliable way to get the total number of pages of a .doc file is to use MS Word itself. You can do this through Office Automation, but it is expensive for the machine, since it has to paginate the whole document, and it requires MS Word to be installed on the computer or server.

You can relatively easily get the total number of pages in a PDF document. The page count information is easily accessible in the PDF format. Most PDF parser/reader libraries will give you a simple API for your purpose.
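
As a rough illustration of how exposed that information is, the sketch below counts page objects by scanning the raw file for "/Type /Page" entries, without any library. This is only a heuristic, not what a real parser does: the function name pdfPageCountHeuristic is hypothetical, it assumes the page objects are stored uncompressed, and it will miscount files that use object streams (PDF 1.5 and later) or that carry stale page objects from incremental saves. A proper PDF library remains the safer choice.

```php
<?php
// Heuristic sketch (hypothetical helper, not a full PDF parser):
// counts "/Type /Page" dictionary entries in the raw bytes of the file.
// The negative lookahead skips "/Type /Pages", which marks page-tree nodes.
function pdfPageCountHeuristic(string $file): int
{
    if (!is_file($file) || !is_readable($file)) {
        return 0;
    }

    // Loads the whole file into memory; acceptable for a sketch.
    $data = file_get_contents($file);
    if ($data === false) {
        return 0;
    }

    // preg_match_all() returns the number of matches, or false on error.
    return (int) preg_match_all('#/Type\s*/Page(?![A-Za-z])#', $data);
}
```

A result of 0 from this sketch should be treated as "unknown" rather than as an empty document, and the upload handler can then fall back to a parser library.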



Source: https://stackoverflow.com/questions/23561719/find-out-page-numbers-of-pdf-docx-doc-ppt-pptx-files-with-php
