Read a Word document in PHP

北荒 2020-12-06 07:31

I'm working on a project and I'm stuck on reading Word documents.

Word File content.

This is a test word file in PHP.

Thank you.
4 Answers
  • 2020-12-06 07:46

    "PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats." (PHPOffice, 2016)

    This open-source PHP library should solve your problem. You can either download it or install it with Composer:

    https://github.com/PHPOffice/PHPWord

  • 2020-12-06 07:51

    "docx" is different from "doc". Docx files are basically xml files in a zipfile container (as described by wikipedia). Doc files are binary blobs.

    I am not aware of any library that can easily read docx files in PHP (although Phpdocx can write them). However, since these are just zip files containing XML files, you should be able to put something together using ZipArchive to open the docx container and DOMDocument, SimpleXML, XMLReader, or XSLTProcessor to read the XML documents themselves.
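
    The ZipArchive plus DOMDocument approach described above could be sketched roughly as follows. This is a minimal illustration, not a complete reader: the function names `docx_xml_to_text` and `read_docx_text` are made up for this example, it assumes the `zip` and `dom` extensions are enabled, and it assumes the document uses the standard (transitional) WordprocessingML namespace. Only text runs, tabs, and line breaks are handled.

```php
<?php
/**
 * Convert the XML of word/document.xml to plain text (hypothetical helper).
 * Emits one line per <w:p> paragraph. Nested paragraphs (for example in
 * text boxes) are not treated specially and may appear twice.
 */
function docx_xml_to_text(string $xml): string
{
    $dom = new DOMDocument();
    // Guard against empty input and suppress warnings on malformed XML.
    if ($xml === '' || !@$dom->loadXML($xml)) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    // Assumption: transitional OOXML namespace, as written by most Word versions.
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        $line = '';
        // The union query returns nodes in document order.
        foreach ($xpath->query('.//w:t | .//w:tab | .//w:br', $paragraph) as $node) {
            if ($node->localName === 't') {
                $line .= $node->textContent;
            } elseif ($node->localName === 'tab') {
                $line .= "\t";
            } else {
                $line .= "\n";
            }
        }
        $lines[] = $line;
    }

    return implode("\n", $lines);
}

/**
 * Open a .docx container and return the text of its main document part,
 * or null if the file cannot be opened or has no word/document.xml.
 */
function read_docx_text(string $path): ?string
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($path) !== true) {
        return null;
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    return $xml === false ? null : docx_xml_to_text($xml);
}
```

    Calling `read_docx_text('test.docx')` (the file name is a placeholder) would then return the paragraphs joined by newlines. Unlike a `strip_tags` approach, going through the DOM also decodes XML entities such as `&amp;`.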

  • 2020-12-06 08:05

    A Word document isn't stored conveniently like a text file (it's closer to an XML or binary file), so you can't just echo the file and expect to get the human-readable portion of the docx file.

    There's a library that could do what you want, but it only accepts doc files:

    Docvert
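
    To see which kind of file you are dealing with before picking a tool, one option is to look at the first bytes of the file. The sketch below is only an illustration under stated assumptions: the function name `sniff_word_container` and its return values are invented here, and the check only identifies the container (a zip archive for docx, an OLE compound file for legacy doc), not whether the content is really a Word document.

```php
<?php
/**
 * Guess the container format of a file from its leading bytes
 * (hypothetical helper). Returns 'zip', 'ole', or 'unknown'.
 */
function sniff_word_container(string $path): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return 'unknown';
    }
    $head = (string) fread($handle, 8);
    fclose($handle);

    // Zip archives (and therefore docx files) start with "PK\x03\x04".
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // Legacy binary doc files use the OLE compound file signature.
    if ($head === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole';
    }
    return 'unknown';
}
```

    A result of `'zip'` suggests the docx route (unzip, then parse the XML), while `'ole'` suggests a tool that understands the binary doc format.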

  • 2020-12-06 08:08

    For docx files, use this function:

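    function read_docx($filename){

        // Bail out early if the path is empty or the file does not exist.
        if(!$filename || !file_exists($filename)) return false;

        // Note: the zip_* functions are deprecated as of PHP 8.0;
        // ZipArchive is the recommended replacement.
        $zip = zip_open($filename);
        // zip_open() returns an integer error code on failure.
        if (!$zip || is_numeric($zip)) return false;

        $content = '';
        while ($zip_entry = zip_read($zip)) {

            // Only the main document part holds the body text. Check the
            // name first so that skipped entries are never left open.
            if (zip_entry_name($zip_entry) != "word/document.xml") continue;

            if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

            $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

            zip_entry_close($zip_entry);
        }
        zip_close($zip);

        // Separate table cells with a space and paragraphs with a line
        // break, then drop every remaining XML tag.
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);
        $stripped_content = strip_tags($content);

        return $stripped_content;
    }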
    

    It returns the text content of the docx file, or false if the file cannot be opened.
