I'm doing a project now, and I'm stuck on reading Word documents.
Word File content.
This is a test word file in PHP.
Thank you.
"PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats." (PHPOffice, 2016)
This open-source PHP library should solve your problem. You can either download it or install it with Composer:
https://github.com/PHPOffice/PHPWord
"docx" is different from "doc". Docx files are basically xml files in a zipfile container (as described by wikipedia). Doc files are binary blobs.
I am aware of no library that can easily read docx files in PHP (although Phpdocx can write them). However, since these are just zip files and XML files, you should be able to put something together using ZipArchive to open the docx container and DOMDocument, SimpleXML, XMLReader, or XSLTProcessor to read the XML documents themselves.
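As a rough sketch of that ZipArchive plus DOMDocument idea (not a complete reader): the function name docx_paragraphs is made up for illustration, and the code assumes the zip and dom extensions are enabled and that the body text sits in word/document.xml under the usual WordprocessingML namespace. It ignores headers, footers, footnotes, and formatting.

```php
<?php
// Hypothetical helper: returns the document's paragraphs as an array of
// strings, or false if the file cannot be read as a docx container.
function docx_paragraphs(string $filename)
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false;
    }
    // The main body of a docx is stored in word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false;
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return false;
    }

    // Namespace bound to the w: prefix in WordprocessingML (assumed here).
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $paragraphs = [];
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
        $text = '';
        // Each w:t element holds one run of visible text.
        foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}
```

Calling docx_paragraphs('test.docx') and joining the result with implode("\n", ...) should give you the readable text. Going through the DOM rather than stripping tags means XML entities are decoded for you and text split across several runs is joined back together.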
A Word document isn't stored as conveniently as a text file (it's more like an XML or binary file), so you can't just use echo and expect it to output the human-readable portion of the docx file.
There's a library that could do what you want, but it only accepts doc files:
Docvert
For docx, use this function:
function read_docx($filename) {
    if (!$filename || !file_exists($filename)) {
        return false;
    }

    // A docx file is a zip archive; the body text lives in word/document.xml.
    // ZipArchive is used because the zip_* functions are deprecated as of PHP 8.0.
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false;
    }
    $content = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($content === false) {
        return false;
    }

    // Separate table cells with a space and paragraphs with a line break
    // before stripping the remaining XML tags.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', ' ', $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    return strip_tags($content);
}
It returns the plain text of the docx file, or false if the file cannot be read.