I want to be able to upload an MS word document and export it a page in my site.
Is there any way to accomplish this?
You can convert Word docx documents to html using Print2flash library. Here is an PHP excerpt from my client's site which converts a document to html:
include("const.php");
$p2fServ = new COM("Print2Flash4.Server2");
$p2fServ->DefaultProfile->DocumentType=HTML5;
$p2fServ->ConvertFile($wordfile,$htmlFile);
It converts a document which path is specified in $wordfile variable to a html page file specified by $htmlFile variable. All formatting, hyperlinks and charts are retained. You can get the required const.php file altogether with a fuller sample from Print2flash SDK.
If you don't refuse REST API, then you can use:
Sample code for RawText:
$result = $rawText -> parse($your_file)
//FUNCTION :: read a docx file and return the string
function readDocx($filePath) {
// Create new ZIP archive
$zip = new ZipArchive;
$dataFile = 'word/document.xml';
// Open received archive file
if (true === $zip->open($filePath)) {
// If done, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// If found, read it to the string
$data = $zip->getFromIndex($index);
// Close archive file
$zip->close();
// Load XML from a string
// Skip errors and warnings
$xml = DOMDocument::loadXML($data, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Return data without XML formatting tags
$contents = explode('\n',strip_tags($xml->saveXML()));
$text = '';
foreach($contents as $i=>$content) {
$text .= $contents[$i];
}
return $text;
}
$zip->close();
}
// In case of failure return empty string
return "";
}
ZipArchive and DOMDocument are both inside PHP so you don't need to install/include/require additional libraries.
One may use PHPDocX.
It has support for practically all HTML CSS styles. Moreover you may use templates to add extra formatting to your HTML via the replaceTemplateVariableByHTML
.
The HTML methods of PHPDocX also allow for the direct use of Word styles. You may use something like this:
$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));
If you want that all your tables use the MediumGrid3-accent5 Word style. The embedHTML method as well as its version for templates (replaceTemplateVariableByHTML
) preserve inheritance, meaning by that that you may use a predefined Word style and override with CSS any of its properties.
You may also extract selected parts of your HTML using 'JQuery type' selectors.