How can I convert a docx document to html using php?

萌比男神i 2020-11-29 05:14

I want to be able to upload an MS Word document and export it as a page in my site.

Is there any way to accomplish this?

4 Answers
  • 2020-11-29 06:03

    You can convert Word docx documents to HTML using the Print2flash library. Here is a PHP excerpt from my client's site which converts a document to HTML:

    include("const.php");
    $p2fServ = new COM("Print2Flash4.Server2");
    $p2fServ->DefaultProfile->DocumentType=HTML5;
    $p2fServ->ConvertFile($wordfile,$htmlFile);
    

    It converts the document whose path is given in the $wordfile variable to an HTML page file whose path is given in the $htmlFile variable. All formatting, hyperlinks and charts are retained. You can get the required const.php file, together with a fuller sample, from the Print2flash SDK.

  • 2020-11-29 06:03

    If you don't mind using a REST API, you can use:

    • Apache Tika, a proven open-source leader for text extraction.
    • RawText, if you'd rather skip the configuration and want a ready-to-go solution. Note that it's not free.

    Sample code for RawText:

    $result = $rawText->parse($your_file);
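
    For the Apache Tika option, here is a minimal sketch of what a call from PHP might look like. It assumes a tika-server instance is already running and reachable (the URL below uses its default port 9998 on localhost), and that its /tika endpoint accepts the document via PUT and returns XHTML when asked for text/html. The function name tikaDocxToHtml is made up for this example.

```php
<?php
// Sketch only: send a .docx to a running Tika server and return the XHTML it produces.
// Assumption: tika-server listens at $tikaUrl (default port 9998 on localhost).
function tikaDocxToHtml(string $docxPath, string $tikaUrl = 'http://localhost:9998/tika'): string
{
    if (!is_readable($docxPath)) {
        throw new InvalidArgumentException("Cannot read file: $docxPath");
    }

    $ch = curl_init($tikaUrl);
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',                         // upload the file as the request body
        CURLOPT_POSTFIELDS     => file_get_contents($docxPath),
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html',                                 // ask for XHTML rather than plain text
            'Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        ],
        CURLOPT_RETURNTRANSFER => true,                          // return the response instead of printing it
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($ch);

    if ($body === false || $status !== 200) {
        throw new RuntimeException("Tika request failed (HTTP $status): $error");
    }
    return $body;
}
```

    The returned markup is a complete XHTML document, so you would probably want to extract the body before embedding it in one of your own pages.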
    
  • 2020-11-29 06:15
    // Read a .docx file and return its text content as a plain string
    function readDocx($filePath) {
        $zip = new ZipArchive;
        $dataFile = 'word/document.xml';
        // Open the archive (a .docx file is a ZIP container)
        if (true === $zip->open($filePath)) {
            // Look for the main document part inside the archive
            if (($index = $zip->locateName($dataFile)) !== false) {
                // Read it into a string, then close the archive
                $data = $zip->getFromIndex($index);
                $zip->close();
                // Parse the XML, suppressing libxml errors and warnings.
                // loadXML() must be called on an instance (a static call fails on PHP 8),
                // and LIBXML_NOENT is left out so external entities are not expanded.
                $xml = new DOMDocument();
                if (!$xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
                    return "";
                }
                // Strip the XML tags and drop line breaks
                return str_replace(array("\r", "\n"), '', strip_tags($xml->saveXML()));
            }
            $zip->close();
        }
        // On failure, return an empty string
        return "";
    }
    

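    ZipArchive and DOMDocument both come from extensions bundled with PHP (zip and dom), so you don't need to install or include any third-party library. Note that this function returns the document's plain text, not HTML.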
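
    Since the question asks for HTML, here is a rough sketch of how the same ZipArchive/DOMDocument approach could emit one <p> element per Word paragraph. This is an illustration, not a full converter: it only looks at top-level paragraphs (w:p directly under w:body), ignores tables, images, lists and all formatting, and the function name docxToSimpleHtml is invented for this example.

```php
<?php
// Sketch only: turn each top-level Word paragraph (w:p) into an HTML <p>, text only.
function docxToSimpleHtml(string $filePath): string
{
    $zip = new ZipArchive();
    if ($zip->open($filePath) !== true) {
        return '';
    }
    // The main document part of a .docx archive
    $data = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($data === false) {
        return '';
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return '';
    }

    // WordprocessingML elements live in this namespace
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $html = '';
    foreach ($xpath->query('//w:body/w:p') as $paragraph) {
        // A paragraph's text is spread over one or more w:t elements
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $textNode) {
            $text .= $textNode->textContent;
        }
        if ($text !== '') {
            // Escape the text so it is safe to embed in a page
            $html .= '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}
```

    Usage would be along the lines of echo docxToSimpleHtml($uploadedPath); where $uploadedPath is wherever you stored the uploaded file.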

  • 2020-11-29 06:15

    One may use PHPDocX.

    It supports practically all HTML and CSS styles. Moreover, you may use templates to add extra formatting to your HTML via the replaceTemplateVariableByHTML method.

    The HTML methods of PHPDocX also allow for the direct use of Word styles. You may use something like this:

    $docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));

    This applies the MediumGrid3-accent5 Word style to all your tables. The embedHTML method and its template counterpart (replaceTemplateVariableByHTML) preserve inheritance, meaning that you can start from a predefined Word style and override any of its properties with CSS.

    You may also extract selected parts of your HTML using jQuery-style selectors.
