How to read math equations from a docx file using PHP (CodeIgniter)

Submitted by 拟墨画扇 on 2019-12-06 07:20:14

Question


I am trying to read a docx file from PHP. Reading the text works, but the equations in the Word document are missing from the output. I am new to PHP and do not know how to read them, so please suggest some ideas. The function I used to read the document is:

function index()
{
    $document = 'file_path';
    $text_output = $this->read_docx($document);
    echo nl2br($text_output);

}
private function read_docx($filename) 
{
    $striped_content = '';
    $content = '';

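    // Note: the procedural zip_* functions are deprecated as of PHP 8.0; ZipArchive is the replacement.
    $zip = zip_open($filename);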

    if (!$zip || is_numeric($zip))
        return false;

    while ($zip_entry = zip_read($zip)) {

        // Only the main document part holds the body text.
        if (zip_entry_name($zip_entry) != "word/document.xml")
            continue;

        // Check the name first so that every entry that gets opened is also closed.
        if (zip_entry_open($zip, $zip_entry) == FALSE)
            continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $striped_content = strip_tags($content);

    return $striped_content;
}

The docx file contains a sample math equation that I am trying to read and render on an HTML page. Thanks.


Answer 1:


I went through https://msdn.microsoft.com/en-us/library/aa982683(v=office.12).aspx#Office2007ManipulatingXMLDocs_exploring in full and parsed the XML using PHP's XMLReader:

$document = 'url';
/*Function to extract images*/ 
function readZippedImages($filename) 
{
    $for_image = $filename;
    $final_arr=array();
    $repo = array();
    /*Create a new ZIP archive object*/
    $zip = new ZipArchive;
    /*Open the received archive file*/
    if (true === $zip->open($filename)) 
    {
        for ($i=0; $i<$zip->numFiles;$i++) 
        {
            if($zip->getNameIndex($i) === 'word/document.xml')//match the main document part by name, not by position
            {
                //======function using xml parser================================//
                $check = $zip->getFromIndex($i);
                //Create a new XMLReader Instance
                $reader = new XMLReader();
                //Loading from a XML File or URL
                //$reader->open($check);
                //Loading from PHP variable
                $reader->xml($check);

                //====================parsing through the document==================//
                while($reader->read())
                {
                    $node_loc = $reader->localName;
                    if($reader->nodeType == XMLReader::ELEMENT && $reader->localName == 'body')
                    {
                        $reader->read();
                        $read_content = $reader->value. "\n";
                    }
                    if($node_loc == '#text')//parsing all the text from document using #text tag
                    {
                        $temp_value = array("text"=>$reader->value);
                        array_push($final_arr,$temp_value);
                        $reader->read();
                        $read_content = $reader->value. "\n";
                    }
                    if($node_loc == 'blip')//parsing all the images using blip tag which is under drawing tag
                    {
                        $attri_r = $reader->getAttribute("r:embed");
                        if(isset($repo[$attri_r]))//skip images whose relationship ID is unknown
                        {
                            $current_image_name = $repo[$attri_r];
                            //showimage() is a plain function here, so it is called without $this
                            $image_stream = showimage($for_image,$current_image_name);//return the base64 string
                            $temp_value = array("image"=>$image_stream);
                            array_push($final_arr,$temp_value);
                        }
                    }
                }
                //==================xml parser end============================//
            }
            if($zip->getNameIndex($i) === 'word/_rels/document.xml.rels')//relationship IDs mapped to targets
            {
                $check_id = $zip->getFromIndex($i);
                $reader_relation = new XMLReader();
                $reader_relation->xml($check_id);

                //====================parsing through the document==================//
                while($reader_relation->read())
                {
                    $node_loc = $reader_relation->localName;
                    if($reader_relation->nodeType == XMLREADER::ELEMENT && $reader_relation->localName == 'Relationship')
                    {
                     $read_content_id = $reader_relation->getAttribute("Id");
                     $read_content_name = $reader_relation->getAttribute("Target");
                     $repo[$read_content_id]=$read_content_name;
                    }

                }
            }
        }
     }
     return $final_arr;//hand the collected text and images back to the caller
}


function showimage($zip_file_original, $file_name_image) 
{
    $file_name_image = 'word/'.$file_name_image.'';// getting the image in the zip using its name
    $z_show = new ZipArchive();
    if ($z_show->open($zip_file_original) !== true) {
        echo "File not found.";
        return false;
    }

    $fp   = $z_show->getStream($file_name_image);
    if(!$fp) {
        echo "Could not load image.";
        return false;
    }

    // No image headers are sent here: the function returns base64 data, not a raw image response.
    $image = stream_get_contents($fp);
    fclose($fp);
    $z_show->close();
    $picture = base64_encode($image);
    return $picture;//return the base64 string for the current image.
}
$final_arr = readZippedImages($document);

Print the $final_arr returned by readZippedImages() to get all of the text and images in the document.
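
To show the collected entries on an HTML page, something like the following could work. This is only a sketch: render_docx_items() is a hypothetical helper, it assumes each entry is either array("text"=>...) or array("image"=>base64 string) as built above, and the image/jpeg MIME type is an assumption since the real type depends on the embedded file.

```php
<?php
// Hypothetical helper: turns the text/image entries into an HTML fragment.
function render_docx_items(array $items, string $imageMime = 'image/jpeg'): string
{
    $html = '';
    foreach ($items as $item) {
        if (isset($item['text'])) {
            // Escape document text so it cannot inject markup into the page.
            $html .= '<span>' . htmlspecialchars($item['text'], ENT_QUOTES, 'UTF-8') . '</span>';
        } elseif (isset($item['image']) && is_string($item['image'])) {
            // A base64 payload can be embedded directly as a data: URI.
            // Entries where the image failed to load (false) are skipped.
            $html .= '<img src="data:' . $imageMime . ';base64,' . $item['image'] . '" alt="">';
        }
    }
    return $html;
}

// Example with made-up entries (the base64 string is a placeholder, not a real image).
echo render_docx_items([
    ['text' => 'a < b'],
    ['image' => 'QUJD'],
]);
```

Escaping the text entries matters because document text can contain characters such as < and &.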




Answer 2:


First of all, parsing XML with a regular expression is a very bad idea. Use PHP's XML parser instead, which is designed for this kind of task.
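
As a rough illustration of the parser-based approach, the sketch below collects paragraph text with DOMDocument and DOMXPath instead of string replacement and strip_tags(). The function name docx_paragraphs_from_xml() and the sample XML are made up for this example; the namespace URI is the standard WordprocessingML one.

```php
<?php
// Hypothetical helper: returns one string per <w:p> paragraph in document.xml.
function docx_paragraphs_from_xml(string $xml): array
{
    $dom = new DOMDocument();
    // LIBXML_NONET blocks network access while parsing untrusted input.
    if (!$dom->loadXML($xml, LIBXML_NONET)) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $paragraphs = [];
    foreach ($xpath->query('//w:body//w:p') as $p) {
        $text = '';
        // <w:t> elements carry the visible text of each run.
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}

// Made-up sample standing in for the contents of word/document.xml.
$sample = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    . '<w:body><w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>Second</w:t></w:r></w:p></w:body></w:document>';
print_r(docx_paragraphs_from_xml($sample));
```

Because the query works on element names and namespaces rather than exact tag sequences, it should keep working when Word adds attributes or extra elements between runs.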

You need to read the Office Open XML specification (the standard used by Microsoft Office) to learn the internal data structure that Microsoft uses to store this kind of math equation.
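
For equations created with Word's built-in equation editor, the markup is Office Math Markup Language (OMML): each equation is an m:oMath element inside word/document.xml, with its text in m:t elements. The sketch below pulls those elements out. It assumes the equations are stored as OMML rather than as embedded objects or images, and both helper names (docx_document_xml(), docx_equations()) are hypothetical.

```php
<?php
// Namespace URI used by Office Math Markup Language elements (m:oMath, m:t, ...).
const OMML_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/math';

// Hypothetical helper: reads word/document.xml out of a .docx by name.
function docx_document_xml(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? null : $xml;
}

// Hypothetical helper: returns each <m:oMath> as raw OMML plus its flattened text.
function docx_equations(string $documentXml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($documentXml, LIBXML_NONET)) {
        return [];
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('m', OMML_NS);

    $equations = [];
    foreach ($xpath->query('//m:oMath') as $math) {
        $text = '';
        // <m:t> holds the characters of the equation; structure (fractions,
        // superscripts) lives in the surrounding elements and is lost here.
        foreach ($xpath->query('.//m:t', $math) as $t) {
            $text .= $t->textContent;
        }
        $equations[] = [
            'omml' => $dom->saveXML($math),
            'text' => $text,
        ];
    }
    return $equations;
}

// Usage with a placeholder path:
// $xml = docx_document_xml('/path/to/file.docx');
// if ($xml !== null) { print_r(docx_equations($xml)); }
```

The flattened text is only useful for searching or debugging. To display an equation in HTML, the OMML fragment would still need converting, for example to MathML. One route that is often mentioned is the OMML2MML.XSL stylesheet that ships with Microsoft Office, applied through PHP's XSLTProcessor, but that step is not shown here.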



Source: https://stackoverflow.com/questions/29791764/how-to-read-docx-file-math-equations-using-php-code-igniter
