Can PHP read text from a PowerPoint file?

终归单人心 2020-12-01 22:48

I want to have PHP read an (uploaded) PowerPoint presentation and, at a minimum, extract the text from each slide (grabbing more info like images and layouts would be even better).

4 answers
  •  陌清茗 (original poster)
    2020-12-01 23:39

    Here's a sample function I created from a similar one that extracts text from Word documents. I tested it with Microsoft PowerPoint files, but it won't decode OpenOffice Impress files saved as .ppt.

    For .pptx files you might want to take a look at Zend Lucene.

        function parsePPT($filename) {
            // Heuristic for binary .ppt files: a text string follows the byte pattern
            // chr(0x0f) . <one byte> . chr(0x00) . chr(0x00) . chr(0x00)
            // and is terminated by the next NUL byte, chr(0x00).
            $data = @file_get_contents($filename);
            if ($data === false) {
                return '';
            }
            $chunks  = explode(chr(0x0f), $data);
            $outtext = '';

            foreach ($chunks as $chunk) {
                // After the split, a matching chunk has three NUL bytes starting at offset 1.
                if (strpos($chunk, chr(0x00).chr(0x00).chr(0x00)) === 1) {
                    $text_line = substr($chunk, 4);
                    $end_pos   = strpos($text_line, chr(0x00));
                    if ($end_pos === false) {
                        continue; // no terminating NUL, so not a text string
                    }
                    $text_line = substr($text_line, 0, $end_pos);
                    // Keep only letters, digits, whitespace and basic punctuation.
                    $text_line = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $text_line);
                    if (strlen($text_line) > 1) {
                        $outtext .= $text_line."\n";
                    }
                }
            }
            return $outtext;
        }
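
    For the .pptx case mentioned above, here is a rough sketch of an alternative that does not use Zend Lucene. It relies on the fact that a .pptx file is a ZIP archive whose slides are stored as ppt/slides/slideN.xml, with the text in <a:t> elements. The function name extractPptxText() is my own (hypothetical), and the sketch assumes PHP's zip and dom extensions are enabled. Slides are keyed by the number in the file name, which may not match the order shown in PowerPoint.

```php
<?php
// Sketch only: extractPptxText() is a hypothetical helper, not a Zend Lucene API.
// Assumes the zip and dom extensions are available.
function extractPptxText(string $filename): array
{
    // DrawingML namespace that holds paragraphs (<a:p>) and text runs (<a:t>).
    $ns = 'http://schemas.openxmlformats.org/drawingml/2006/main';

    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return []; // missing file or not a ZIP archive
    }

    $slides = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Only look at ppt/slides/slide1.xml, slide2.xml, ...
        if (!preg_match('#^ppt/slides/slide(\d+)\.xml$#', (string) $name, $m)) {
            continue;
        }
        $xml = $zip->getFromIndex($i);
        if ($xml === false || $xml === '') {
            continue;
        }
        $dom = new DOMDocument();
        if (!@$dom->loadXML($xml)) {
            continue; // skip slides that are not well-formed XML
        }

        $paragraphs = [];
        foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
            // Join the runs of one paragraph without separators,
            // since a single word can be split across several runs.
            $text = '';
            foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
                $text .= $t->textContent;
            }
            if ($text !== '') {
                $paragraphs[] = $text;
            }
        }
        $slides[(int) $m[1]] = implode("\n", $paragraphs);
    }
    $zip->close();

    ksort($slides); // order by the slide number taken from the file name
    return $slides;
}
```

    Calling extractPptxText('deck.pptx') returns an array like [1 => "text of slide 1", 2 => "text of slide 2"], or an empty array if the file cannot be opened as a ZIP archive. Images and layouts live in other parts of the same archive (for example ppt/media), which this sketch does not touch.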
    
