Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML

Happy的楠姐 2020-11-27 16:44

I have various HTML strings that I need to cut to 100 characters (counting only the stripped content, not the original) without stripping tags and without breaking the HTML.

Original H

10 answers
  •  伪装坚强ぢ
    2020-11-27 17:33

    I wrote another function for this; it supports UTF-8:

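    /**
     * Limit the visible (tag-stripped) width of an HTML string without
     * breaking its tags. Supports UTF-8.
     *
     * mb_strwidth() measures display width, so full-width characters
     * (e.g. CJK) count as 2 towards $limit.
     *
     * @param string $value HTML fragment
     * @param int    $limit Maximum visible width, default 100
     * @return string Truncated, well-formed HTML
     */
    function str_limit_html($value, $limit = 100)
    {
        // If the whole string, tags included, already fits, return it unchanged.
        if (mb_strwidth($value, 'UTF-8') <= $limit) {
            return $value;
        }
    
        // Cut the raw string, leaving extra room for the width taken up by tags,
        // and repeat until the tag-stripped text fits within $limit.
        // Is there another way to do it?
        do {
            $len          = mb_strwidth($value, 'UTF-8');
            $len_stripped = mb_strwidth(strip_tags($value), 'UTF-8');
            $len_tags     = $len - $len_stripped;
    
            $value = mb_strimwidth($value, 0, $limit + $len_tags, '', 'UTF-8');
        } while ($len_stripped > $limit);
    
        // Load as HTML, suppressing warnings about the cut-off markup.
        // The XML declaration is a hint so that libxml reads the input as UTF-8.
        $dom = new DOMDocument();
        @$dom->loadHTML('<?xml encoding="utf-8" ?>' . $value, LIBXML_HTML_NODEFDTD);
    
        // Serialising <body> closes any tags that the cut left open
        $value = $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
    
        // Remove the <body> (6 chars) and </body> (7 chars) wrapper
        $value = substr($value, 6, -7);
    
        // Remove empty elements such as <p></p>; values may be "..." or '...' quoted
        return preg_replace('/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/', '', $value);
    }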
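
    The last step can be tried on its own. The pattern as posted lists `"[^"]*"` twice among the attribute-value alternatives; the sketch below assumes the second one was meant to be `'[^']*'`, so that single-quoted values such as `class='x'` also match. `remove_empty_tags()` is a hypothetical helper name.

```php
<?php
// Empty-element pattern from str_limit_html(), with the second attribute-value
// alternative changed (an assumption) to match single-quoted values: '...'.
const EMPTY_TAG_PATTERN =
    '/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/';

// Removes elements that contain nothing but whitespace, e.g. <p></p> or <b class="x"> </b>.
function remove_empty_tags(string $html): string
{
    // preg_replace() returns null on a regex error; fall back to the input then.
    return preg_replace(EMPTY_TAG_PATTERN, '', $html) ?? $html;
}

echo remove_empty_tags('<p>Text</p><p></p><span class=\'x\'> </span>'), "\n"; // prints <p>Text</p>
```

    This is a single pass: in `<div><p></p></div>` only the inner `<p></p>` is removed, leaving `<div></div>`. Calling it in a loop until the output stops changing would also catch nested empty elements.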
    

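
    The code comment asks whether there is another way to do the cutting loop. One alternative, which is my own sketch and not the answerer's method, is to let DOMDocument parse the full, untruncated HTML first and then shorten text nodes in document order until the budget is spent, so markup is never cut and never counted. `str_limit_html_dom()` and `trim_text_nodes()` are hypothetical names; the sketch needs the dom and mbstring extensions, uses the common `<?xml encoding="utf-8" ?>` prefix as a UTF-8 hint for libxml, and counts characters with `mb_strlen()` rather than display width with `mb_strwidth()`.

```php
<?php
/**
 * Hypothetical alternative to str_limit_html(): parse first, then keep only
 * the first $limit characters of text (tags are not counted).
 */
function str_limit_html_dom(string $html, int $limit = 100): string
{
    $previous = libxml_use_internal_errors(true); // fragments rarely parse cleanly
    $dom = new DOMDocument();
    // The XML declaration prefix is a UTF-8 hint; the fragment lands inside <body>.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $remaining = $limit;
    trim_text_nodes($body, $remaining);

    // Serialise the children of <body> without the wrapper element itself.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

/** Keeps text in document order until $remaining hits 0, then drops everything after. */
function trim_text_nodes(DOMNode $node, int &$remaining): void
{
    // Copy the live child list first, because children are removed while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);            // budget spent: drop the whole node
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $remaining) {               // this text node crosses the limit
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
                $len = $remaining;
            }
            $remaining -= $len;
        } elseif ($child->hasChildNodes()) {
            trim_text_nodes($child, $remaining);   // descend into elements
        }
    }
}
```

    For example, `str_limit_html_dom('<p>Hello <b>world</b> and more</p>', 8)` keeps "Hello wo" and drops the rest of the paragraph's text. Whitespace-only text nodes between elements count toward the limit, and anything reached after the budget is spent is removed whole, so no `<body>` slicing or empty-tag cleanup is needed.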

    I recommend calling html_entity_decode at the start of the function so that UTF-8 characters are preserved:

     $value = html_entity_decode($value);
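
    To see why decoding first helps, the sketch below cuts the same text with and without decoding. The strings are illustrative and the flags are my choice (the answer calls `html_entity_decode()` with its defaults). It also shows a side effect worth knowing: escaped markup such as `&lt;b&gt;` becomes a real tag after decoding, so `strip_tags()` and DOMDocument will then treat it as markup.

```php
<?php
// mb_strimwidth() sees "&eacute;" as 8 separate characters and can cut it in half.
$raw    = 'Caf&eacute; au lait';
$cutRaw = mb_strimwidth($raw, 0, 7, '', 'UTF-8');          // "Caf&eac" (broken entity)

// After decoding, "&eacute;" is the single character "é".
$decoded    = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$cutDecoded = mb_strimwidth($decoded, 0, 7, '', 'UTF-8');  // "Café au"

// Caveat: escaped markup becomes real markup, so strip_tags() now removes it.
$escaped  = '&lt;b&gt;bold&lt;/b&gt; text';
$stripped = strip_tags(html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8')); // "bold text"

var_dump($cutRaw, $cutDecoded, $stripped);
```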
    
