Minifying final HTML output using regular expressions with CodeIgniter

忘掉有多难 2020-12-04 13:56

Google's pages recommend minifying HTML, that is, removing all unnecessary whitespace. CodeIgniter does have a feature for gzipping output, or it can be done via .htacce

3 Answers
  •  无人及你
    2020-12-04 14:22

    Sorry for not commenting; I don't have enough reputation yet ;)

    I urge everybody not to implement such a regex without checking for performance penalties. Shopware implemented the first regex (from Alan/ridgerunner) for their HTML minifier, and it "blew up" every shop with larger pages.
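Checking for performance penalties can start with something as simple as timing a candidate pattern against a large synthetic page and noticing when PCRE gives up. The sketch below uses a hypothetical helper, timedReplace(), that is not part of the answer. It relies on the documented behaviour that preg_replace() returns null when an error occurs (for example when pcre.backtrack_limit is reached), and falls back to the unmodified input rather than producing an empty page. This is one failure mode worth testing for, not necessarily the one Shopware hit.

```php
<?php
// Hypothetical helper (not part of the answer): run one regex replacement,
// measure how long it takes, and keep the original input if PCRE fails.
function timedReplace(string $pattern, string $replacement, string $subject): array
{
    $start = microtime(true);
    $result = preg_replace($pattern, $replacement, $subject);
    $elapsed = microtime(true) - $start; // seconds

    if ($result === null) {
        // preg_replace() returns null on error (e.g. backtrack limit reached);
        // serving the unminified page beats serving an empty one.
        trigger_error('Minify regex failed, PCRE error ' . preg_last_error(), E_USER_WARNING);
        return [$subject, $elapsed];
    }
    return [$result, $elapsed];
}

// Synthetic "bigger page"; substitute a real page dump when testing.
$page = str_repeat("<div>\n\t  <span>  hello  </span>\n</div>\n", 5000);

[$minified, $seconds] = timedReplace('/[ \n\t]{2,}|[\n\t]/', ' ', $page);
printf("%d -> %d bytes in %.4f s\n", strlen($page), strlen($minified), $seconds);
```

Running the same harness with each candidate pattern, and on each PHP version you deploy to, gives a rough comparison before a pattern goes live.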

    For complex problems, a combined solution (regex plus some other logic) is, where possible, usually faster and more maintainable (unless you are Damian Conway).

    I also want to mention that most minifiers can break your code (JavaScript and HTML) when a script block itself contains another script block, e.g. one written via document.write.
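A small sketch of why nested script blocks are fragile, using a typical lazy "protect this block" pattern (an assumption: other minifiers tokenize differently and may fail in other ways). The lazy .*? stops at the first closing tag it finds, so an unescaped </script> inside a document.write() string ends the protected block early. Escaping the inner tag as <\/script> is the common convention, because browsers' HTML parsers also end a script element at the first </script>; with it, the whole outer block stays together.

```php
<?php
// A typical lazy pattern for protecting <script> blocks from a minifier:
// opening tag, shortest possible content, then the matching closing tag.
$pattern = '!<(script)[^>]*?>.*?</\1>!is';

// Inner closing tag NOT escaped: the lazy .*? stops there, inside the JS string.
$unescaped = '<script>document.write("<script src=\'a.js\'></script>");</script>';
preg_match($pattern, $unescaped, $bad);
echo $bad[0], PHP_EOL;  // <script>document.write("<script src='a.js'></script>

// Inner closing tag escaped as <\/script>: the whole outer block is matched.
$escaped = '<script>document.write("<script src=\'a.js\'><\/script>");</script>';
preg_match($pattern, $escaped, $good);
echo $good[0], PHP_EOL; // <script>document.write("<script src='a.js'><\/script>");</script>
```

In the unescaped case, everything after the inner closing tag would be treated as ordinary HTML by such a minifier, so the rest of the JavaScript is no longer protected.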

    Here is my solution (an optimized version of user2677898's snippet). I simplified the code and ran some tests. Under PHP 7.2 my version was ~30% faster for my particular test case. Under PHP 7.3 and 7.4 the old variant gained a lot of speed and is only ~10% slower. My version is also easier to maintain because the code is less complex.

    function filterHtml($content)
    {
        // List of untouchable HTML tags.
        $unchanged = 'script|pre|textarea';
    
        // It is assumed that this placeholder could not appear organically in your
        // output. If it can, you may have an XSS problem.
        $placeholder = "@@<'-pLaChLdR-'>@@";
    
        // Some helper variables.
        $unchangedBlocks  = [];
        // Opening tag, its content and the matching closing tag; inside the
        // double-quoted string, \\1 becomes the backreference \1.
        $unchangedRegex   = "!<($unchanged)[^>]*?>.*?</\\1>!is";
        $placeholderRegex = '!' . preg_quote($placeholder, '!') . '!';
    
        // Replace all the tags (including their content) with a placeholder, and keep their contents for later.
        $content = preg_replace_callback(
            $unchangedRegex,
            function ($match) use (&$unchangedBlocks, $placeholder) {
                array_push($unchangedBlocks, $match[0]);
                return $placeholder;
            },
            $content
        );
    
        // Remove HTML comments, but not SSI directives such as <!--#include ... -->.
        $content = preg_replace('/<!--(?!#).*?-->/s', '', $content);
    
        // Collapse whitespace: runs of two or more spaces/newlines/tabs, and any
        // single newline or tab, become one space.
        $content = trim(preg_replace('/[ \n\t]{2,}|[\n\t]/m', ' ', $content));
    
        // Replace the placeholders with the original content, in order.
        $content = preg_replace_callback(
            $placeholderRegex,
            function ($match) use (&$unchangedBlocks) {
                // Paranoia: more placeholders than saved blocks means the
                // input itself already contained the placeholder.
                if (count($unchangedBlocks) == 0) {
                    throw new \RuntimeException("Found too many placeholders in input string");
                }
                return array_shift($unchangedBlocks);
            },
            $content
        );
    
        return $content;
    }
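To apply filterHtml() to every page CodeIgniter renders, one option is a display_override hook. The sketch below assumes CodeIgniter 3; the file name minify_output.php and the function minify_output() are hypothetical, and filterHtml() is assumed to be defined in (or loaded by) the hook file. The three parts belong in three different files and are shown together only for brevity.

```php
<?php
// --- application/config/config.php: hooks must be enabled ---
$config['enable_hooks'] = TRUE;

// --- application/config/hooks.php ---
// Register a procedural function (empty 'class') at the display_override point.
// The file and function names are hypothetical.
$hook['display_override'][] = array(
    'class'    => '',
    'function' => 'minify_output',
    'filename' => 'minify_output.php',
    'filepath' => 'hooks',
);

// --- application/hooks/minify_output.php ---
// Assumes filterHtml() is defined in this file or loaded before the hook runs.
function minify_output()
{
    $CI =& get_instance();

    // Take the final page CodeIgniter buffered, minify it, then send it ourselves.
    $CI->output->set_output(filterHtml($CI->output->get_output()));
    $CI->output->_display();
}
```

Because display_override replaces CodeIgniter's own call to _display(), the hook has to send the output itself, which is why it ends with $CI->output->_display(). The hook also sees non-HTML responses (JSON, for example), so you may want to skip minification for those.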
    
