Google's page-speed guidelines suggest minifying HTML, that is, removing all unnecessary whitespace.
CodeIgniter has a built-in feature for gzipping output, or gzip compression can be enabled via .htaccess.
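Outside CodeIgniter, the usual plain-PHP way to compress a response is `ob_start('ob_gzhandler')`. To get a rough feel for what gzip does to markup, the minimal sketch below compares raw and compressed sizes of some made-up, repetitive HTML. It assumes the zlib extension is available (it provides `gzencode()` and `gzdecode()`); the sample markup is illustrative only.

```php
<?php
// Requires the zlib extension (bundled with most PHP builds).
// Made-up, repetitive sample markup; real pages will compress differently.
$html = str_repeat("<li class=\"item\">\n    <a href=\"#\">Link</a>\n</li>\n", 500);

// Compression level 6 is a common middle ground between speed and size.
$gzipped = gzencode($html, 6);

printf("raw: %d bytes, gzip: %d bytes\n", strlen($html), strlen($gzipped));
```

Minification and gzip are complementary: minification removes bytes before compression, gzip compresses whatever is left.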
Sorry for posting this as an answer instead of a comment; I don't have enough reputation yet. ;)
I urge everybody not to implement such a regex without checking for performance penalties. Shopware adopted the first regex (from Alan/ridgerunner) for its HTML minifier, and it "blew up" every shop with larger pages.
For complex problems, a combined solution (regex plus some other logic) is usually faster and more maintainable, where possible (unless you are Damian Conway).
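One cheap check along those lines: when PCRE gives up (for example because `pcre.backtrack_limit` is exceeded), `preg_replace()` returns `null` instead of a string, and code that doesn't check for that can end up emitting an empty page. Below is a minimal sketch of a wrapper that fails loudly instead; `safePregReplace` is a hypothetical name, not an existing API, and whether the second pattern actually hits the limit may depend on your PHP version and `pcre.jit` setting.

```php
<?php
// Hypothetical helper (not part of any library): run preg_replace() and fail
// loudly instead of silently returning null when PCRE reports an error,
// e.g. because pcre.backtrack_limit was exceeded.
function safePregReplace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException('preg_replace() failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

// Normal case: collapse runs of whitespace into single spaces.
echo safePregReplace('/\s+/', ' ', "a  \n  b"), PHP_EOL; // prints "a b"

// Pattern and subject from the PHP manual's preg_last_error() example, where
// they illustrate PREG_BACKTRACK_LIMIT_ERROR; the result may vary by setup.
try {
    safePregReplace('/(?:\D+|<\d+>)*[!?]/', '', 'foobar foobar foobar');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```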
I also want to mention that most minifiers can break your code (JavaScript and HTML) when a script block itself contains another script block, e.g. one written via document.write.
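To illustrate with the same kind of lazy pattern used in the code below (`<script ...>.*?</script>`): the lazy `.*?` stops at the first closing tag, which here belongs to the inner, `document.write`-generated block, so the rest of the outer script is no longer protected and gets whitespace-collapsed. The input is made up for the example. (Browsers also end an inline script at the first `</script>`, which is why such code is usually written as `<\/script>` inside the string; that spelling also keeps this regex intact.)

```php
<?php
// Made-up input: an inline script that writes another script via document.write.
$html = '<script>document.write("<script>var a = 1;</script>");   var  b = 2;</script>';

// Same shape as the "untouchable tags" pattern: opening tag, lazy body, matching close.
preg_match('!<(script)[^>]*?>.*?</\1>!is', $html, $m);

// The lazy .*? stops at the FIRST </script>, i.e. the inner block's closing tag.
echo $m[0], PHP_EOL;
// prints: <script>document.write("<script>var a = 1;</script>
```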
Here is my solution (an optimized version of user2677898's snippet). I simplified the code and ran some tests. Under PHP 7.2, my version was ~30% faster for my particular test case. Under PHP 7.3 and 7.4, the old variant gained a lot of speed and is only ~10% slower. My version is also more maintainable because the code is less complex.
```php
function filterHtml($content)
{
    // List of untouchable HTML tags.
    $unchanged = 'script|pre|textarea';

    // It is assumed that this placeholder could not appear organically in your
    // output. If it can, you may have an XSS problem.
    $placeholder = "@@<'-pLaChLdR-'>@@";

    // Some helper variables.
    $unchangedBlocks = [];
    $unchangedRegex = "!<($unchanged)[^>]*?>.*?</\\1>!is";
    $placeholderRegex = "!$placeholder!";

    // Replace all the tags (including their content) with a placeholder, and keep their contents for later.
    $content = preg_replace_callback(
        $unchangedRegex,
        function ($match) use (&$unchangedBlocks, $placeholder) {
            $unchangedBlocks[] = $match[0];
            return $placeholder;
        },
        $content
    );

    // Remove HTML comments, but not SSI directives (which start with "<!--#").
    $content = preg_replace('/<!--(?!#).*?-->/s', '', $content);

    // Remove whitespace (spaces, newlines and tabs)
    $content = trim(preg_replace('/[ \n\t]{2,}|[\n\t]/m', ' ', $content));

    // Replace the placeholders with the original content, in order.
    $content = preg_replace_callback(
        $placeholderRegex,
        function ($match) use (&$unchangedBlocks) {
            // Paranoid check: every placeholder must have a stored block.
            if (count($unchangedBlocks) == 0) {
                throw new \RuntimeException("Found too many placeholders in input string");
            }
            return array_shift($unchangedBlocks);
        },
        $content
    );

    return $content;
}
```
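To repeat timing comparisons like the PHP 7.2/7.3/7.4 ones above on your own pages, a rough micro-benchmark is enough. This is a sketch under assumptions: `benchmark()` is a hypothetical helper, the input is synthetic, and only the whitespace step stands in for the code under test; swap in `filterHtml` and whichever variant you are comparing against.

```php
<?php
// Hypothetical micro-benchmark helper: average wall-clock seconds of $fn over $runs calls.
// microtime(true) is used instead of hrtime() (PHP 7.3+) so it also runs on PHP 7.2.
function benchmark(callable $fn, string $input, int $runs = 50): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return (microtime(true) - $start) / $runs;
}

// Synthetic input; for meaningful numbers, use real (large) pages from your site.
$html = str_repeat("<div>\n\t<p>Some   text</p>\n</div>\n", 5000);

// Stand-in for the code under test: only the whitespace-collapsing step.
$collapseWhitespace = function (string $s): string {
    return trim(preg_replace('/[ \n\t]{2,}|[\n\t]/', ' ', $s));
};

printf("%.3f ms per run\n", benchmark($collapseWhitespace, $html) * 1000);
```

Run it under each PHP version you care about, since, as noted above, relative results can shift between versions.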