Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML


I have various HTML strings to cut to 100 characters (of the stripped content, not the original) without stripping tags and without breaking HTML.

Original H


10 Answers

迷失自我

2020-11-27 17:36

Here is my attempt at a cutter; maybe you can catch some bugs. The problem I found with the other parsers is that they don't close tags properly and they cut in the middle of a word (blah).

    function cutHTML($string, $length, $patternsReplace = false)
    {
        // Elements that never take a closing tag.
        $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
            'input', 'link', 'meta', 'source', 'track', 'wbr');
        $total = strlen($string);
        $i = 0;
        $count = 0;            // visible (non-tag) characters seen so far
        $isParagraphCut = false;
        $tagsStack = array();  // names of the elements that are still open

        while ($i < $total) {
            if ($string[$i] === '<') {
                $end = strpos($string, '>', $i);
                if ($end === false) {
                    // Unterminated tag: cut in front of it.
                    $isParagraphCut = true;
                    break;
                }
                $tag = substr($string, $i + 1, $end - $i - 1);
                if (preg_match('~^(/?)([a-z][a-z0-9]*)~i', $tag, $m)) {
                    $name = strtolower($m[2]);
                    if ($m[1] === '/') {
                        // Closing tag: pop it if it matches the innermost open element.
                        if (end($tagsStack) === $name) {
                            array_pop($tagsStack);
                        }
                    } elseif (substr($tag, -1) !== '/' && !in_array($name, $voidTags, true)) {
                        $tagsStack[] = $name;
                    }
                }
                $i = $end + 1; // tags do not count towards $length
                continue;
            }
            if ($count >= $length) {
                $isParagraphCut = true;
                break;
            }
            $count++;
            $i++;
        }

        if ($isParagraphCut) {
            // Walk back to the nearest word boundary or tag end.
            $j = $i;
            while ($j > 0) {
                $char = $string[$j];
                if (in_array($char, array(' ', ';', '.', ',', '<', '(', '['), true)) {
                    break;
                } elseif ($char === '>') {
                    $j++;
                    break;
                }
                $j--;
            }
            $string = substr($string, 0, $j);
            // Close the elements that are still open, innermost first.
            foreach (array_reverse($tagsStack) as $tag) {
                $string .= "</$tag>";
            }
            $string .= "...";
        }

        // Optional post-processing: a list of array('pattern' => ..., 'replace' => ...).
        if ($patternsReplace) {
            foreach ($patternsReplace as $value) {
                if (isset($value['pattern']) && isset($value['replace'])) {
                    $string = preg_replace($value['pattern'], $value['replace'], $string);
                }
            }
        }

        return $string;
    }


情深已故

2020-11-27 17:43

Use PHP's DOMDocument class to normalize an HTML fragment:

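    $dom = new DOMDocument();
    // The parser closes the unfinished <p> and <div> elements while loading.
    $dom->loadHTML('<div><p>Hello World');
    $xpath = new DOMXPath($dom);
    $body = $xpath->query('/html/body');
    // Serialize the <body> element, which now contains the balanced fragment.
    echo $dom->saveXML($body->item(0));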

This question is similar to an earlier question and I've copied and pasted one solution here. If the HTML is submitted by users you'll also need to filter out potential JavaScript attack vectors like onmouseover="do_something_evil()" or <a href="javascript:more_evil();">...</a>. Tools like HTML Purifier were designed to catch and solve these problems and are far more comprehensive than any code that I could post.
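The DOMDocument idea can be taken one step further: instead of cutting the string and repairing it afterwards, one can cut the text nodes of the parsed tree and let the serializer emit balanced markup. The following is only a sketch under stated assumptions: the function names truncateHtml() and truncateNode() are hypothetical, the dom and mbstring extensions are assumed to be loaded, the input is assumed to be UTF-8, and the cut is made at an exact character count rather than at a word boundary.

```php
<?php
// Hypothetical helper: keep at most $limit visible characters of an HTML fragment.
function truncateHtml(string $html, int $limit): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The XML prolog hints UTF-8 to libxml; the wrapper <div> gives one root to export.
    $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);
    $remaining = $limit;
    truncateNode($root, $remaining);

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Walks the tree in document order and spends the character budget on text nodes.
function truncateNode(DOMNode $node, int &$remaining): void
{
    // Copy the live child list first because nodes are removed while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child); // budget used up: drop everything after the cut
        } elseif ($child instanceof DOMText) {
            $length = mb_strlen($child->nodeValue, 'UTF-8');
            if ($length > $remaining) {
                $child->nodeValue = mb_substr($child->nodeValue, 0, $remaining, 'UTF-8');
                $remaining = 0;
            } else {
                $remaining -= $length;
            }
        } elseif ($child->hasChildNodes()) {
            truncateNode($child, $remaining);
        }
    }
}

echo truncateHtml('<p>Hello <b>wonderful world</b> today</p>', 12), "\n";
```

With a limit of 12, the visible text of that input is reduced to "Hello wonder", and the b and p elements are closed by the serializer. Backing up to the previous space before writing the shortened text node would be one way to avoid cutting in the middle of a word.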


忘掉有多难

2020-11-27 17:43

Setting aside the 100-character count you mention at the beginning, the challenge asks for the following:

output the character count of strip_tags (the number of characters in the actual displayed text of the HTML)

retain HTML formatting

close any unfinished HTML tag

Here is my proposal: basically, I parse through each character, counting as I go. I make sure NOT to count any characters inside an HTML tag. I also check at the end to make sure I am not in the middle of a word when I stop. Once I stop, I backtrack to the first available SPACE or > as a stopping point.

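    // $content is assumed to hold the HTML string to split.
    $position = 0;
    $length = strlen($content);
    $data = array();

    // process the content, putting each 100 character section into an array
    while ($position < $length) {
        $next_position = get_position($content, $position, 100);
        // substr() takes a length as its third argument, not an end offset
        $data[] = substr($content, $position, $next_position - $position);
        $position = $next_position;
    }

    // show the array
    print_r($data);

    function get_position($content, $position, $chars = 100)
    {
        $start = $position;
        $length = strlen($content);
        $count = 0;

        // count up to $chars visible characters, skipping over all of the HTML
        while ($count < $chars && $position < $length) {
            if ($content[$position] === '<') {
                $close = strpos($content, '>', $position);
                // an unterminated tag runs to the end of the content
                $position = ($close === false) ? $length : $close + 1;
                continue;
            }
            $count++;
            $position++;
        }

        // the rest of the content fits into this section
        if ($position >= $length) {
            return $length;
        }

        // find a logical break (a space or the end of a tag) inside this section
        $data = substr($content, $start, $position - $start);
        $space = strrpos($data, ' ');
        $tag = strrpos($data, '>');
        $break = max($space === false ? -1 : $space, $tag === false ? -1 : $tag);

        // no usable break: cut at the character limit so the caller always advances
        if ($break < 0) {
            return $position;
        }

        // return the position just after the logical break
        return $start + $break + 1;
    }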

This also counts line breaks and similar characters. Since they take up space, I have not removed them.


名媛妹妹

2020-11-27 17:46

You should use Tidy HTML. You cut the string and then you run Tidy to close the tags.

(Credits where credits are due)
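A rough sketch of the cut-then-Tidy idea might look like the following. The helper names cutOutsideTag() and cutAndRepair() are hypothetical, the tidy extension is assumed to be installed, and the cut is made on the raw string by byte offset, so tag characters count towards the length (unlike the strip_tags-based count the question asks for). A cut that lands inside a tag is moved back to the start of that tag, since Tidy cannot be expected to guess what the half-written tag was.

```php
<?php
// Hypothetical helper: cut at $length bytes, then drop a tag left half-open by the cut.
function cutOutsideTag(string $html, int $length): string
{
    $cut = substr($html, 0, $length);
    $lastOpen = strrpos($cut, '<');
    $lastClose = strrpos($cut, '>');
    // A '<' after the last '>' means the cut landed inside a tag.
    if ($lastOpen !== false && ($lastClose === false || $lastOpen > $lastClose)) {
        $cut = substr($cut, 0, $lastOpen);
    }
    return $cut;
}

// Hypothetical helper: cut the string, then let Tidy close whatever is still open.
function cutAndRepair(string $html, int $length): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is required.');
    }
    // 'show-body-only' asks Tidy for the fragment alone, without <html>/<body>.
    $repaired = tidy_repair_string(
        cutOutsideTag($html, $length),
        array('show-body-only' => true),
        'utf8'
    );
    if ($repaired === false) {
        throw new RuntimeException('Tidy could not repair the fragment.');
    }
    return $repaired;
}

if (extension_loaded('tidy')) {
    echo cutAndRepair('<p>Hello <b>world</b></p>', 15), "\n";
}
```

Tidy may also reindent or otherwise normalize the markup, so the result should be treated as equivalent HTML rather than as the original string with closing tags appended.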

