I have a PHP website in which I can manage articles. On the "Add a new article" form, there is a rich-text box (it allows HTML input) for which I'd like to limit the character input
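html_entity_decode only decodes HTML entities; it doesn't remove HTML tags. Strip the tags first, then decode the entities (decoding first would turn escaped text such as &lt;b&gt; into a real tag, which strip_tags would then remove). Try:
strlen(html_entity_decode(strip_tags($string)));
Or the multi-byte equivalent, assuming the text is UTF-8:
mb_strlen(html_entity_decode(strip_tags($string)), 'UTF-8');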
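As a rough sketch of how this could be used to enforce a limit on the form input, the helper below counts only the characters a reader would see and compares the result to a maximum. The function name visible_length and the limit of 10 are hypothetical, and the code assumes UTF-8 input and the mbstring extension. It strips tags before decoding entities so that escaped markup such as &lt;b&gt; is counted as literal text.

```php
<?php
// Hypothetical helper: counts the characters a reader would see,
// ignoring HTML tags. Assumes UTF-8 input and the mbstring extension.
function visible_length(string $html): int
{
    // Strip tags first, then decode entities, so that escaped markup
    // such as &lt;b&gt; survives and is counted as literal text.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Count characters rather than bytes.
    return mb_strlen($text, 'UTF-8');
}

$input = '<p>Caf&eacute; <b>au lait</b></p>';
$limit = 10; // hypothetical maximum number of visible characters

echo visible_length($input), "\n";                              // 12
echo visible_length($input) <= $limit ? "ok\n" : "too long\n";  // too long
```

Because the limit is checked on the server, it still applies if a client-side check in the rich-text editor is bypassed.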
You want to get the number of characters, but you don't want to count HTML markup.
You can do that with an HTML parser such as DOMDocument. Load the document (or fragment), get the body element, which represents the document's content, take its nodeValue, normalize the whitespace, and then count the characters with a UTF-8-aware function:
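$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');
// The body element holds the document's content
$body = $doc->getElementsByTagName('body')->item(0);
// nodeValue is the concatenated text of all descendant nodes, without markup
$text = $body->nodeValue;
// Collapse runs of whitespace into single spaces and trim both ends
$text = trim(preg_replace('/\s+/u', ' ', $text));
// Count characters, not bytes
printf("Length: %d character(s).\n", mb_strlen($text, 'utf-8'));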
Example input test.html:
<body>
<div style='float:left'><img src='../../../../includes/ph1.jpg'></div>
<label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
<font size="4">1a. Nice to meet you!</font>
</label>
<img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />
<script type='text/javascript'>
swfobject.registerObject('FlashID');
</script>
<input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">
</body>
Example output:
Length: 58 character(s).
The normalized text is:
1a. Nice to meet you! swfobject.registerObject('FlashID');
Note that this count covers all text content, including the text inside <script> tags.
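If script and style content should not count, one possible approach is to remove those elements from the parsed document before reading the text. The following is a minimal sketch under a few assumptions: the input is a UTF-8 string rather than a file, the dom and mbstring extensions are available, and text_length_without_scripts is a hypothetical name. The XML encoding prefix is a commonly used workaround to make loadHTML treat the input as UTF-8.

```php
<?php
// Sketch: count visible text while skipping <script> and <style> content.
// text_length_without_scripts() is a hypothetical name; UTF-8 input assumed.
function text_length_without_scripts(string $html): int
{
    $doc = new DOMDocument();
    // The XML encoding hint makes libxml read the fragment as UTF-8;
    // @ silences warnings about sloppy real-world markup.
    @$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    // Collect first, then remove: the node list is live and would
    // change under the loop if nodes were removed while iterating.
    $unwanted = [];
    foreach (['script', 'style'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $unwanted[] = $node;
        }
    }
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body === null ? '' : $body->textContent;

    // Collapse runs of whitespace into single spaces and trim both ends.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    return mb_strlen($text, 'UTF-8');
}

// Shortened version of the example input above.
$html = "<label><font size=\"4\">1a. Nice to meet you!</font></label>\n"
      . "<script type='text/javascript'>swfobject.registerObject('FlashID');</script>";

printf("Length: %d character(s).\n", text_length_without_scripts($html));
```

For this shortened input the script text no longer contributes, so only "1a. Nice to meet you!" (21 characters) is counted.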