Limit input length of text that contains HTML tags

人走茶凉 submitted on 2019-11-29 16:14:37

html_entity_decode only decodes HTML entities; it does not strip HTML tags. Try:

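// Strip tags first, then decode, so escaped text such as &lt;b&gt; is counted instead of being stripped as a tag.
strlen(html_entity_decode(strip_tags($string)));

Or the multi-byte equivalent, which counts characters rather than bytes (assuming UTF-8 input):

mb_strlen(html_entity_decode(strip_tags($string)), 'UTF-8');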
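The same idea can be wrapped into a length check. This is only a minimal sketch, assuming the input is UTF-8; the function names visibleLength and isWithinLimit and the limits used are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: counts the characters a reader would see, ignoring tags.
// Assumes $html is UTF-8 encoded.
function visibleLength(string $html): int
{
    // Strip tags first, then decode entities such as &amp; into single characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // mb_strlen counts characters, not bytes.
    return mb_strlen($text, 'UTF-8');
}

// Hypothetical check against a maximum length chosen by the application.
function isWithinLimit(string $html, int $maxChars): bool
{
    return visibleLength($html) <= $maxChars;
}

// "Tom & Jerry" is 11 visible characters.
var_dump(isWithinLimit('<b>Tom &amp; Jerry</b>', 11)); // bool(true)
var_dump(isWithinLimit('<b>Tom &amp; Jerry</b>', 10)); // bool(false)
```

Note that strip_tags is a simple tag stripper, not a full parser, so malformed markup may be counted differently from what a browser would render.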

You want to get the number of characters, but you don't want to count HTML markup.

You can do that with an HTML parser such as DOMDocument. Load the document (or fragment), obtain the body element, which represents the document's content, read its nodeValue, normalize the whitespace, and then count the characters with a UTF-8-aware function:

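// Parse the HTML file and take the text content of its <body>.
$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');
$body = $doc->getElementsByTagName('body')->item(0);
$text = $body->nodeValue;
// Collapse each run of whitespace into a single space, then trim the ends.
$text = trim(preg_replace('/\s+/u', ' ', $text));
// Count characters rather than bytes.
printf("Length: %d character(s).\n", mb_strlen($text, 'UTF-8'));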

Example input test.html:

<body>
    <div style='float:left'><img src='../../../../includes/ph1.jpg'></div>

    <label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
    <font size="4">1a. Nice to meet you!</font>
    </label>
    <img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />

    <script type='text/javascript'> 


    swfobject.registerObject('FlashID');
    </script>

    <input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">

</body>

Example output:

Length: 58 character(s).

The normalized text is:

1a. Nice to meet you! swfobject.registerObject('FlashID');

Note that this count includes text that a browser does not display, such as the contents of <script> tags.
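If script and style content should not be counted, one option is to remove those elements from the DOM before reading the text. The following is a rough sketch under that assumption, not part of the original answer; the function name visibleTextLength is hypothetical, the input is assumed to be UTF-8, and the XML encoding prefix is a commonly used workaround to make DOMDocument treat the string as UTF-8.

```php
<?php
// Hypothetical helper: length of the text in <body>, skipping <script> and <style>.
// Assumes $html is UTF-8 encoded.
function visibleTextLength(string $html): int
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Commonly used workaround: the prefix hints that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the unwanted elements first, then remove them.
    $xpath = new DOMXPath($doc);
    $unwanted = iterator_to_array($xpath->query('//script | //style'));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return 0;
    }

    // Collapse runs of whitespace, trim, and count characters.
    $text = trim(preg_replace('/\s+/u', ' ', $body->textContent));
    return mb_strlen($text, 'UTF-8');
}

$html = "<body><font size=\"4\">1a. Nice to meet you!</font>"
      . "<script>swfobject.registerObject('FlashID');</script></body>";

// Only "1a. Nice to meet you!" remains once the script is removed.
printf("Length: %d character(s).\n", visibleTextLength($html));
```

Other non-rendered content, such as HTML comments or elements hidden with CSS, is not handled by this sketch.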
