Cut a UTF-8 text in PHP

匆匆过客 submitted on 2020-07-06 12:29:08

Question


I get UTF-8 text from a database, and I want to show only the first $len characters (ending on a whole word). I've tried several options, but the function still doesn't work because of special characters (á, é, í, ó, etc.).

Thanks for the help!

function text_limit($text, $len, $end='...')
{ 

  mb_internal_encoding('UTF-8');
  if( (mb_strlen($text, 'UTF-8') > $len) ) { 

    $text = mb_substr($text, 0, $len, 'UTF-8');
    $text = mb_substr($text, 0, mb_strrpos($text," ", 'UTF-8'), 'UTF-8');

    ...
  }
}

Edit: added an example

If I truncate a text to 65 characters, it returns:

Un jardín de estilo neoclásico acorde con el …

If I replace the special characters (í, á) with unaccented letters, it returns:

Un jardin de estilo neoclasico acorde con el Palacio de …

I'm sure something strange is going on with the encoding, the server, or PHP, but I can't figure it out! Thanks!

Final Solution

I'm using this UTF8 PHP library and everything works now...


Answer 1:


Use mb_substr(). The first argument is the string, the second is the starting position, the third is the length, and the last is the encoding.

mb_substr ("String", 0, $len, 'utf-8');
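A minimal sketch of that argument order, using part of the sample sentence from the question (the variable names are illustrative only). Both the start position and the length are counted in characters, so the accented "í" counts as one:

```php
<?php
// mb_substr(string, start, length, encoding):
// start and length are counted in characters, not bytes.
$text = "Un jardín de estilo neoclásico";

$first = mb_substr($text, 0, 9, 'UTF-8'); // first 9 characters
$word  = mb_substr($text, 3, 6, 'UTF-8'); // 6 characters starting at index 3

echo $first, "\n"; // Un jardín
echo $word, "\n";  // jardín
```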



Answer 2:


mb_strrpos($text," ", 'UTF-8')

You are not passing enough arguments to mb_strrpos(): you have omitted the offset (the 3rd parameter), and the encoding is the 4th parameter. Try:

mb_strrpos($text," ", 0, 'UTF-8')
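Putting the corrected call back into the question's function might look like the sketch below. The question elides how the function ends, so appending $end and returning the result is an assumption here, as is the choice to simply cut at $len characters when the text contains no space at all:

```php
<?php
// Sketch of word-boundary truncation with the corrected mb_strrpos() call.
// Assumption: the elided end of the original function appends $end and
// returns the text.
function text_limit(string $text, int $len, string $end = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $len) {
        return $text; // already short enough, nothing to cut
    }

    // Cut to $len characters (not bytes).
    $text = mb_substr($text, 0, $len, 'UTF-8');

    // Last space in the cut text: offset 0 is the 3rd argument,
    // the encoding is the 4th.
    $pos = mb_strrpos($text, ' ', 0, 'UTF-8');
    if ($pos !== false) {
        // Drop the partial word after the last space.
        $text = mb_substr($text, 0, $pos, 'UTF-8');
    }

    return $text . $end;
}

echo text_limit('Un jardín de estilo neoclásico acorde con el Palacio', 22), "\n";
// Un jardín de estilo...
```

With $len = 22 the first cut ends at "...estilo ne", and the partial word "ne" is then removed.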

Although with the 2nd line omitted, it looks OK. As you say, "I want to show only the first $len characters (finishing in a word)", so is the 2nd line there to make sure it ends on a whole word?

EDIT: mb_substr() should cut the text to $len characters, not bytes. Are you sure the original text is actually UTF-8 and not some other encoding?
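One way to check that in code is mb_check_encoding(). The sketch below assumes, purely for illustration, that any non-UTF-8 input is ISO-8859-1 (Latin-1); the helper name ensure_utf8 is hypothetical, and the database's real encoding would still need to be confirmed:

```php
<?php
// Hypothetical helper: return valid UTF-8, converting from ISO-8859-1
// when the input is not already UTF-8. The Latin-1 fallback is an
// assumption, not something the question confirms.
function ensure_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "neocl\xE1sico"; // "neoclásico" as ISO-8859-1 bytes
$utf8   = ensure_utf8($latin1);

var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
echo $utf8, "\n";                               // neoclásico
```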




Answer 3:


OK, it has been baffling me that you can't get this to work, because it should work just fine. I think I have finally found the reason it is not working for you.

I think you are outputting UTF-8 characters, but your browser is displaying the page in a different encoding.

You have a couple of options. First, if you are displaying any of this as part of an HTML page, check your meta tags to see whether they set the character encoding. If so, change it to this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Next, if you are outputting this directly to the browser, use the header() function to set the character encoding, like so:

header("Content-type: text/html; charset=utf-8");

An easy test:

<?php
    header("Content-type: text/html; charset=utf-8");
    $text = "áéíó";
    echo mb_substr($text, 0, 3, 'utf-8');
?>

Without this, your browser will default to another encoding and display the text improperly. Hopefully this helps you fix the issue; if not, I'll keep trying :)
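To tell a display problem apart from a data problem, one option (not part of the original answer) is to print the raw bytes with bin2hex(), which no browser can misrender. In UTF-8, "á", "é" and "í" are the byte pairs c3 a1, c3 a9 and c3 ad:

```php
<?php
// Print the truncated string's bytes as hex so the result can be checked
// regardless of how the browser decodes the page.
$text = "áéíó";
$cut  = mb_substr($text, 0, 3, 'utf-8');

echo bin2hex($cut), "\n"; // c3a1c3a9c3ad
```

If the hex output is correct but the page still shows garbage, the problem is the declared page encoding rather than the truncation.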




Answer 4:


How about trying mb_strcut()? It takes the same parameters as mb_substr().
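One detail worth keeping in mind: although mb_strcut() takes the same arguments, its start and length are measured in bytes rather than characters, and it never returns part of a multi-byte character. A short comparison sketch:

```php
<?php
// Same arguments, different units:
// mb_substr() counts characters; mb_strcut() counts bytes but only
// returns whole characters (the end is moved back to a character boundary).
$text = "áéíó"; // 4 characters, 8 bytes in UTF-8

$byChars = mb_substr($text, 0, 3, 'UTF-8'); // 3 characters
$byBytes = mb_strcut($text, 0, 3, 'UTF-8'); // at most 3 bytes

echo $byChars, "\n"; // áéí
echo $byBytes, "\n"; // á
```

So mb_strcut() fits a byte budget (for example a fixed-width column), while mb_substr() fits a character count like $len in the question.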




Answer 5:


This could be because your original solution truncated the string to 65 bytes. In ASCII-only text that equals 65 characters, but it no longer does once UTF-8's multi-byte characters appear: a 65-byte cut yields a variable number of characters, depending on how many bytes each character uses. It is also dangerous, because the cut can split a multi-byte character in half.
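The byte/character mismatch is easy to see by comparing strlen() (bytes) with mb_strlen() (characters) on part of the question's sample text, and by checking what a byte-based substr() leaves behind:

```php
<?php
$text = "Un jardín de estilo neoclásico";

// strlen() counts bytes; each accented letter takes 2 bytes in UTF-8.
echo strlen($text), "\n";             // 32
// mb_strlen() counts characters.
echo mb_strlen($text, 'UTF-8'), "\n"; // 30

// A byte-based cut can split a character: 3 bytes of "áé" is
// "á" plus the first byte of "é", which is not valid UTF-8.
$half = substr("áé", 0, 3);
var_dump(mb_check_encoding($half, 'UTF-8')); // bool(false)
```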



Source: https://stackoverflow.com/questions/3294537/cut-an-utf8-text-in-php
