multibyte-functions | 易学教程

How to get correct list position in multi-byte string using preg_match

Question: I am currently matching HTML using this code:

    preg_match('/<\/?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;/u', $html, $match, PREG_OFFSET_CAPTURE, $position)

It matches everything perfectly. However, if the string contains a multibyte character, that character is counted as 2 characters in the returned position. For example, the returned $match array looks something like this:

    array
      0 =>
        array
          0 => string '<br />' (length=6)
          1 => int 132
      1 =>
        array
          0 => string 'br' (length=2)
          1 => int 133

The real number for the <br /> match is
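The offset mismatch described in this question comes from counting bytes instead of characters. As a rough illustration only (Python 3 rather than PHP, with a made-up sample string and a hypothetical helper name `byte_to_char_offset`), a byte offset into UTF-8 data can be converted to a character offset by decoding the prefix that precedes it:

```python
import re

# Hypothetical sample input; the pattern mirrors the one in the question.
html = "žluťoučký kůň<br />text"
data = html.encode("utf-8")

# Matching on bytes yields byte offsets, similar to PREG_OFFSET_CAPTURE.
pattern = re.compile(rb"</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;")


def byte_to_char_offset(raw: bytes, byte_offset: int) -> int:
    """Count the characters that precede byte_offset in UTF-8 data."""
    return len(raw[:byte_offset].decode("utf-8"))


match = pattern.search(data)
byte_pos = match.start()                        # offset counted in bytes
char_pos = byte_to_char_offset(data, byte_pos)  # offset counted in characters

print(byte_pos, char_pos)
```

The conversion assumes the offset falls on a character boundary, which holds here because the match starts at an ASCII `<`.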

Is it safe to use `strstr` to search for multibyte UTF-8 characters in a string?

Question: Following my previous question, "Why `strchr` seems to work with multibyte characters, despite man page disclaimer?", I figured out that strchr was a bad choice. Instead, I am thinking about using strstr to look for a single character (a multi-byte character, not a single char):

    const char str[] = "This string contains é which is a multi-byte character";
    char * pos = strstr(str, "é"); // 'é' = 0xC3A9: 2 bytes
    printf("%s\n", pos);

Output:

    é which is a multi-byte character

Which is what I expect: the position of the
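The same byte-level search can be sketched in Python 3 (an illustration, not the asker's C code; the variable names are made up). In UTF-8, lead bytes and continuation bytes occupy disjoint ranges, so a search for a complete, well-formed encoded character cannot match starting in the middle of another character:

```python
# The escape \u00e9 is the precomposed character "é".
text = "This string contains \u00e9 which is a multi-byte character"
haystack = text.encode("utf-8")
needle = "\u00e9".encode("utf-8")  # b"\xc3\xa9", two bytes

# bytes.find is a plain byte-substring search, comparable to strstr.
byte_index = haystack.find(needle)

# Decode the tail starting at the match, like printf("%s\n", pos).
tail = haystack[byte_index:].decode("utf-8")
print(tail)
```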

PHP Multi Byte str_replace?

Question: I'm trying to do accented character replacement in PHP but get funky results. My guess is that this happens because I'm using a UTF-8 string and str_replace can't properly handle multi-byte strings.

    $accents_search = array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è', 'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø', 'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ');
    $accents_replace = array('a','a','a','a','a','a','a','A','A','A','A','A','e','e', 'e
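For comparison, here is a minimal Python 3 sketch of the same kind of replacement using a character-level translation table. The mapping is a small, hypothetical subset chosen for illustration and is not the asker's full (truncated) list; `ACCENT_MAP` and `strip_accents` are made-up names. Because Python 3 strings are sequences of code points, the table operates on whole characters rather than bytes:

```python
# Illustrative subset only; extend the table as needed.
ACCENT_MAP = str.maketrans({
    "á": "a", "à": "a", "â": "a", "ä": "a",
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "ç": "c", "Ç": "C", "ñ": "n", "Ñ": "N",
    "œ": "oe",  # a single character may map to several
})


def strip_accents(text: str) -> str:
    """Replace each mapped accented character with its plain counterpart."""
    return text.translate(ACCENT_MAP)


print(strip_accents("Ça, c'est un cœur à la crème"))
```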

multi-byte function to replace preg_match_all?

Question: I'm looking for a multi-byte function to replace preg_match_all(). I need one that will give me an array of matched strings, like the $matches argument from preg_match(). The function mb_ereg_match() doesn't seem to do it; it only gives me a boolean indicating whether there were any matches. Looking at the mb_* functions page, I don't offhand see anything that replaces the functionality of preg_match(). What do I use? Edit: I'm an idiot. I originally posted this question asking for a

Character Encoding UTF8 Issue when using mb_detect_encoding() with PHP

Question: I am reading an RSS feed, http://beersandbeans.com/feed/. The feed says it is in UTF-8 format, and I am using SimplePie to import the content. When I grab the content and store it in $content, I perform the following:

    <?php header ('Content-type: text/html; charset=utf-8'); ?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head><body>
    <?php
    echo $content;
    echo $enc = mb_detect_encoding($content, "UTF-8,ISO-8859-1", true);
    echo $content = mb_convert_encoding($content, "UTF-8", $enc);
    echo $enc = mb_detect
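The detect-then-convert step in the snippet above can be approximated in Python 3 as a strict decode with a fallback. This is only a sketch under the assumption that the candidates are UTF-8 and ISO-8859-1, tried in that order; `decode_feed` is a hypothetical helper, not part of SimplePie or PHP's mbstring:

```python
from typing import Tuple


def decode_feed(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding name), trying strict UTF-8 first."""
    try:
        return raw.decode("utf-8"), "UTF-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as ISO-8859-1, so this never fails,
        # which also means it cannot confirm the guess is correct.
        return raw.decode("iso-8859-1"), "ISO-8859-1"


text, enc = decode_feed("caf\u00e9".encode("iso-8859-1"))
print(enc, text)
```

Because the ISO-8859-1 fallback accepts any input, the result says only that the data is not valid UTF-8, not that it really is ISO-8859-1.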

php sprintf() with foreign characters?

Question: It seems that sprintf() has a problem with foreign characters. Or am I doing something wrong? It appears to work when I remove characters like åäö from the string. Should that be necessary? I want the following lines to be aligned correctly for a report:

    2011-11-27 A1823 -Ref. Leif - 12 873,00 18.98
    2011-11-30 A1856 -Rättat xx - 6 594,00 19.18

I'm using sprintf() like this:

    %-12s %-8s -%-10s -%20s %8.2f

Using: php-5.3.23-nts-Win32-VC9-x86

Answer: Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8). For details see:
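The misalignment follows directly from padding by byte count. A small Python 3 sketch (using byte strings to mimic PHP's byte-oriented strings; `pad_right_bytes` is a hypothetical helper) shows the difference between padding to a width in bytes and padding to a width in characters:

```python
def pad_right_bytes(value: bytes, width: int) -> bytes:
    """Left-justify UTF-8 bytes to `width` characters rather than bytes."""
    char_count = len(value.decode("utf-8"))
    return value + b" " * max(0, width - char_count)


field = "R\u00e4ttat xx".encode("utf-8")  # 9 characters, 10 bytes

naive = field.ljust(10)              # pads by bytes: nothing is added
aware = pad_right_bytes(field, 10)   # pads by characters: one space added

print(len(naive.decode("utf-8")), len(aware.decode("utf-8")))
```

The byte-padded field ends up one column short of the requested width, which matches the kind of shift the report lines show when a value contains ä.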

How to handle multibyte string in Python

Question: PHP has multibyte string functions for handling multibyte strings (e.g. CJK scripts). For example, I want to count the letters in a multibyte string using the len function in Python, but it returns an inaccurate result (i.e. the number of bytes in the string):

    japanese = "桜の花びらたち"
    print japanese
    print len(japanese)  # returns 21 instead of 7

Is there any package or function like mb_strlen in PHP?

Answer: Use Unicode strings:

    # Encoding: UTF-8
    japanese = u"桜の花びらたち"
    print japanese
    print len(japanese)

Note the u in front of the string. To convert a bytestring into Unicode, use decode:

    "桜の花びらたち".decode(
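The answer above is written for Python 2. As a hedged sketch of the same idea in Python 3 (an assumption about the reader's interpreter, not part of the original answer), `str` is already a Unicode string, and the byte count only appears once the text is encoded:

```python
japanese = "桜の花びらたち"          # str: a sequence of code points
encoded = japanese.encode("utf-8")  # bytes: the UTF-8 representation

char_count = len(japanese)          # number of characters
byte_count = len(encoded)           # number of bytes
round_trip = encoded.decode("utf-8")

print(char_count, byte_count)
```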