multibyte

Issue with utf-8 encoding using PHP + MySQL

旧城冷巷雨未停 posted on 2019-11-29 03:18:23
I moved data from MySQL 4 (where it was originally stored in latin2 encoding) to MySQL 5 and set the encoding to utf-8. It looks good in phpMyAdmin, and utf-8 is okay. However, the website shows question marks instead of some characters! The website encoding is also set to utf8, so I don't understand where the problem is. The PHP and HTML files are also set to utf8. I have no idea...

Valentin Golev: Try the query SET NAMES utf8 before any other query in your application.

On my server, adding these to my PHP file had no effect: ini_set('default_charset','utf-8'); mysql_set_charset('utf8'); header('Content-type: text
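Editor's sketch (Python, not from the thread): one common way literal question marks appear is that text passes through a character set that cannot represent some of its characters. Whether that is what happened in the question above is an assumption, not a diagnosis. The sample word and the helper name `to_charset` are made up for illustration.

```python
# Illustrative only: shows how unrepresentable characters turn into "?".
# The sample word is hypothetical and is not taken from the question's data.

def to_charset(text: str, charset: str) -> bytes:
    """Encode text, substituting '?' for characters the charset lacks."""
    return text.encode(charset, errors="replace")


sample = "żółć"  # Polish letters, all of which exist in latin2 (ISO 8859-2)

print(to_charset(sample, "iso8859_2"))  # every letter survives in latin2
print(to_charset(sample, "latin-1"))    # letters missing from latin1 become "?"
print(to_charset(sample, "utf-8"))      # UTF-8 can represent all of them
```

Once a character has been replaced by "?", the original cannot be recovered from the stored bytes, so the conversion step itself has to be repeated with matching character sets.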

What is a multibyte character set?

≡放荡痞女 posted on 2019-11-29 02:14:44
Question: Does the term multibyte refer to a charset whose characters can, but don't have to, be wider than 1 byte (e.g. UTF-8), or does it refer to character sets whose characters are always wider than 1 byte (e.g. UTF-16)? In other words: what is meant when somebody talks about multibyte character sets?

Answer 1: The term is ambiguous, but in my internationalization work, we typically avoided the term "multibyte character sets" to refer to Unicode-based encodings. Generally, we used the term only for legacy
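Editor's sketch (Python, not from the thread): the distinction the question draws can be checked by measuring how many bytes a few characters occupy in each encoding. The helper `width_in_bytes` and the sample characters are assumptions chosen for illustration; the "-le" codec variants are used so that no byte order mark is counted.

```python
def width_in_bytes(ch: str, encoding: str) -> int:
    """Number of bytes a single character occupies in the given encoding."""
    return len(ch.encode(encoding))


# ASCII letter, accented Latin letter, CJK ideograph, character outside the BMP
SAMPLES = ["A", "é", "中", "\U0001D306"]

for ch in SAMPLES:
    widths = {
        enc: width_in_bytes(ch, enc)
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(f"U+{ord(ch):04X}", widths)
```

In this sample UTF-8 ranges from 1 to 4 bytes per character, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4.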

Detect Chinese (multibyte) character in the string

一笑奈何 posted on 2019-11-28 21:43:56
    $str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 ";

How do I detect Chinese characters in this string and print the part which starts with the first Chinese character and ends with "-"? (It would be "中文 characters. Some more characters -".) Thank you!

I've solved this problem using preg_match and regular expressions:

    $str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 ";
    preg_match('/[\x{4e00}-\x{9fa5}]+.*\-/u', $str, $matches);

Is PHP storing this as Unicode? If so, at worst you could step through the string, character by character, until
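Editor's sketch (Python, not from the thread): the same pattern as the preg_match solution above, using the range U+4E00 to U+9FA5 that the post uses. The names `CJK_THEN_HYPHEN` and `first_cjk_to_hyphen` are hypothetical. The range covers common CJK ideographs only and is not a complete test for "Chinese text".

```python
import re

# One or more characters in U+4E00..U+9FA5, then anything up to the last "-".
CJK_THEN_HYPHEN = re.compile(r"[\u4e00-\u9fa5]+.*-")


def first_cjk_to_hyphen(text: str):
    """Return the span from the first CJK run to the last hyphen, or None."""
    match = CJK_THEN_HYPHEN.search(text)
    return match.group(0) if match else None


s = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 "
print(first_cjk_to_hyphen(s))
```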

glob() can't find file names with multibyte characters on Windows?

青春壹個敷衍的年華 posted on 2019-11-28 18:38:39
I'm writing a file manager and need to scan directories and deal with renaming files that may have multibyte characters. I'm working on it locally on Windows/Apache PHP 5.3.8, with the following file names in a directory:

    filename.jpg
    имяфайла.jpg
    file件name.jpg
    פילענאַמע.jpg
    文件名.jpg

Testing on a live UNIX server worked fine. Testing locally on Windows using glob('./path/*') returns only the first one, filename.jpg. Using scandir(), the correct number of files is returned at least, but I get names like ?????????.jpg (note: those are regular question marks, not the � character. I'll end up

Invalid URI with Chinese characters (Java)

有些话、适合烂在心里 posted on 2019-11-28 12:44:41
I'm having trouble setting up a URL connection with Chinese characters in the URL. It works with Latin characters:

    String xstr = "维也纳恩斯特哈佩尔球场";
    URI uri = new URI("http", "ajax.googleapis.com", "/ajax/services/language/detect", "v=1.0&q=" + xstr, null);
    URL url = uri.toURL();
    URLConnection connection = url.openConnection();
    InputStream is = connection.getInputStream();

The getInputStream() call results in:

    java.lang.IllegalArgumentException: Invalid uri 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=???????????': Invalid query

The problem is caused by the fact that URI.toURL() doesn
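Editor's sketch (Python, not from the thread, and not the Java fix the truncated answer goes on to describe): the general technique for non-ASCII query values is to percent-encode their UTF-8 bytes so the final URL is pure ASCII. The helper name `build_detect_url` is hypothetical; the host and path are copied from the question, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_detect_url(query_text: str) -> str:
    """Build the question's URL with the query value percent-encoded as UTF-8."""
    query = urlencode({"v": "1.0", "q": query_text})  # UTF-8 is the default
    return "http://ajax.googleapis.com/ajax/services/language/detect?" + query


xstr = "维也纳恩斯特哈佩尔球场"
url = build_detect_url(xstr)
print(url)  # contains only ASCII characters
```

Decoding the query string again with `parse_qs` returns the original text, which is a quick way to confirm that nothing was lost.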

UTF-8 characters don't display correctly

て烟熏妆下的殇ゞ posted on 2019-11-28 12:07:14
This is my PHP code:

    <?php
    $result = '';
    $str = 'Тугайный соловей';
    for ($y=0; $y < strlen($str); $y++) {
        $tmp = mb_substr($str, $y, 1);
        $result = $result . $tmp;
    }
    echo 'result = ' . $result;

The output is: Тугайный Ñоловей

What can I do? I have to put $result into a MySQL database.

What's the encoding of your file? It should be UTF-8 too. What's the default charset of your HTTP server? It should be UTF-8 as well. Encoding only works if: the file is encoded correctly; the server declares the encoding of the delivered file. When working with databases, you also have to set the
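Editor's sketch (Python, not from the thread): the loop in the question bounds its counter with strlen(), which counts bytes, while mb_substr() indexes characters. The snippet below only shows how far those two counts diverge for the question's string; it does not claim to reproduce the PHP output, and the helper `rebuild_by_characters` is hypothetical.

```python
import unicodedata

# Normalised to NFC so each Cyrillic letter is exactly one code point.
text = unicodedata.normalize("NFC", "Тугайный соловей")

char_count = len(text)                  # characters (code points)
byte_count = len(text.encode("utf-8"))  # bytes, the unit PHP's strlen() counts


def rebuild_by_characters(s: str) -> str:
    """Copy a string one character at a time, bounded by the character count."""
    result = ""
    for i in range(len(s)):
        result += s[i]
    return result


print(char_count, byte_count)
print(rebuild_by_characters(text) == text)
```

Bounding the loop by the character count keeps the index and the extraction function in the same unit.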

multibyte strtr() -> mb_strtr()

柔情痞子 posted on 2019-11-28 06:33:05
Has anyone written a multibyte variant of the function strtr()? I need one.

Edit 1 (example of desired usage):

    $from = 'ľľščťžýáíŕďňäô'; // these chars are in UTF-8
    $to = 'llsctzyaiŕdnao';
    // input - in UTF-8
    $str = 'Kŕdeľ ďatľov učí koňa žrať kôru.';
    $str = mb_strtr( $str, $from, $to );
    // output - str without diacritic
    // $str = 'Krdel datlov uci kona zrat koru.';

I believe strtr is multi-byte safe; either way, since str_replace is multi-byte safe, you could wrap it:

    function mb_strtr($str, $from, $to) {
        return str_replace(mb_str_split($from), mb_str_split($to), $str);
    }

Since
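Editor's sketch (Python, not from the thread): a character-by-character translation table gives the behaviour the question asks for. The function is named `mb_strtr` only to mirror the question; it is not a PHP or Python built-in. Inputs are normalised to NFC as a precaution so each accented letter is a single code point. Note that the question's own table maps ŕ to itself, so that letter is left unchanged.

```python
import unicodedata


def mb_strtr(text: str, from_chars: str, to_chars: str) -> str:
    """Replace each character of from_chars with the one at the same index in to_chars."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)

    # str.maketrans requires both strings to have the same length.
    table = str.maketrans(nfc(from_chars), nfc(to_chars))
    return nfc(text).translate(table)


FROM = "ľľščťžýáíŕďňäô"
TO = "llsctzyaiŕdnao"

result = mb_strtr("Kŕdeľ ďatľov učí koňa žrať kôru.", FROM, TO)
print(result)
```

Because the table is keyed by code point, each character is translated once, so a replacement is never itself replaced again.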

mb_detect_encoding detects ASCII as UTF-8?

社会主义新天地 posted on 2019-11-28 00:50:45
Question: I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database, using the PHP mb_ functions. Currently it looks like this:

    $val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));

However, when mb_detect_encoding() is supplied an ASCII string (with special characters from the Latin1 range 192-255), it detects it as UTF-8, so the subsequent attempt to convert everything to proper UTF-8 removes all special characters. I tried
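Editor's sketch (Python, not from the thread): a common heuristic for this situation is to attempt a strict UTF-8 decode first and fall back to Latin-1 only when the bytes are not valid UTF-8. Treating Latin-1 as the only alternative is an assumption taken from the question's description, and the helper names `to_text` and `to_utf8` are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, otherwise as Latin-1."""
    try:
        return raw.decode("utf-8")    # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # every byte value is valid Latin-1


def to_utf8(raw: bytes) -> bytes:
    """Return the same text re-encoded as UTF-8."""
    return to_text(raw).encode("utf-8")


print(to_utf8(b"caf\xe9"))      # Latin-1 input
print(to_utf8(b"caf\xc3\xa9"))  # already UTF-8, returned unchanged
```

The heuristic can misread Latin-1 text that happens to form valid UTF-8 byte sequences, so it is a best guess rather than a guarantee.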

UTF-8 characters don't display correctly

本秂侑毒 posted on 2019-11-27 19:22:50
Question: This is my PHP code:

    <?php
    $result = '';
    $str = 'Тугайный соловей';
    for ($y=0; $y < strlen($str); $y++) {
        $tmp = mb_substr($str, $y, 1);
        $result = $result . $tmp;
    }
    echo 'result = ' . $result;

The output is: Тугайный Ñоловей

What can I do? I have to put $result into a MySQL database.

Answer 1: What's the encoding of your file? It should be UTF-8 too. What's the default charset of your HTTP server? It should be UTF-8 as well. Encoding only works if: the file is encoded correctly; the

How can I tell if a string contains multibyte characters in JavaScript?

筅森魡賤 posted on 2019-11-27 18:02:01
Is it possible in JavaScript to detect if a string contains multibyte characters? If so, is it possible to tell which ones? The problem I'm running into is this (apologies if the Unicode char doesn't show up right for you):

    s = "𝌆";
    alert(s.length);    // '2'
    alert(s.charAt(0)); // '��'
    alert(s.charAt(1)); // '��'

Edit for a bit of clarity here (I hope). As I understand it now, all strings in JavaScript are represented as a series of UTF-16 code units, which means that regular characters actually take up 2 bytes (16 bits), so my usage of "multibyte" in the title was a bit off. Some characters
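Editor's sketch (Python, not from the thread): the length of 2 reported in the question matches the number of UTF-16 code units the character needs. Python strings are indexed by code point, so the count is obtained here by encoding explicitly. The helper names `utf16_units` and `astral_chars` are hypothetical.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def astral_chars(text: str) -> list:
    """Characters above U+FFFF, which need a surrogate pair in UTF-16."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


s = "\U0001D306"  # the character from the question

print(len(s))          # code points
print(utf16_units(s))  # UTF-16 code units
print(astral_chars("a" + s + "b"))
```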