utf-8 | 易学教程

Converting Mac Roman character to equivalent UTF-8

Question: I have been given some HTML files that use the Mac OS Roman file encoding. The files contain French text, but in an editor many of the accented characters look strange (i.e. not French):

Si cette option est sÈlectionnÈe, <removed> tentera de communiquer avec votre tÈlescope seulement ‡ líaide díun ...

In the browser, however, the capital E with accent (È) displays properly as é, as do the other strange characters. I also have some UTF-8 French files that look normal in an editor (é looks like é). What I'd
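
A minimal Python sketch of re-encoding such a file to UTF-8: decode the raw bytes with the file's real source codec, then encode the text as UTF-8. The names `reencode_to_utf8` and `convert_file` are hypothetical, and which source codec to pass is an assumption you have to check. One hedged observation: the sample above is exactly what Windows-1252 bytes look like when read as Mac Roman (é is byte 0xE9 in Windows-1252, and 0xE9 is È in Mac Roman; in genuine Mac Roman, é would be 0x8E), so the files may actually be Windows-1252.

```python
# Sketch: re-encode bytes from a legacy single-byte codec to UTF-8.
# The source codec ("mac_roman", "cp1252", ...) is an assumption the
# caller must supply; it cannot be reliably guessed from the bytes.

def reencode_to_utf8(data: bytes, source_codec: str) -> bytes:
    """Decode `data` using `source_codec`, then encode the text as UTF-8."""
    return data.decode(source_codec).encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_codec: str) -> None:
    """Hypothetical helper: rewrite one file as UTF-8."""
    with open(src_path, "rb") as f:
        raw = f.read()
    with open(dst_path, "wb") as f:
        f.write(reencode_to_utf8(raw, source_codec))


# Windows-1252 bytes for French text...
raw = "sélectionnée à l\u2019aide".encode("cp1252")
# ...misread as Mac Roman reproduce the garbling from the question.
print(raw.decode("mac_roman"))                          # sÈlectionnÈe ‡ líaide
# Decoding with the matching codec gives correct text, re-encoded as UTF-8.
print(reencode_to_utf8(raw, "cp1252").decode("utf-8"))  # sélectionnée à l’aide
```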

Python UTF-8 Latin-1 displays wrong character

Question: I'm writing a very small script that converts Latin-1 characters into Unicode (I'm a complete beginner in Python). I tried a method like this:

    def latin1_to_unicode(character):
        uni = character.decode('latin-1').encode("utf-8")
        return uni

It works fine for characters that are not specific to the Latin-1 set, but if I try the following example:

    print latin1_to_unicode('å')

it returns Ã¥ instead of å. The same goes for other letters like æ and ø. Can anyone please explain why this is happening?
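
A Python 3 sketch of the likely cause, assuming the script's source file was saved as UTF-8: the literal 'å' is then the two bytes 0xC3 0xA5, and decoding those as Latin-1 yields the two characters Ã and ¥. Re-encoding to UTF-8 keeps both characters, so the terminal shows Ã¥. Decoding with the encoding the bytes were actually written in avoids the problem.

```python
# Sketch (Python 3): why 'å' turns into 'Ã¥'.
# Assumption: the source file was saved as UTF-8, so the Python 2 byte
# string 'å' held UTF-8 bytes rather than a single Latin-1 byte.

utf8_bytes = "å".encode("utf-8")        # b'\xc3\xa5' (two bytes)
wrong = utf8_bytes.decode("latin-1")    # each byte becomes its own character
print(wrong)                            # Ã¥

right = utf8_bytes.decode("utf-8")      # decode with the real encoding
print(right)                            # å

# Genuine Latin-1 data stores 'å' as the single byte 0xE5.
latin1_bytes = "å".encode("latin-1")    # b'\xe5'
print(latin1_bytes.decode("latin-1"))   # å
```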

Java array sort UTF-8

Question: I want to sort an ArrayList<String>, but the problem is my native language's characters. My alphabet goes like this: a, ą, b, c, č, d, e, f ... z, ž. As you can see, z is second from the end and ą is second in the alphabet, but after I sort my array it is ordered incorrectly: all my native-language characters are moved to the end of the array. Example:

    package lt;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class test {
        public static void main(String[] args) {
            List<String> items
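
The usual Java fix is to sort with a locale-aware java.text.Collator (for example `Collections.sort(items, Collator.getInstance(new Locale("lt")))`) rather than the default String ordering, which compares UTF-16 code units. Keeping to Python for the examples, here is a hedged sketch of the same idea with an explicit alphabet; the `LT_ALPHABET` ordering and the `lt_key` name are assumptions (the standard Lithuanian alphabet as I understand it), so check them against your own alphabet.

```python
# Sketch: sort words by an explicit alphabet instead of by code point.
# LT_ALPHABET is an assumed ordering (standard Lithuanian alphabet).

LT_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def lt_key(word: str) -> list:
    """Map each letter to its alphabet position; characters outside the
    alphabet sort after all known letters, ordered by code point."""
    return [RANK.get(c, len(LT_ALPHABET) + ord(c)) for c in word.lower()]


words = ["žuvis", "zebras", "ąžuolas", "arklys", "čiuožti", "cukrus"]
print(sorted(words))              # code-point order pushes ą, č, ž to the end
print(sorted(words, key=lt_key))  # ['arklys', 'ąžuolas', 'cukrus', 'čiuožti', 'zebras', 'žuvis']
```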

Why does Java's char use UTF-16?

Question: Recently I read a lot about Unicode code points and how they evolved over time, and of course I read http://www.joelonsoftware.com/articles/Unicode.html too. But I couldn't find the real reason why Java uses UTF-16 for a char. For example, a string of 1024 ASCII characters takes 1024 * 2 bytes, i.e. 2 KB of memory, no matter what. If Java's base char were UTF-8, it would be just 1 KB of data. Even if
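
A quick Python sketch of the size trade-off behind the question: UTF-8 uses one byte per ASCII character but up to four for others, while UTF-16 uses two bytes for every character in the Basic Multilingual Plane and four (a surrogate pair) above it. The codec name `utf-16-le` is used only so that no byte-order mark is counted. As a side note, since Java 9 compact strings (JEP 254) store strings containing only Latin-1 characters at one byte per char internally, so the 2 KB figure describes the classic char[] representation.

```python
# Sketch: compare encoded sizes (in bytes) of the same text in UTF-8 and UTF-16.
# "utf-16-le" is used so Python does not prepend a 2-byte BOM.

samples = {
    "1024 ASCII letters": "a" * 1024,
    "Latin letter é": "é",
    "CJK character 中": "中",
    "emoji 😀 (outside the BMP)": "😀",
}

for label, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")

# Java's String.length() counts UTF-16 code units, so the emoji counts as 2.
emoji_units = len("😀".encode("utf-16-le")) // 2
print(emoji_units)  # 2
```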

Coding a path in Unicode (C++)

Question: I had a problem opening files whose UTF-8 paths contain non-ASCII characters (like Cyrillic or Latin characters). I found a way to solve it with _wfopen, but only by writing each such character by hand as a Unicode escape (\Uxxxx). Is there a function, macro, or anything else that, given the path string, returns its Unicode (wide-character) form? Something like this: https://www.branah.com/unicode-converter I tried MultiByteToWideChar, but it returns some hex numbers that don't seem relevant. Tried: std:
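
On Windows, MultiByteToWideChar with code page CP_UTF8 converts a UTF-8 byte string into the UTF-16 wchar_t string that _wfopen expects; the result is a sequence of 16-bit code units, the same values you were typing as \Uxxxx escapes. One possible reason the output looked like unrelated hex numbers is printing a wide string through a narrow stream such as std::cout, which shows a pointer address instead of text; that is a guess, since the attempted code is truncated. The Python sketch below, with the hypothetical name `utf8_to_utf16_units`, only illustrates the conversion and does not call any Win32 API.

```python
# Sketch: what converting a UTF-8 path to UTF-16 (as _wfopen expects) produces.
# utf8_to_utf16_units is a hypothetical name; it mirrors the idea of
# MultiByteToWideChar(CP_UTF8, ...) but does not call any Windows API.

def utf8_to_utf16_units(utf8: bytes) -> list:
    """Decode UTF-8 bytes and return the UTF-16 code units as integers."""
    u16 = utf8.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(u16[i:i + 2], "little") for i in range(0, len(u16), 2)]


path = "Привет.txt".encode("utf-8")   # a Cyrillic file name as UTF-8 bytes
units = utf8_to_utf16_units(path)
print([hex(u) for u in units])

# These are the same code units you would otherwise write by hand as escapes
# (chr() per unit is only valid here because every character is in the BMP).
assert "".join(chr(u) for u in units) == "\u041f\u0440\u0438\u0432\u0435\u0442.txt"
```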

UTF-8 Character set CentOS PHP

Question: I have a problem with UTF-8. I'm trying to show Russian characters on my page, but I'm getting ???? instead. I tried adding AddDefaultCharset UTF-8 to .htaccess, but it didn't work. I also noticed that when I show the characters in a file with the .html extension (no PHP), it works fine, but in the PHP file it shows ?????. Server details: PHP 5.3.16, CentOS release 6.3. My problem is similar to the one here: http://remository.com/forum/func,view/id,18483/catid,24/ Thanks.
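
One common way Russian text turns into ???? is that somewhere between storage and the browser it is converted to a single-byte charset such as ISO-8859-1, which has no Cyrillic letters, so each unrepresentable character is replaced by a question mark. Whether that happens on this server is an assumption; the Python sketch below only reproduces the effect, with ISO-8859-1 chosen as an example target.

```python
# Sketch: how Cyrillic text becomes "????" when forced into a charset
# that cannot represent it. ISO-8859-1 is an assumed example target.

text = "Привет"                                        # Russian "hello"
as_latin1 = text.encode("iso-8859-1", errors="replace")
print(as_latin1)                                       # b'??????'

# Encoded as UTF-8, the same text survives a round trip intact.
as_utf8 = text.encode("utf-8")
assert as_utf8.decode("utf-8") == text
```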

Equivalent of mb_convert_encoding() in Perl

Question: I need to remove Windows characters from a CSV file before parsing it into a database. These are characters like the "long hyphen" or Word's "inverted commas". In PHP I can remove them with mb_convert_encoding(); how can I do the same in Perl? I need to remove only the Windows characters, not UTF-8 characters.

Answer 1: The from_to() function from Encode seems to be a pretty close match for mb_convert_encoding(). But it sounds like you have a file where some of it is encoded in CP1252 and some of it is in UTF-8.
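
Following the answer's point that the file may mix CP1252 and UTF-8, here is a hedged Python sketch (not Perl, and not Encode's API) of one workaround: decode each line as UTF-8, fall back to CP1252 when that fails, then map typographic quotes and dashes to ASCII. The function names and the `ASCII_MAP` table are assumptions; a CP1252 line that happens to be valid UTF-8 would be misread, and a few bytes undefined in CP1252 (such as 0x81) would still raise an error.

```python
# Sketch: decode a file that mixes UTF-8 and CP1252 lines, then replace
# "Windows" typography (curly quotes, en/em dashes) with ASCII.
# The per-line fallback and the ASCII_MAP table are assumptions.

ASCII_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
})


def decode_mixed_line(raw: bytes) -> str:
    """Try strict UTF-8 first; fall back to CP1252 if the bytes are invalid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


def clean_lines(data: bytes) -> list:
    """Decode each line and replace Windows typographic characters."""
    return [decode_mixed_line(line).translate(ASCII_MAP) for line in data.splitlines()]


sample = "café".encode("utf-8") + b"\n" + "\u201cquoted\u201d \u2013 dash".encode("cp1252")
print(clean_lines(sample))  # ['café', '"quoted" - dash']
```

`ASCII_MAP` is applied to every line, including UTF-8 ones; apply it only in the CP1252 fallback branch if only characters that arrived as CP1252 should change.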

UTF-8 text in the console works in Eclipse but fails with an exported JAR

Question: I'm developing a Java application and I'm having problems with output to files and to the console. For files, I figured out how to write using UTF-8, but I can't figure out how to set the encoding for the console. This is a simple test that I run in Eclipse; then I generate a JAR and execute it using a .bat file:

    public class Test {
        public static void main(String[] args) {
            System.out.println("Initialisée la procédure");
        }
    }

This is the result when I execute my JAR using this .bat file: "C:
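
A hedged Python illustration of the usual cause: the program writes bytes in one charset (UTF-8, or the JVM's default) while the Windows console decodes them with its OEM code page. Code page 850 is assumed here as a common Western European default, and `as_seen_by_console` is a hypothetical name; the actual output in the question is truncated, so this is not a reproduction of it. On the Java side, commonly tried fixes are wrapping System.out in `new PrintStream(System.out, true, "UTF-8")` or switching the console code page with chcp; which one works depends on the console and the Java version.

```python
# Sketch: text encoded with one charset but shown by a console using another.
# CP850 as the console code page is an assumption (a common Western European default).

def as_seen_by_console(text: str, written_as: str, console_cp: str = "cp850") -> str:
    """Encode text the way the program wrote it, decode it the way the console reads it."""
    return text.encode(written_as).decode(console_cp)


msg = "Initialisée la procédure"
print(as_seen_by_console(msg, "utf-8"))   # Initialis├®e la proc├®dure
print(as_seen_by_console(msg, "cp1252"))  # InitialisÚe la procÚdure
print(as_seen_by_console(msg, "cp850"))   # Initialisée la procédure (code pages match)
```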

PHP File Get Contents & String Encoding

Question: I retrieved the contents of a CSS file (http://gizmodo.com/assets/stylesheets/app-ecbc6044c59319aab4c2a1e31380ef56.css) and detected its encoding with mb_detect_encoding(), which says UTF-8. Viewed in a browser, the file looks fine (readable) and declares @charset "UTF-8";. When I tried to output the string, I got garbage. When I tried to save it to a file, I got garbage. I tried converting the encoding to ASCII, ISO-8859-1, and HTML-ENTITIES, with no luck. Any ideas how to determine why this string is garbage, and how to
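
One hedged possibility, not confirmed by the question: if no charset conversion helps, the bytes may not be plain text at all but compressed data, for example a response body sent with gzip Content-Encoding. The Python sketch below checks for the gzip signature bytes (0x1F 0x8B) before decoding the data as UTF-8; in PHP, gzdecode() would be the analogous step. The function name `to_text` and its charset default are assumptions.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def to_text(body: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: decompress gzip data if present, then decode it as text."""
    if body.startswith(GZIP_MAGIC):
        body = gzip.decompress(body)
    return body.decode(charset)


css = '@charset "UTF-8";\nbody { margin: 0; }\n'
compressed = gzip.compress(css.encode("utf-8"))

print(compressed[:2])       # b'\x1f\x8b': reads as "garbage" if treated as text
print(to_text(compressed))  # the readable stylesheet
```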