character-encoding | 易学教程

Why are results of path.toString() failing to show all characters on Linux but OK on Windows?

Question: In my Java code I use a FileVisitor to traverse a filesystem and create a structure of Paths, which is later converted to a JSON object for rendering in HTML. On Windows it works fine, even against a Linux filesystem; on Linux, against the same (now local) filesystem, it fails to render special characters properly when I call toString() on a path. For example, the Windows debug output is CreateFolderTree:createJsonData:SEVERE: AddingNode(1):Duarte LÃ´bo- Requiem and the HTML displays OK as
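
The debug text "Duarte LÃ´bo" is the typical signature of UTF-8 bytes being read with a single-byte charset such as ISO-8859-1 or windows-1252: ô (U+00F4) is stored in UTF-8 as the bytes 0xC3 0xB4, which a single-byte charset shows as the two characters "Ã´". The sketch below (the class MojibakeDemo is hypothetical) reproduces and reverses that mechanism; it does not diagnose the asker's setup. It also prints sun.jnu.encoding, an OpenJDK implementation property (not a standard API) that names the charset used for file names and on Linux is normally derived from the locale, so running under a UTF-8 locale is a plausible, but unconfirmed, fix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces and reverses the "L A-tilde acute bo" pattern,
// assuming UTF-8 file-name bytes were decoded as ISO-8859-1 somewhere.
class MojibakeDemo {

    // Simulate the faulty decode: encode as UTF-8, read the bytes back as ISO-8859-1.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo it: ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so
    // re-encoding recovers the original bytes, which are then decoded as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Duarte L\u00f4bo- Requiem"; // o-circumflex is U+00F4
        String bad = garble(original);                // U+00F4 -> bytes C3 B4 -> chars U+00C3 U+00B4
        System.out.println(bad.equals("Duarte L\u00c3\u00b4bo- Requiem")); // true
        System.out.println(repair(bad).equals(original));                  // true

        // OpenJDK-specific property (may be null on other JVMs): the charset the
        // JVM uses for file names; on Linux it is normally derived from the locale.
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
    }
}
```

The repair only works when the wrong charset was a single-byte one that maps all 256 byte values; once text has been decoded with replacement characters, the original bytes are gone.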

Recoding a data.frame object from latin1 to UTF-8

Question: I work on Windows 7 (my system: "LC_COLLATE=French_France.1252") with data containing accents. My data are encoded in ANSI, which lets me view them correctly in the RStudio tabs. My problem: when I create a googleVis page (UTF-8 encoding), the accented characters are not displayed correctly. What I expected: I want to convert my latin1 data.frames to UTF-8 in R just before creating the googleVis pages, but I have no idea how. The stringi package seems to work only with raw data. fr <-
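
In R itself, base iconv(x, from = "latin1", to = "UTF-8") applied to each character column (or enc2utf8()) is the usual tool. The byte-level transformation it performs is sketched below in Java, with a hypothetical Latin1ToUtf8 helper: decode the bytes as ISO-8859-1, then re-encode them as UTF-8. One caveat for a French_France.1252 system: code page 1252 agrees with ISO-8859-1 for letters such as é, è, à and ç, but differs in the range 0x80-0x9F, where cp1252 places characters such as € and œ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: transcode ISO-8859-1 (latin1) bytes to UTF-8 bytes.
class Latin1ToUtf8 {
    static byte[] latin1ToUtf8(byte[] latin1) {
        // Decode into a Java String (UTF-16 internally), then re-encode as UTF-8.
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with e-acute: 0xE9 is U+00E9 in ISO-8859-1 (and in cp1252).
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = latin1ToUtf8(latin1);
        // U+00E9 becomes the two bytes 0xC3 0xA9, so 4 bytes grow to 5.
        System.out.println(utf8.length); // 5
    }
}
```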

Java: print Unicode characters to the bash shell (Mac OS X)

Question: I have this code in Java 1.6: System.out.println("\u00b2"); but in bash on OS X 10.6 I get question marks instead of the Unicode characters. What I actually want is to print characters 176, 177 and 178 of the extended ASCII table (see http://www.asciitable.com/) to create some art in the bash terminal. Any ideas? Thanks. Answer 1: The following code works for me in a UTF-8-enabled Terminal.app on Mac OS X 10.6.7: # code taken from: # "Print Unicode characters to the Terminal with Java", # http://hints
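
Two separate things are going on here. First, "\u00b2" is SUPERSCRIPT TWO (²): in ISO-8859-1 and Unicode, positions 176-178 are °, ± and ², while the shade glyphs that "extended ASCII" tables usually show for 176-178 come from IBM code page 437 and live in Unicode at U+2591-U+2593 (░▒▓). Second, the characters must be encoded in the terminal's charset. A minimal sketch, assuming Terminal.app is set to UTF-8 (the class name ShadePrinter is hypothetical):

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example class: prints the three shade blocks as UTF-8.
class ShadePrinter {
    // Code page 437 positions 176, 177, 178 (0xB0-0xB2) are the light, medium
    // and dark shade blocks, which Unicode encodes as U+2591, U+2592, U+2593.
    static final String SHADES = "\u2591\u2592\u2593";

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap System.out so chars are encoded as UTF-8 regardless of the JVM's
        // default encoding; the terminal itself must also be set to UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(SHADES);
    }
}
```

The PrintStream(OutputStream, boolean, String) constructor is available in Java 1.6, so this matches the asker's JDK.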

How to check if a char lies within a certain Unicode range…?

Question: I want to check whether a particular char I get from the text field lies within a particular hex range of the Unicode character set. For example, if I enter a capital C, I would specify the range 41-5a. I want to do this for the Russian alphabet but can't figure it out. I can get the last char entered using: unichar lastEnteredChar= [[textField.text stringByReplacingCharactersInRange:range withString:string] characterAtIndex:[[textField.text stringByReplacingCharactersInRange:range withString:string] length]
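
The question is about Objective-C's unichar, but the check itself is just a comparison of 16-bit code units against the Cyrillic ranges, shown here in Java. The Russian alphabet occupies U+0410 (А) to U+044F (я), plus Ё (U+0401) and ё (U+0451), which sit outside that run; the whole Cyrillic block is U+0400-U+04FF. The class and method names are hypothetical.

```java
// Hypothetical helper showing two ways to test for Cyrillic input.
class CyrillicCheck {
    // Basic Russian letters: U+0410..U+044F, plus YO (U+0401) and yo (U+0451).
    static boolean isRussianLetter(char c) {
        return (c >= '\u0410' && c <= '\u044F') || c == '\u0401' || c == '\u0451';
    }

    // Broader test: any character in the Unicode "Cyrillic" block (U+0400-U+04FF).
    static boolean isInCyrillicBlock(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CYRILLIC;
    }

    public static void main(String[] args) {
        System.out.println(isRussianLetter('\u0416')); // ZHE -> true
        System.out.println(isRussianLetter('C'));      // Latin C (0x43) -> false
    }
}
```

The explicit range is the direct analogue of the asker's 0x41-0x5A check for Latin capitals; the block test is broader and also accepts letters used by other Cyrillic-script languages.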

Why does the string “¿” get translated to “Â¿” when calling .getBytes()

Question: When writing the string "¿" out using System.out.println(new String("¿".getBytes("UTF-8"))); "Â¿" is written instead of just "¿". Why? And how do we fix it? Answer 1: You don't have to use UTF-16 to solve this: new String("¿".getBytes("UTF-8"), "UTF-8"); works just fine. As long as the encoding given to getBytes() is the same as the encoding you pass to the String constructor, you should be fine. Answer 2: You need to specify the Charset in the String constructor (see the API docs). Answer 3: Try:
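
Why "Â¿" specifically: new String(byte[]) without a charset uses the platform default. If that default is a single-byte charset such as ISO-8859-1 or windows-1252 (an assumption about the asker's machine), the two UTF-8 bytes of ¿ (0xC2 0xBF) become two characters, Â and ¿. The sketch below uses ISO_8859_1 explicitly to stand in for such a default; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the encode/decode mismatch.
class InvertedQuestionMark {
    // The inverted question mark U+00BF encodes in UTF-8 as two bytes: 0xC2 0xBF.
    static final byte[] UTF8_BYTES = "\u00bf".getBytes(StandardCharsets.UTF_8);

    // Decode those bytes with an explicit charset instead of the platform default.
    static String decodeWith(Charset cs) {
        return new String(UTF8_BYTES, cs);
    }

    public static void main(String[] args) {
        // A single-byte charset turns 0xC2 into U+00C2 (A-circumflex), giving two chars.
        System.out.println(decodeWith(StandardCharsets.ISO_8859_1).equals("\u00c2\u00bf")); // true
        // Using the same charset for encoding and decoding restores the original.
        System.out.println(decodeWith(StandardCharsets.UTF_8).equals("\u00bf"));            // true
    }
}
```

If the goal is only to print the character, the byte round trip is unnecessary; the remaining question is whether the console's encoding can display it.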

dompdf special characters showing as question marks?

Question: I use dompdf 0.5.1 to generate PDF files, but special characters are not displayed properly. For example, . It shows something like â€“ â€œ in the generated PDF file. I set the UTF-8 encoding with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in the HTML page rendered by dompdf. I also converted the encoding before passing the HTML to dompdf: $dompdf->load_html(utf8_decode($html));. But then I get ? marks instead of those characters. How do I
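
Both symptoms have a byte-level explanation. "â€œ" is the left double quotation mark “ (U+201C, UTF-8 bytes 0xE2 0x80 0x9C) displayed as windows-1252. And PHP's utf8_decode() converts UTF-8 to ISO-8859-1, replacing characters that ISO-8859-1 lacks with '?'; the en dash (U+2013) and curly quotes are not in ISO-8859-1. Java's String.getBytes(Charset) does the same substitution, which the hypothetical class below uses to illustrate the effect (it is an analogy, not dompdf or PHP code).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what happens to non-Latin-1 characters when text is
// forced into ISO-8859-1, analogous to PHP's utf8_decode().
class Latin1Replacement {
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement byte, which is '?' for ISO-8859-1.
    static String toLatin1AndBack(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // En dash U+2013 and left double quote U+201C are not in Latin-1;
        // e-acute U+00E9 is, so only the first two turn into '?'.
        String result = toLatin1AndBack("\u2013 \u201c \u00e9");
        System.out.println(result.equals("? ? \u00e9")); // true
    }
}
```

So utf8_decode() cannot preserve these characters; they have to stay as UTF-8 (or become HTML entities such as &ndash; and &ldquo;). Whether dompdf 0.5.1 then renders them depends on its font support, which this sketch does not cover.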

Scala - Converting from ISO-8859-1 to UTF-8 gives foreign character strangeness

Question: Here's my problem: I have an InputStream that I've converted to a byte array, but I don't know the character set of the InputStream at runtime. My original thought was to do everything in UTF-8, but I see strange issues with streams that are encoded as ISO-8859-1 and contain foreign characters (those crazy Swedes). Here's the code in question: IOUtils.toString(inputstream, "utf-8") // Fails on iso8859-1 foreign characters To simulate this, I have: new String("\u00F6") // Returns ö as expected,
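
Bytes do not carry their charset, so there is no exact answer, but one common heuristic (not the asker's code, and not true detection) is to attempt a strict UTF-8 decode and fall back to ISO-8859-1 when it fails. It works well for Swedish text because a lone Latin-1 letter like ö (0xF6) can never appear in valid UTF-8. The sketch is in Java rather than Scala; the class name is hypothetical, and the same JDK calls are usable from Scala.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strict UTF-8 first, ISO-8859-1 as the fallback.
class Utf8OrLatin1 {
    static String decode(byte[] bytes) {
        // REPORT makes the decoder throw instead of inserting U+FFFD.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // ISO-8859-1 accepts every byte value, so this never fails.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xF6};                           // o-umlaut in ISO-8859-1
        byte[] utf8 = "\u00f6".getBytes(StandardCharsets.UTF_8); // o-umlaut in UTF-8: 0xC3 0xB6
        System.out.println(decode(latin1).equals(decode(utf8)));  // true
    }
}
```

The limitation: Latin-1 text that happens to form valid UTF-8 (for example the pair "Ã¶", bytes 0xC3 0xB6) is decoded as UTF-8. Statistical detectors such as ICU4J's CharsetDetector handle more cases, at the cost of a dependency.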

Get source code with Chinese characters in PHP

Question: Well, I give up. I've tried everything I could think of to retrieve data from a target website whose content is in a Chinese encoding (charset=GB2312, which is a simplified Chinese charset). I've been using simple_html_parser as always, but it doesn't return the Chinese characters; all I get are weird question marks inside a rhombus shape (like so: "��ѯ�ؼ��֣�"). Declaring the encoding for the PHP file didn't do anything except get rid of some unwanted
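
The question marks inside a rhombus are U+FFFD, the REPLACEMENT CHARACTER, which a UTF-8 decoder substitutes for byte sequences that are not valid UTF-8. That is what GB2312 bytes look like when treated as UTF-8. A Java illustration (the class name is hypothetical; GB2312 is not among the charsets every JVM must provide, so the correct decode is guarded by Charset.isSupported):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: GB2312 bytes decoded wrongly (as UTF-8) and correctly.
class Gb2312Demo {
    // GB2312 encodes the character U+4E2D ("middle") as the two bytes 0xD6 0xD0.
    static final byte[] GB2312_ZHONG = {(byte) 0xD6, (byte) 0xD0};

    static String decodeAsUtf8(byte[] b) {
        // Invalid UTF-8 sequences are replaced with U+FFFD.
        return new String(b, StandardCharsets.UTF_8);
    }

    static String decodeAsGb2312(byte[] b) {
        // Optional charset: return null when this JVM does not ship it.
        if (!Charset.isSupported("GB2312")) return null;
        return new String(b, Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8(GB2312_ZHONG).indexOf('\uFFFD') >= 0); // true
        String correct = decodeAsGb2312(GB2312_ZHONG);
        System.out.println(correct == null ? "GB2312 unsupported" : correct.equals("\u4e2d"));
    }
}
```

In the PHP setting the equivalent step is converting the fetched HTML from GB2312 to UTF-8 before parsing it, rather than changing the encoding declared for the PHP file.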

Convert ISO/Windows charsets to UTF-8 in JavaScript

Question: I'm developing a Firefox plugin that fetches web pages to do some analysis for the user. The problem is that when I fetch (via XMLHttpRequest) pages that are not UTF-8 encoded, the string I see is garbled, for example Hebrew pages in windows-1255 or Chinese pages in gb2312. I already tried the following: var uDecoder=Components.classes["@mozilla.org/intl/scriptableunicodeconverter"].getService(Components.interfaces.nsIScriptableUnicodeConverter); uDecoder.charset="windows-1255"; alert( xhr
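
The language-neutral step behind this question is to decode the raw response bytes with the charset the server declares instead of assuming UTF-8. In the browser, XMLHttpRequest.overrideMimeType() can force a charset (for example "text/html; charset=windows-1255") before the request is sent. The same logic is sketched below in Java with a hypothetical ContentTypeDecoder; it reads only the HTTP Content-Type header and ignores a charset declared in an HTML <meta> tag, which many pages use instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: decode a response body using the Content-Type charset.
class ContentTypeDecoder {
    // Extract the charset parameter from a value such as
    // "text/html; charset=windows-1255". Returns null when absent.
    static String charsetParam(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Use the declared charset if this JVM supports it, else the caller's fallback.
    static String decode(byte[] body, String contentType, Charset fallback) {
        String name = charsetParam(contentType);
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return new String(body, Charset.forName(name));
                }
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException is a subclass: treat a malformed name as unknown.
            }
        }
        return new String(body, fallback);
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xE9}; // e-acute in ISO-8859-1
        String text = decode(body, "text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(text.equals("\u00e9")); // true
    }
}
```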

Can't get a degree symbol into raw_input

Question: The problem in my code looks something like this: #!/usr/bin/python # -*- coding: UTF-8 -*- deg = u'°' print deg print '40%s N, 100%s W' % (deg, deg) codelim = raw_input('40%s N, 100%s W)? ' % (deg, deg)) I'm trying to generate a raw_input prompt for delimiter characters inside a latitude/longitude string, and the prompt should include an example of such a string. print deg and print '40%s N, 100%s W' % (deg, deg) both work fine, returning "°" and "40° N, 100° W" respectively, but the