unicode-string | 易学教程

iPhone: Convert Unicode to string

I need to convert the following to a string and display it:

Overall, the \u2018\u2018typical\u2019\u2019 xyz is broadly expressed

I have tried all sorts of Unicode conversions:

```objc
NSData *asciiData = [desc dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *encodedString = [[NSString alloc] initWithData:asciiData encoding:NSASCIIStringEncoding];
```

and:

```objc
[NSString stringByReplacingOccurrencesOfString:@"\u2018" withString:@""]
```

without success. Kindly suggest a solution to this.

Answer:

```objc
char cString[] = "\u2018\u2018typical\u2019\u2019";
NSString *string = [NSString stringWithCString:cString …
```
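
If the \u2018 sequences arrive as literal backslash escapes (for example inside a JSON payload), they need to be decoded. Converting them to ASCII will not produce the quote characters. The sketch below is in Python 3, not Objective-C. It assumes the text is the body of a JSON string, and decode_json_escapes and to_ascii_quotes are hypothetical helper names. It also shows the lossy alternative: mapping the typographic quotes U+2018/U+2019 to a plain ASCII apostrophe.

```python
import json

# Map the typographic single quotes U+2018 / U+2019 to an ASCII apostrophe.
ASCII_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'"})


def decode_json_escapes(text: str) -> str:
    """Turn literal \\uXXXX escapes into real characters.

    Assumes text is the body of a JSON string (without the surrounding
    double quotes), so any embedded double quotes are already escaped.
    """
    return json.loads('"' + text + '"')


def to_ascii_quotes(text: str) -> str:
    """Lossy fallback: replace curly single quotes with ASCII apostrophes."""
    return text.translate(ASCII_QUOTES)


raw = r"Overall, the \u2018\u2018typical\u2019\u2019 xyz is broadly expressed"
decoded = decode_json_escapes(raw)
print(decoded)                   # Overall, the ‘‘typical’’ xyz is broadly expressed
print(to_ascii_quotes(decoded))  # Overall, the ''typical'' xyz is broadly expressed
```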

Unicode File Writing and Reading in C++?

Can anyone provide a simple example of reading and writing a Unicode character in a Unicode file?

Answer: On Linux I use the iconv library, which is very standard. An overly simple program is:

```c
#include <stdio.h>
#include <stdlib.h>
#include <iconv.h>

#define BUF_SZ 1024

int main( int argc, char* argv[] )
{
    char bin[BUF_SZ];
    char bout[BUF_SZ];
    char* inp;
    char* outp;
    ssize_t bytes_in;
    size_t bytes_out;
    size_t conv_res;

    if( argc != 3 )
    {
        fprintf( stderr, "usage: convert from to\n" );
        return 1;
    }

    /* iconv_open takes (tocode, fromcode) */
    iconv_t conv = iconv_open( argv[2], argv[1] );
    if( conv == (iconv_t)(-1) )
    {
        fprintf( stderr, "Cannot …
```
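
For comparison, here is a minimal Python 3 sketch of the same two tasks. The first writes a Unicode character to a file opened with an explicit encoding and reads it back. The second re-encodes bytes from one charset to another, which is what the "convert from to" program above does with iconv; Python's built-in codecs stand in for iconv here. The helper names and the UTF-8 to UTF-16LE example are illustrative assumptions.

```python
import os
import tempfile


def write_and_read_back(path: str, text: str, encoding: str = "utf-8") -> str:
    """Write text to path in the given encoding, then read it back."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def convert_bytes(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Decode with the source charset, re-encode with the target charset.

    Raises UnicodeError if the input is invalid in from_enc or a
    character cannot be represented in to_enc.
    """
    return data.decode(from_enc).encode(to_enc)


def convert_file(src: str, dst: str, from_enc: str, to_enc: str) -> None:
    """Whole-file conversion, like the 'convert from to' iconv program."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(convert_bytes(data, from_enc, to_enc))


with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "in.txt")
    utf16_path = os.path.join(tmp, "out.txt")
    print(write_and_read_back(utf8_path, "\u20ac"))  # the euro sign, U+20AC
    convert_file(utf8_path, utf16_path, "utf-8", "utf-16-le")
    with open(utf16_path, "rb") as f:
        print(f.read())                               # b'\xac '
```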

Displaying a unicode text in C#

Question: My app displays English, Japanese and Chinese characters in a TextBox and a LinkLabel. Currently, I check whether there are Unicode characters and, if so, change the font to MS Mincho; otherwise I leave it in Tahoma. MS Mincho displays Japanese properly, but for Chinese I have to use SimSun. How can I distinguish between the two? How can I ensure that Unicode text is displayed properly regardless of the font or language?

Answer 1: If you have Unicode characters for each of the texts, using a font that supports Unicode should cover it properly for you (e.g. Arial Unicode MS). You can't ensure that Unicode text is …
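
One common heuristic for the Japanese-versus-Chinese question: Japanese text almost always contains kana (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF), whereas Chinese text uses Han ideographs without kana. The Python sketch below applies that rule instead of C#. It is only a guess: a kanji-only string such as 日本 cannot be told apart from Chinese this way. The font table reuses the font names from the question, and guess_cjk_language and pick_font are hypothetical names.

```python
# Font choices taken from the question; adjust for your platform.
FONTS = {"ja": "MS Mincho", "zh": "SimSun", "other": "Tahoma"}


def is_kana(ch: str) -> bool:
    # Hiragana U+3040-U+309F and Katakana U+30A0-U+30FF.
    return "\u3040" <= ch <= "\u30ff"


def is_han(ch: str) -> bool:
    # CJK Unified Ideographs, basic block only (U+4E00-U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


def guess_cjk_language(text: str) -> str:
    """Heuristic: any kana means Japanese; Han without kana is treated as Chinese."""
    if any(is_kana(ch) for ch in text):
        return "ja"
    if any(is_han(ch) for ch in text):
        return "zh"
    return "other"


def pick_font(text: str) -> str:
    return FONTS[guess_cjk_language(text)]


print(pick_font("日本語のテキスト"))  # MS Mincho
print(pick_font("中文文本"))          # SimSun
print(pick_font("English text"))      # Tahoma
```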

How to replace \\u by \u in Java String

I have a string of the format "aaa\\u2022bbb\\u2014ccc". I'd like to display the two special characters, but to do that I first have to convert the string to this format: "aaa\u2022bbb\u2014ccc". I've tried writing this, but it gives a compilation error:

```java
String encodedInput = input.replace("\\u", "\u");
```

This has got to be something straightforward, but I just cannot get it. Any ideas?

Answer: Unfortunately, I do not know of any sort of eval.

```java
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find …
```
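
Here is the answer's regex approach translated into a Python 3 sketch for illustration. It finds each literal \uXXXX (a backslash, "u", and four hex digits) and replaces it with the character at the code point those digits give. Like the Java pattern, it handles only four-digit escapes. unescape_u is a hypothetical name.

```python
import re

# Same pattern as the Java answer: a literal backslash, 'u', four hex digits.
U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_u(s: str) -> str:
    """Replace each literal \\uXXXX with the corresponding character.

    Escaped surrogate pairs (e.g. \\ud83d\\ude00) are not combined;
    each half becomes a lone surrogate.
    """
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


s = "aaa\\u2022bbb\\u2014ccc"                    # contains literal backslashes
print(unescape_u(s) == "aaa\u2022bbb\u2014ccc")  # True
```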

C: Most efficient way to determine how many bytes will be needed for a UTF-16 string from a UTF-8 string

Question: I've seen some very clever code out there for converting between Unicode code points and UTF-8, so I was wondering if anybody has (or would enjoy devising) this: given a UTF-8 string, how many bytes are needed for the UTF-16 encoding of the same string? Assume the UTF-8 string has already been validated. It has no BOM, no overlong sequences and no invalid sequences, and it is null-terminated. It is not CESU-8. Full UTF-16 with surrogates must be supported. Specifically, I wonder if there are shortcuts to …
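
For validated UTF-8, you can read the count off the lead bytes alone. Every byte that is not a continuation byte (10xxxxxx) starts one code point, and each code point needs one 16-bit unit. A 4-byte lead byte (0xF0 to 0xF4) marks a supplementary-plane code point that needs a surrogate pair, which means one extra unit. The Python sketch below shows that counting rule; it is not tuned for speed, and a C version would apply the same per-byte test. It excludes the terminating NUL, so add 2 bytes if the UTF-16 buffer must be null-terminated too. utf16_byte_count is a hypothetical name.

```python
def utf16_byte_count(utf8: bytes) -> int:
    """Bytes needed to encode the same text as UTF-16 (no BOM, no terminator).

    Assumes utf8 is already valid UTF-8, as stated in the question.
    """
    units = 0  # number of 16-bit code units
    for b in utf8:
        if (b & 0xC0) != 0x80:  # ASCII or lead byte: starts a code point
            units += 1
        if b >= 0xF0:           # 4-byte sequence: needs a surrogate pair
            units += 1
    return 2 * units


sample = "a\u00e9\u20ac\U0001F600"               # 1-, 2-, 3- and 4-byte UTF-8 sequences
print(utf16_byte_count(sample.encode("utf-8")))  # 10
print(len(sample.encode("utf-16-le")))           # 10
```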

Importing foreign languages from csv file to Stata

I am using Stata 12 and have run into the following problem. I am importing a batch of .csv files into Stata with the insheet command. The datasets may include Russian, Croatian, Turkish, etc., and I think the files are encoded in UTF-8. In the .csv files the strings are correct, but after I import them into Stata the original strings are corrupted into strange characters. Could you please help me with this? Can Stat-Transfer solve the problem, and does it support the .csv format? For example, the original file looks like: … My code is:

```stata
insheet using name.csv, c n
save name.dta, replace
```

The result is …
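
Before blaming the import, it is worth confirming the guess that the files are UTF-8. The small Python 3 sketch below (not Stata) tries a strict UTF-8 decode of the raw bytes. It reports whether the file starts with a byte-order mark and returns the first line containing non-ASCII text. check_utf8 is a hypothetical name, and in practice you would pass it the bytes of name.csv. Valid UTF-8 alone does not guarantee a clean import in Stata 12, because Stata added Unicode support only in version 14.

```python
from typing import Optional, Tuple

UTF8_BOM = b"\xef\xbb\xbf"


def check_utf8(data: bytes) -> Tuple[bool, bool, Optional[str]]:
    """Return (is_valid_utf8, has_bom, first_non_ascii_line)."""
    has_bom = data.startswith(UTF8_BOM)
    try:
        text = data.decode("utf-8-sig")  # also strips a leading BOM if present
    except UnicodeDecodeError:
        return False, has_bom, None
    for line in text.splitlines():
        if any(ord(ch) > 127 for ch in line):
            return True, has_bom, line
    return True, has_bom, None


sample = "name,city\nIvan,Москва\n".encode("utf-8")
print(check_utf8(sample))       # (True, False, 'Ivan,Москва')
print(check_utf8(b"caf\xe9\n"))  # (False, False, None): Latin-1, not UTF-8
```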

Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)

I want to send Chinese characters to an online service for translation and have the resulting English string returned. I'm using simple JSON and urllib for this. And yes, I am declaring

```python
# -*- coding: utf-8 -*-
```

at the top of my code. Everything works fine if I feed urllib a string-type object, even if that object contains what would be Unicode information. My function is called translate. For example:

```python
stringtest1 = '無與倫比的美麗'
print translate(stringtest1)
```

results in the proper translation, and type(stringtest1) confirms that it is a string object. But if I do stringtest1 = u'無與倫比的美麗' and …
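
In Python 2, urllib's quoting and urlencode functions expect byte strings. A unicode object containing non-ASCII characters therefore typically raises an error (urlencode fails with a UnicodeEncodeError) unless you encode it first, e.g. text.encode('utf-8'). The sketch below shows the Python 3 equivalent, where urllib.parse.urlencode percent-encodes str values as UTF-8 by default. The endpoint URL and the "q" parameter name are placeholders, not the translation service's real API, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder endpoint; the real service's URL and parameter names
# are not given in the question.
ENDPOINT = "https://example.com/translate"


def build_request_url(text: str) -> str:
    # urlencode encodes str values as UTF-8 before percent-quoting them.
    # The Python 2 spelling would be urllib.urlencode({"q": text.encode("utf-8")}).
    return ENDPOINT + "?" + urlencode({"q": text})


url = build_request_url("無與倫比的美麗")
print(url)                                      # https://example.com/translate?q=%E7%84%A1...
print(parse_qs(url.split("?", 1)[1])["q"][0])  # 無與倫比的美麗
```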

Python escape sequence \N{name} not working as per definition

I am trying to print Unicode characters given their names, as follows:

```python
# -*- coding: utf-8 -*-
print "\N{SOLIDUS}"
print "\N{BLACK SPADE SUIT}"
```

However, the output I get is not very encouraging: the escape sequence is printed as is.

```
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit (Intel)] on
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: utf-8 -*-
... print "\N{SOLIDUS}"
\N{SOLIDUS}
>>> print "\N{BLACK SPADE SUIT}"
\N{BLACK SPADE SUIT}
>>>
```

I can, however, see that another asker has …
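
In Python 2, \N{...} is recognised only inside unicode literals. That means print u"\N{BLACK SPADE SUIT}" works where the plain byte-string literal above prints the escape unchanged. In Python 3, every str literal supports \N{...}, and unicodedata.lookup does the same name lookup at run time. A short Python 3 sketch:

```python
import unicodedata

# \N{...} escapes are resolved when the literal is compiled.
print("\N{SOLIDUS}")            # /
print("\N{BLACK SPADE SUIT}")   # ♠

# The same lookup at run time, e.g. for names read from input.
spade = unicodedata.lookup("BLACK SPADE SUIT")
print(hex(ord(spade)))          # 0x2660
print(unicodedata.name(spade))  # BLACK SPADE SUIT
```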

Get “actual” length of string in Unicode characters

Question: Given a character like "✮" (\xe2\x9c\xae), for example (it could also be others like "Σ", "д" or "Λ"), I want to find the "actual" length that character takes when printed on screen. For example, len("✮") and len("\xe2\x9c\xae") both return 3, but it should be 1.

Answer 1: You may try this:

```python
import unicodedata

unicodedata.normalize('NFC', u'✮')
len(u"✮")
```

UTF-8 is a Unicode encoding that uses more than one byte for special characters. Check unicodedata.normalize().

Answer 2: My answer to a similar question: You are looking …
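
The answers hinge on two separate points. First, in Python 2 "✮" is a byte string, so len counts its three UTF-8 bytes, while decoding it (or using a Python 3 str) gives one code point. Second, code points are still not screen columns: East Asian wide characters usually occupy two columns and combining marks none. The rough Python 3 sketch below uses only unicodedata. It is an approximation: terminals differ, and emoji and ambiguous-width characters get no special handling. display_width is a hypothetical helper.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate number of terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):  # combining marks take no column
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                 # wide / fullwidth characters
        else:
            width += 1
    return width


raw = b"\xe2\x9c\xae"            # the UTF-8 bytes from the question
star = raw.decode("utf-8")
print(len(raw), len(star))       # 3 1
print(display_width("ΣдΛ"))      # 3
print(display_width("日本"))     # 4
print(display_width("e\u0301"))  # 1 (e plus a combining acute accent)
```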