unicode-string | 易学教程

Unicode file in notepad [closed]

Question: (Closed as off-topic 6 years ago.) What does it mean when I save a text file as "Unicode" in Notepad? Is it UTF-8, UTF-16, or UTF-32? Thanks in advance.

Answer 1: In Notepad, as in Windows software in general, “Unicode” as an encoding name means UTF-16 Little Endian (UTF-16LE). (I first thought it’s not real UTF-16, because Notepad++ recognizes it as
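To make the UTF-16LE claim concrete, here is a minimal Python sketch of the bytes such a file would plausibly contain. The sample text is made up, and the leading byte order mark is an assumption about typical Notepad output rather than something the excerpt states.

```python
import codecs

text = "héllo"  # hypothetical sample text

# UTF-16LE: each of these characters becomes two bytes, low byte first.
payload = text.encode("utf-16-le")

# Assumption: the file begins with the UTF-16LE byte order mark (FF FE).
file_bytes = codecs.BOM_UTF16_LE + payload

# Python's generic "utf-16" codec reads the BOM to choose the byte order.
decoded = file_bytes.decode("utf-16")

print(file_bytes.hex(" "))
print(decoded)
```

Decoding with the generic `utf-16` codec consumes the BOM, so the round trip returns the original text without a stray U+FEFF at the start.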

Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed

Question: This code:

    for root, dirs, files in os.walk('.'):
        print(root)

gives me this error:

    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 27: surrogates not allowed

How do I walk through a file tree without getting toxic strings like this?

Answer 1: On Linux, filenames are 'just a bunch of bytes' and are not necessarily encoded in any particular encoding. Python 3 tries to turn everything into Unicode strings. In doing so, the developers came up with a scheme to translate byte
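The answer is cut off here, but the `'\udcc3'` in the error message is consistent with Python's documented `surrogateescape` error handler (PEP 383), which maps each undecodable byte to a lone surrogate. The sketch below assumes that is the scheme in question; the filename bytes are made up for illustration.

```python
raw = b"caf\xc3.txt"  # hypothetical filename bytes that are not valid UTF-8

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# For display, escape the surrogates instead of raising UnicodeEncodeError.
printable = name.encode("utf-8", "backslashreplace").decode("utf-8")
print(printable)
```

On POSIX systems, `os.fsencode()` and `os.fsdecode()` apply this handler with the filesystem encoding, so a path returned by `os.walk()` can be passed back to the operating system even when it cannot be printed directly.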

iOS Localization: Unicode character escape sequences, which have the form '\uxxxx' does not work

Question: We have a key-value pair in a Localization.string file:

    "spanish-key" = "Espa\u00f1ol";

When we fetch it and assign it to a label, the app displays "Espau00f1ol", which is wrong:

    self.label1.text = NSLocalizedString(@"spanish-key", nil);

This works and displays the text in the required format:

    self.label1.text = @"Espa\u00f1ol";

What could be the problem when we use NSLocalizedString(@"spanish-key", nil)? If we use \U instead of \u, it works:

    "spanish-key" = "Espa\U00f1ol";

When should "\Uxxxx" be used, and when "\uxxxx"?

Answer 1:

Convert Unicode character to NSString

Question: I have received a string from a web service that contains a Unicode character, and I want to convert it to a plain NSString. How can I do that? Example: "This isn\u0092t your bike". How can I remove the Unicode escape and replace it with the equivalent special character? The output should be: "This isn't your bike".

Answer 1: char cString[] = "This isn\u2019t your bike"; NSData *data = [NSData dataWithBytes:cString length:strlen(cString)]; NSString *string = [[NSString alloc] initWithData:data encoding

Convert between string, u16string & u32string

Question: I've been looking for a way to convert between the Unicode string types and came across this method. Not only do I not completely understand the method (there are no comments), but the article also implies that there will be better methods in the future. If this is the best method, could you please point out what makes it work? If not, I would like to hear suggestions for better methods.

Answer 1 (bames53): mbstowcs() and wcstombs() don't necessarily convert to UTF-16 or UTF-32; they convert to wchar_t and whatever the locale wchar_t encoding is. All Windows locales use a two-byte wchar_t and UTF-16 as

Regex for a (twitter-like) hashtag that allows non-ASCII characters

Question: I want a regex to match a simple hashtag like those on Twitter (e.g. #someword). I also want it to recognize non-standard characters (like those in Spanish, Hebrew, or Chinese). This was my initial regex: (^|\s|\b)(#(\w+))\b, but it doesn't recognize non-standard characters. Then I tried XRegExp.js, which worked but ran too slowly. Any suggestions for how to do it?

Answer 1: Eventually I found twitter-text.js, which is basically how Twitter solves this problem.

Answer 2: With
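For comparison, here is a minimal Python sketch of a Unicode-aware hashtag pattern. It is not the twitter-text.js rule set and not a JavaScript solution; it relies on the fact that `\w` in Python 3 `str` patterns matches Unicode word characters by default. The sample sentence and the `HASHTAG` name are made up.

```python
import re

# (?<!\w) rejects a "#" glued to a preceding word character, as in "abc#def".
# \w+ is Unicode-aware for str patterns, so it covers ñ, Hebrew, and CJK.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

sample = "Hola #español, shalom #שלום and #中文 but not abc#def"
tags = HASHTAG.findall(sample)
print(tags)
```

Real-world hashtag rules are stricter than this (for example, about digits-only tags and punctuation), so treat the pattern as a starting point only.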

How I can print the wchar_t values to console?

Question: Example:

    #include <iostream>
    using namespace std;
    int main() {
        wchar_t en[] = L"Hello";
        wchar_t ru[] = L"Привет"; // Russian language
        cout << ru << endl << en;
        return 0;
    }

This code only prints hex values, like an address. How do I print the wchar_t string?

Edit: This doesn’t work if you are trying to write text that cannot be represented in your default locale. :-( Use std::wcout instead of std::cout: wcout << ru << endl << en;

Konrad

Can I suggest std::wcout? So, something like this: std::cout << "ASCII and ANSI" << std::endl; std::wcout << L"INSERT MULTIBYTE WCHAR* HERE" << std::endl; You might

Converting a \u escaped Unicode string to ASCII

Question: After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page, and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig', how do I perform a conversion on x to yield pretty=>big? Any suggestions?

Answer 1: Use parse, but don't evaluate the
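The R answer is truncated, so the following is not a reconstruction of it. It is a small Python sketch of the same conversion, using the standard `unicode_escape` codec and assuming the input contains only ASCII characters plus literal backslash-u sequences.

```python
# Literal backslash-u sequences, matching the value of x in the question.
x = "pretty\\u003D\\u003Ebig"

# Assumption: the input is pure ASCII, so encoding it first loses nothing.
# The unicode_escape codec then interprets each \uXXXX sequence.
decoded = x.encode("ascii").decode("unicode_escape")
print(decoded)
```

If the input may already contain non-ASCII characters, this approach needs more care, because `unicode_escape` treats the undecoded bytes as Latin-1.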

Why is the length of this string longer than the number of characters in it?

Question: This code:

    string a = "abc";
    string b = "A𠈓C";
    Console.WriteLine("Length a = {0}", a.Length);
    Console.WriteLine("Length b = {0}", b.Length);

outputs:

    Length a = 3
    Length b = 4

Why? The only thing I could imagine is that the Chinese character is 2 bytes long and that the .Length method returns the byte count.

Answer 1: Everyone else is giving the surface answer, but there's a deeper rationale too: the number of "characters" is a difficult-to-define question and can be surprisingly expensive
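The "surface answer" the excerpt alludes to is that .NET strings are sequences of UTF-16 code units, and `.Length` counts those units rather than bytes or characters. A rough Python sketch of the three different counts for the same string, with variable names chosen here for illustration:

```python
b = "A𠈓C"  # the middle character lies outside the Basic Multilingual Plane

# Python's len() counts Unicode code points.
code_points = len(b)

# A UTF-16 string length counts 16-bit code units; the middle character
# needs a surrogate pair, so it contributes two units.
utf16_units = len(b.encode("utf-16-le")) // 2

# The UTF-8 byte count is different again.
utf8_bytes = len(b.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)
```

None of these three numbers is guaranteed to match what a reader perceives as a character, which is the harder question the answer goes on to raise.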
