utf-8 | 易学教程

How to change Sys.setlocale, when you get Error “request to set locale … cannot be honored”

Question: This relates to the problem I'm trying to resolve here: Printing UTF-8 (Russian) characters in R, Rmd, knitr. I was told that this problem does not exist if the native locale is en_US.UTF-8. (My current native locale is English_Canada.1252.) But I cannot simply change English_Canada.1252 to en_US.UTF-8. When I try, I get this error message:
> Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
OS reports request to set locale to "en_US.UTF-8" cannot be honored
[1] ""
Any idea how to resolve it?

Cakephp sending UTF-8 Emails and lineLength

Question: I'm trying to send emails with UTF-8 characters. Mostly the email looks how I expect, but randomly there will be garbage characters. I believe the garbage characters appear when a new line is inserted in the middle of one of the characters. I suspect CakePHP's Email component is the culprit, since I read that it inserts new lines according to its lineLength property. Is there any way to fix this? I'm using CakePHP 1.3. $this->Email->to = $sendEmail; $this->Email->from
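The suspected failure mode (a line break landing inside a multi-byte character) can be illustrated outside CakePHP. The following is a minimal Python sketch, not CakePHP's actual wrapping logic; the helper name `split_utf8` is hypothetical. It shows that cutting UTF-8 bytes at a fixed width can split a character, and that backing up past continuation bytes (those of the form `10xxxxxx`) avoids this.

```python
def split_utf8(data: bytes, max_len: int) -> list:
    """Split valid UTF-8 bytes into chunks of at most max_len bytes
    without cutting through a multi-byte character (hypothetical helper)."""
    if max_len < 4:
        # A UTF-8 character can occupy up to 4 bytes.
        raise ValueError("max_len must be at least 4")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_len, len(data))
        # If the next chunk would begin with a continuation byte (10xxxxxx),
        # move the cut back until it falls on a character boundary.
        while end < len(data) and end > start and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


encoded = "Привет, мир".encode("utf-8")

# A naive fixed-width cut lands in the middle of a 2-byte character.
naive_first_chunk = encoded[:5]
try:
    naive_first_chunk.decode("utf-8")
    naive_cut_is_valid = True
except UnicodeDecodeError:
    naive_cut_is_valid = False

# Boundary-aware chunks each decode cleanly on their own.
safe_chunks = split_utf8(encoded, 5)
decoded_chunks = [chunk.decode("utf-8") for chunk in safe_chunks]
```

In real mail the usual remedy is a transfer encoding such as quoted-printable or base64, which keeps raw multi-byte sequences out of the wrapped lines; whether CakePHP 1.3 can be configured that way is not stated in the excerpt.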

User submitted CSV file upload UTF-8 concern

Question: I have a feature that uploads a user-submitted CSV file into my database using fgetcsv etc. My database has a collation of utf8_general_ci and the website charset is set to utf-8. How can I ensure that the correct encoding is set when inserting the data from the CSV into my database for display on the website? Do I have to test every string using something like mb_detect_encoding (which seems a bit memory-intensive), or can I just utf8_encode the whole string? Or should I not be worrying at all? Answer 1:

PHP - detecting the user supplied character's char set

Question: Is it possible to detect the character set of a user's string? If not, how about the next question: are there reliable built-in PHP functions that can accurately tell whether a user-supplied string (be it supplied through GET, POST, a cookie, etc.) is in UTF-8 or not? In other words, can I do something like is_utf8($_GET['first_name'])? Is there any way this function could return TRUE when in reality first_name was not in UTF-8? Answer 1: Regarding 1: You can give mb_detect_encoding a try, but it's pretty
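The second part of the question (can a validity check return TRUE for text that was not meant as UTF-8?) can be illustrated with a small sketch. It is written in Python rather than PHP, and the function name `is_utf8` simply mirrors the hypothetical one in the question. The check only establishes that the bytes form a valid UTF-8 sequence, not that the sender intended UTF-8.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes are a valid UTF-8 sequence (strict decoding)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Genuine UTF-8 passes.
utf8_name = "José".encode("utf-8")

# The same name in Latin-1 ends in a lone 0xE9 byte, which is invalid UTF-8.
latin1_name = "José".encode("latin-1")

# False positive: the Latin-1 text "Ã©" is the byte pair C3 A9,
# which is also the valid UTF-8 encoding of "é".
ambiguous = "Ã©".encode("latin-1")

results = (is_utf8(utf8_name), is_utf8(latin1_name), is_utf8(ambiguous))
```

So a TRUE result means "these bytes could be UTF-8", and pure ASCII input passes as well, since ASCII is a subset of UTF-8.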

Python3: Decode UTF-8 bytes converted as string

Question: Suppose I have something like: a = "Gżegżółka" a = bytes(a, 'utf-8') a = str(a) which returns a string of the form: b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka' Now it's sent as a simple string (I get it as an assertion from the eval function). How the heck can I now get the normal UTF-8 form of the starting word? If there is some better compression than str(bytes(x)) then I would be glad to hear it. Answer 1: If you want to encode and decode text, that's what the encode and decode methods are for: >>> a = "Gżegżółka" >>> b =
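If the string form `"b'...'"` has already been produced and cannot be avoided, one possible way back is to parse it as a bytes literal and then decode it. This is a sketch of that recovery step, not necessarily what the truncated answer goes on to recommend; it uses `ast.literal_eval`, which only evaluates literals and is therefore safer than `eval` on received input.

```python
import ast

original = "Gżegżółka"

# What the question ends up with: the repr of the bytes, stored as a str.
as_text = str(original.encode("utf-8"))

# Parse the "b'...'" text back into a real bytes object.
recovered_bytes = ast.literal_eval(as_text)

# Decode the bytes to get the original word back.
recovered = recovered_bytes.decode("utf-8")
```

Avoiding `str(bytes(x))` in the first place, and sending either the text itself or the encoded bytes, sidesteps the round trip entirely.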

How do I determine a word boundary in Unicode stream in C#?

Question: I'm reading a Unicode stream and would rather not have to pass the entire string through a regex. Is there a simple (reliable) character I can use to break words across languages? My byte array is likely going to be encoded in UTF-16 or UTF-8. Answer 1: If you are using Java then you can use the BreakIterator. Source: https://stackoverflow.com/questions/4900408/how-do-i-determine-a-word-boundary-in-unicode-stream-in-c

Output UTF-8 (u8) std::string

Question: In C++11 and later, using the u8 prefix on a string literal can create char (byte) sequences that are UTF-8 encoded. How do you output those sequences to a std::ostream? How do you tell a std::ostream that a const char * or std::string to be output contains characters encoded in UTF-8, rather than the default encoding? Answer 1: You don't. The stream does not know or care what the encoding of the text is. Despite its name, a char is not treated by std::ostream as containing a character encoded in

Non-english text size too small in windows 7

Question: I am trying to display the current date in the Nepali language. I have declared a constant string for the caption. It renders the text fine on Windows 8.1, but the same text is displayed too small on Windows 7. Adjusting the font size doesn't help as much as it should. Things I tried: installed the Nepali language pack for Windows 7; installed several Unicode fonts such as Arial Unicode MS, Segoe UI, Microsoft Neo Gothic and others; saved the source code with UTF-8 encoding. But the problem

XMLStarlet - UTF-8 Nordic characters

Question: I'm using XMLStarlet (Windows) to edit an RSS feed, but I have a few issues with the Norwegian characters 'ÆØÅ'. I'm using an example I found at this site ( https://stackoverflow.com/a/14397390/3168446 ). This is my feed.xml (Notepad++ says it's encoded in UTF-8):
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>My RSS Feed</title>
<description>This is my RSS Feed</description>
</channel>
</rss>
I'm not using the following example as it

Is it possible to prevent adding BOM to output UTF-8 file? (Visual Studio 2005)

Question: I need some help. I'm writing a program that opens two source files in UTF-8 encoding without a BOM. The first contains English text and some other information, including an ID. The second contains only the string ID and a translation. The program changes every string from the first file by replacing the English text with the Russian translation from the second one, and writes these strings to an output file. Everything seems to be OK, but a BOM appears in the destination file. And I want to create the file without a BOM
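What the BOM is at the byte level can be shown with a short sketch. It is in Python and says nothing about the Visual Studio 2005 API the question is actually using; the helper name `strip_bom` is hypothetical. The UTF-8 BOM is the three bytes EF BB BF at the start of the file, so a BOM-free file is the same content with those bytes omitted.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if one is present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


text = "Привет"

# Python's "utf-8-sig" codec prepends the BOM; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# Stripping the BOM from the first form yields exactly the second form.
cleaned = strip_bom(with_bom)
```

In other environments the equivalent fix is usually to pick an encoder or writer option that does not emit the BOM, or to skip writing those three bytes.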