utf-8

How to change Sys.setlocale, when you get Error “request to set locale … cannot be honored”

你说的曾经没有我的故事 submitted on 2020-01-06 07:54:36
Question: This relates to the problem I'm trying to resolve here: Printing UTF-8 (Russian) characters in R, Rmd, knitr. I was told that this problem does not exist if the native locale is en_US.UTF-8. (My current native locale is English_Canada.1252.) But I cannot simply change my English_Canada.1252 to en_US.UTF-8. When I try, I get this error message:

> Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
OS reports request to set locale to "en_US.UTF-8" cannot be honored
[1] ""

Any idea how to resolve it?

CakePHP: sending UTF-8 emails and lineLength

梦想的初衷 submitted on 2020-01-06 07:26:14
Question: I'm trying to send emails with UTF-8 characters. Mostly the emails look as I expect, but randomly there are garbage characters. I believe the garbage characters appear when a newline is inserted in the middle of one of the characters. I suspect CakePHP's Email component is the culprit, since I read that it has a feature that inserts newlines according to its lineLength property. Is there any way to fix this? I'm using CakePHP 1.3.
$this->Email->to = $sendEmail;
$this->Email->from
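The suspected failure mode can be reproduced outside CakePHP. The Python sketch below is an illustration only and is not CakePHP 1.3's wrapping code. It assumes a wrapper that cuts the encoded message every `line_length` bytes, and shows that such a cut can land inside a multi-byte UTF-8 sequence. It then contrasts that with a hypothetical `wrap_on_characters` helper that splits the decoded text instead.

```python
# Illustration only: models a line wrapper that counts bytes, not CakePHP's actual code.

def wrap_on_bytes(text: str, line_length: int) -> bytes:
    """Naive wrapper: insert CRLF every `line_length` bytes of the UTF-8 encoding."""
    data = text.encode("utf-8")
    chunks = [data[i:i + line_length] for i in range(0, len(data), line_length)]
    return b"\r\n".join(chunks)

def wrap_on_characters(text: str, line_length: int) -> bytes:
    """Hypothetical safer wrapper: split the decoded text, then encode each line."""
    lines = [text[i:i + line_length] for i in range(0, len(text), line_length)]
    return "\r\n".join(lines).encode("utf-8")

sample = "Привет, мир"  # each Cyrillic letter takes 2 bytes in UTF-8
# Decoding with errors="replace" exposes the broken sequences as U+FFFD.
print(wrap_on_bytes(sample, 5).decode("utf-8", errors="replace"))
print(wrap_on_characters(sample, 5).decode("utf-8"))
```

Wrapping on characters keeps every sequence intact but does not bound the byte length of a line. In a real mailer, a common alternative is to send the body with a quoted-printable or base64 Content-Transfer-Encoding so that line breaks never touch the raw UTF-8 bytes.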

User submitted CSV file upload UTF-8 concern

为君一笑 submitted on 2020-01-06 07:21:17
Question: I have a feature that uploads a user-submitted CSV file into my database using fgetcsv etc. My database has a collation of utf8_general_ci and the website charset is set to UTF-8. How can I ensure that the correct encoding is used when inserting the data from the CSV into my database for display on the website? Do I have to test every string using something like mb_detect_encoding (which seems a bit memory-intensive), or can I just utf8_encode the whole string? Or should I not be worrying at all?
Answer 1:
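One common approach is to try a strict UTF-8 decode of the whole upload first and fall back only if it fails. The sketch below is in Python rather than PHP. It assumes uploads are either UTF-8 or Windows-1252 (a typical Excel export encoding on Western Windows systems), and the helper names `normalize_csv_bytes` and `read_rows` are hypothetical.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"

def normalize_csv_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode an uploaded CSV, assuming it is either UTF-8 or `fallback`."""
    # Some spreadsheet programs prepend a UTF-8 byte order mark; drop it.
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    try:
        # Strict decode: raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was saved in `fallback`.
        return raw.decode(fallback, errors="replace")

def read_rows(raw: bytes) -> list[list[str]]:
    """Parse the upload into rows of Unicode strings, ready for a UTF-8 database."""
    return list(csv.reader(io.StringIO(normalize_csv_bytes(raw), newline="")))

print(read_rows("name\nZoë\n".encode("cp1252")))  # [['name'], ['Zoë']]
print(read_rows("name\nZoë\n".encode("utf-8")))   # [['name'], ['Zoë']]
```

This decides once per file, not once per string. Once the rows are Unicode text, a database connection whose charset is also UTF-8 stores them correctly. By contrast, running PHP's `utf8_encode` (which converts ISO-8859-1 to UTF-8) on data that is already UTF-8 would double-encode it.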

PHP - detecting the user supplied character's char set

时光毁灭记忆、已成空白 submitted on 2020-01-06 05:39:08
Question: Is it possible to detect the character set of a user's string? If not, how about the next question: are there reliable built-in PHP functions that can accurately tell whether a user-supplied string (be it supplied through GET/POST/cookie etc.) is in UTF-8 or not? In other words, can I do something like is_utf8($_GET['first_name'])? Is there any way this function could return TRUE when in reality first_name was not in UTF-8?
Answer 1: Regarding 1: You can give mb_detect_encoding a try, but it's pretty
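A validity check is usually a strict decode: a byte string either is well-formed UTF-8 or it is not. In PHP the comparable built-in is `mb_check_encoding($str, 'UTF-8')`. The Python sketch below implements the question's hypothetical `is_utf8`. It also shows why the answer to the last question is yes: validity does not prove what the sender meant. Plain ASCII is valid UTF-8, and some Latin-1 byte pairs happen to form valid UTF-8 sequences.

```python
def is_utf8(data: bytes) -> bool:
    """True if `data` is well-formed UTF-8 (strict decoding, no replacement)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_utf8("Gżegżółka".encode("utf-8")))       # True
print(is_utf8("Gżegżółka".encode("iso-8859-2")))  # False: 0xBF cannot start a sequence
print(is_utf8(b"first_name"))                     # True: ASCII is valid in many encodings
# Latin-1 text whose bytes happen to form a valid UTF-8 sequence also passes:
print(is_utf8("Ã©".encode("latin-1")))            # True, although it was Latin-1 text
```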

Python3: Decode UTF-8 bytes converted as string

☆樱花仙子☆ submitted on 2020-01-06 05:06:22
Question: Suppose I have something like:
a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)
which returns a string in the form b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'. Now it is sent as a plain string (I get it as an assertion from the eval function). How the heck can I now get the normal UTF-8 form of the starting word? If there is a better compression than str(bytes(x)), I would be glad to hear it.
Answer 1: If you want to encode and decode text, that's what the encode and decode methods are for:
>>> a = "Gżegżółka"
>>> b =
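If the received string really is the `repr` of a bytes object, `ast.literal_eval` can turn it back into `bytes` without the risks of `eval`, because it only evaluates Python literals. Calling `.decode('utf-8')` then restores the original word. The base64 part is one option, offered as an assumption, for the case where the bytes must travel as ASCII text. It replaces `str(bytes(x))` with a reversible encoding.

```python
import ast
import base64

original = "Gżegżółka"
as_bytes = original.encode("utf-8")
wire = str(as_bytes)  # the repr: b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

# Recover: literal_eval parses only Python literals (unlike eval), giving bytes back.
recovered = ast.literal_eval(wire).decode("utf-8")
print(recovered)  # Gżegżółka

# If the bytes must travel as ASCII text, base64 is a standard reversible option.
token = base64.b64encode(as_bytes).decode("ascii")
print(base64.b64decode(token).decode("utf-8"))  # Gżegżółka
```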

How do I determine a word boundary in Unicode stream in C#?

偶尔善良 submitted on 2020-01-06 04:46:06
Question: I'm reading a Unicode stream and would rather not have to pass the entire string through a regex. Is there a simple (reliable) character I can use to break words across languages? My byte array is likely going to be in UTF-16 or UTF-8.
Answer 1: If you are using Java, you can use the BreakIterator.
Source: https://stackoverflow.com/questions/4900408/how-do-i-determine-a-word-boundary-in-unicode-stream-in-c
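No single separator character works across languages. Scripts such as Chinese, Japanese and Thai do not put spaces between words, which is why libraries like ICU's BreakIterator use dictionary-based segmentation. The Python sketch below is a rough approximation, not BreakIterator. It decodes the byte stream incrementally, so a multi-byte character split across chunks is not corrupted. It then applies a Unicode-aware `\w+` regex only to the text that is known to be complete, rather than to the entire string. The function name `words_from_chunks` is hypothetical. Note that `\w` does not match some combining marks (for example, many Devanagari vowel signs), so such words can be split.

```python
import codecs
import re

# In Python 3, \w on str patterns matches Unicode letters and digits, not just ASCII.
WORD = re.compile(r"\w+")
# The last whitespace character in the buffer (only non-whitespace follows it).
LAST_SPACE = re.compile(r"\s(?=\S*\Z)")

def words_from_chunks(chunks, encoding="utf-8"):
    """Yield words from an iterable of byte chunks, decoding incrementally."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # decoded text whose last word may still be incomplete
    for chunk in chunks:
        # The incremental decoder holds back a multi-byte sequence split across chunks.
        pending += decoder.decode(chunk)
        match = LAST_SPACE.search(pending)
        if match:
            complete, pending = pending[:match.end()], pending[match.end():]
            yield from WORD.findall(complete)
    # Flush the decoder and emit whatever is left after the final chunk.
    pending += decoder.decode(b"", final=True)
    yield from WORD.findall(pending)

data = "Gżegżółka śpiewa w lesie".encode("utf-8")
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]  # splits some characters
print(list(words_from_chunks(chunks)))  # ['Gżegżółka', 'śpiewa', 'w', 'lesie']
```

Passing `encoding="utf-16"` handles the UTF-16 case the question mentions. The UTF-16 incremental decoder consumes the byte order mark itself.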

Output UTF-8 (u8) std::string

老子叫甜甜 submitted on 2020-01-06 04:29:05
Question: In C++11 and later, using the u8 prefix on a string literal creates char (byte) sequences that are UTF-8 encoded. How do you output those sequences to a std::ostream? How do you tell a std::ostream that a const char * or std::string to be output contains characters encoded in UTF-8, rather than the default encoding?
Answer 1: You don't. The stream does not know or care what the encoding of the text is. Despite its name, a char is not treated by std::ostream as containing a character encoded in

Non-English text size too small in Windows 7

て烟熏妆下的殇ゞ submitted on 2020-01-06 04:01:09
Question: I am trying to display the current date in Nepali. I have declared a constant string for the caption. It renders the text fine on Windows 8.1, but the same text is displayed too small on Windows 7. Adjusting the font size doesn't help as much as it should. Things I tried:
- Installed the Nepali language pack for Windows 7
- Installed several Unicode fonts like Arial Unicode MS, Segoe UI, Microsoft Neo Gothic and others
- Saved the source code with UTF-8 encoding
But the problem

XMLStarlet - UTF-8 Nordic characters

会有一股神秘感。 submitted on 2020-01-06 03:26:07
Question: I'm using XMLStarlet (Windows) to edit an RSS feed, but I have a few issues with the Norwegian characters 'ÆØÅ'. I'm using an example I found on this site (https://stackoverflow.com/a/14397390/3168446). This is my feed.xml (Notepad++ says it's encoded in UTF-8):
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>My RSS Feed</title>
    <description>This is my RSS Feed</description>
  </channel>
</rss>
I'm not using the following example as it
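The XMLStarlet command itself is cut off in the question, so no XMLStarlet flags are reproduced here. As a hedged cross-check with a different tool, Python's standard-library ElementTree can add an item containing ÆØÅ to the same feed and write it back as UTF-8. The item title is hypothetical. If this round-trips cleanly, the file encoding is not the problem, which would point at how the characters reach the command line instead.

```python
import io
import xml.etree.ElementTree as ET

# The feed from the question, as UTF-8 bytes (what Notepad++ reports).
FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>My RSS Feed</title>
    <description>This is my RSS Feed</description>
  </channel>
</rss>""".encode("utf-8")

root = ET.fromstring(FEED)
channel = root.find("channel")
# Hypothetical new item containing the Norwegian letters.
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Nyhet: ÆØÅ æøå"

buf = io.BytesIO()
# encoding="utf-8" writes the letters as raw UTF-8 bytes, not character references.
# Note: ElementTree drops namespace declarations no element uses (xmlns:atom here).
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
out = buf.getvalue()
print(out.decode("utf-8"))
```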

Is it possible to prevent adding BOM to output UTF-8 file? (Visual Studio 2005)

半城伤御伤魂 submitted on 2020-01-06 03:16:06
Question: I need some help. I'm writing a program that opens two source files in UTF-8 encoding without a BOM. The first contains English text and some other information, including an ID. The second contains only the string ID and the translation. The program changes every string from the first file by replacing the English text with the Russian translation from the second one, and writes these strings to an output file. Everything seems to be OK, but a BOM appears in the destination file, and I want to create the file without a BOM
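The question's Visual Studio 2005 code is not shown, so the sketch below only illustrates the distinction in Python. The plain `utf-8` codec never writes a BOM, while `utf-8-sig` writes one on output and strips one, if present, on input. The file names and the sample line are hypothetical. If the program uses .NET's StreamWriter, constructing it with `new UTF8Encoding(false)` is the usual way to suppress the BOM, because `Encoding.UTF8` emits one.

```python
import os
import tempfile

BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF

def write_lines(path: str, lines: list[str], encoding: str) -> None:
    # newline="\n" disables newline translation, so bytes match on every platform.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write("\n".join(lines) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "out_plain.txt")  # hypothetical file names
    signed = os.path.join(tmp, "out_sig.txt")
    write_lines(plain, ["ID_1 Привет"], "utf-8")
    write_lines(signed, ["ID_1 Привет"], "utf-8-sig")
    with open(plain, "rb") as f:
        plain_bytes = f.read()
    with open(signed, "rb") as f:
        sig_bytes = f.read()
    # Reading with utf-8-sig removes a leading BOM if present (and is harmless if absent).
    with open(signed, encoding="utf-8-sig") as f:
        text_back = f.read()

print(plain_bytes.startswith(BOM))  # False
print(sig_bytes.startswith(BOM))    # True
```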