utf-16

Converting from utf-16 to utf-8 in Python 3

℡╲_俬逩灬. submitted on 2019-12-19 11:24:54
Question: I'm programming in Python 3 and I've hit a small problem I can't find any reference to on the net. As far as I understand, the default string is utf-16, but I must work with utf-8, and I can't find the command that will convert from the default one to utf-8. I'd appreciate your help very much. Answer 1: In Python 3 there are two different datatypes that matter when you are working with string manipulation. First there is the string class, an object that represents unicode code points.
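For reference, a Python 3 str is a sequence of Unicode code points rather than bytes in any particular encoding; converting to utf-8 is a single encode() call. A minimal sketch:

```python
# A Python 3 str holds Unicode code points; encode() produces bytes
# in the requested encoding, and decode() goes the other way.
text = "héllo ☃"
utf8_bytes = text.encode("utf-8")
assert isinstance(utf8_bytes, bytes)
assert utf8_bytes.decode("utf-8") == text
```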

How to reverse a string that contains surrogate pairs

人走茶凉 submitted on 2019-12-19 08:52:37
Question: I have written this method to reverse a string: public string Reverse(string s) { if(string.IsNullOrEmpty(s)) return s; TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s); var elements = new List<char>(); while (enumerator.MoveNext()) { var cs = enumerator.GetTextElement().ToCharArray(); if (cs.Length > 1) { elements.AddRange(cs.Reverse()); } else { elements.AddRange(cs); } } elements.Reverse(); return string.Concat(elements); } Now, I don't want to start a discussion
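In Python, unlike the UTF-16-based C# string above, a str already exposes whole code points, so astral characters survive a plain [::-1]; combining marks are the remaining hazard. A simplified sketch that keeps a base character together with its trailing marks (this is not full UAX #29 grapheme clustering, just an approximation):

```python
import unicodedata

def reverse_graphemes(s):
    # Group each base character with any combining marks that
    # follow it, then reverse the list of groups.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))
```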

How to get a reliable unicode character count in Python?

孤人 submitted on 2019-12-19 01:26:11
Question: Google App Engine uses Python 2.5.2, apparently with UCS4 enabled. But the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) to the datastore, when you retrieve it, you get '\U0001d10c' (length 1). I'm trying to count the number of unicode characters in the string in a way that gives the same result before and after storing it. So I'm trying to normalize the string (from u'\ud834\udd0c' to '\U0001d10c') as soon as I receive it, before calculating its length
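One way to get a consistent count is to re-join any surrogate pairs back into single code points before measuring the length. In Python 3 this can be sketched with the surrogatepass error handler, which lets lone surrogates pass through the UTF-16 codec so the decoder can pair them up (note this raises if the string contains an unpaired surrogate):

```python
def real_length(s):
    # Merge UTF-16 surrogate pairs into single code points,
    # then count the resulting code points.
    merged = s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return len(merged)
```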

Valid Locale Names

谁都会走 submitted on 2019-12-18 11:48:53
Question: How do you find valid locale names? I am currently using Mac OS X, but information about other platforms would also be useful. #include <fstream> #include <iostream> int main(int argc,char* argv[]) { try { std::wifstream data; data.imbue(std::locale("en_US.UTF-16")); data.open("Plop"); } catch(std::exception const& e) { std::cout << "Exception: " << e.what() << "\n"; throw; } } % g++ main.cpp % ./a.out Exception: locale::facet::_S_create_c_locale name not valid Abort Answer 1: This page says: The
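Locale names are platform-defined; on POSIX systems `locale -a` lists the installed ones, and Python's locale module carries a table of common aliases. A small sketch for probing what a given system accepts (the specific names available vary by OS):

```python
import locale

# locale_alias maps lowercase alias names to canonical locale strings.
assert "en_us" in locale.locale_alias

def try_locale(name):
    # Returns True if the C library accepts this locale name.
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, "C")

# The "C" locale is guaranteed to exist everywhere.
assert try_locale("C")
```

Note that locale codesets are typically byte-oriented, which is why names like en_US.UTF-16 are rejected; a std::wifstream reads bytes through the locale's codecvt facet, so en_US.UTF-8 is the usual choice.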

UCS-2 and SQL Server

懵懂的女人 submitted on 2019-12-18 08:27:47
Question: While researching options for storing mostly-English-but-sometimes-not data in a SQL Server database that can potentially be quite large, I'm leaning toward storing most string data as UTF-8 encoded. However, Microsoft chose UCS-2 for reasons that I don't fully understand, which is causing me to second-guess that leaning. The documentation for SQL Server 2012 does show how to create a UTF-8 UDT, but the decision for UCS-2 presumably pervades SQL Server. Wikipedia (which interestingly notes
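The practical difference between UCS-2 and UTF-16 only shows up outside the Basic Multilingual Plane: UTF-16 spends two 16-bit units (a surrogate pair) on such characters, while UCS-2 simply has no representation for them. A sketch of the unit counts, using Python only because it can encode to UTF-16 directly:

```python
def utf16_units(s):
    # Number of 16-bit code units UTF-16 needs for s.
    return len(s.encode("utf-16-le")) // 2

assert utf16_units("€") == 1   # U+20AC, inside the BMP
assert utf16_units("𝄞") == 2   # U+1D11E, needs a surrogate pair
```

SQL Server's NVARCHAR length limits count these 16-bit units, which is why a supplementary character consumes two.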

How to convert UTF-8 encoded std::string to UTF-16 std::string

断了今生、忘了曾经 submitted on 2019-12-18 07:12:36
Question: How can I convert a UTF-8 encoded std::string to a UTF-16 std::string? Is it possible? And no, I can't use std::wstring in my case. Windows, MSVC-11.0. Answer 1: How about trying it like this: std::string s = u8"Your string"; // #include <codecvt> std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert; std::u16string u16 = convert.from_bytes(s); std::string u8 = convert.to_bytes(u16); Also check this for UTF-to-UTF conversion. From the docs: The specialization codecvt
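The byte-level transformation that wstring_convert performs is a decode-then-re-encode round trip; the data flow can be sketched outside C++ like this (Python shown purely to illustrate; the string content is arbitrary):

```python
utf8_bytes = "привет".encode("utf-8")

# UTF-8 -> code points -> UTF-16 (little-endian, no BOM): the same
# round trip that from_bytes/to_bytes perform on the C++ side.
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")

assert utf16_bytes.decode("utf-16-le").encode("utf-8") == utf8_bytes
```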

How to use Boost Spirit to parse Chinese(unicode utf-16)?

让人想犯罪 __ submitted on 2019-12-18 06:59:08
Question: My program does not recognize Chinese. How can I use Spirit to recognize Chinese? I use wstring and have converted it to utf-16. Here is my header file: #pragma once #define BOOST_SPIRIT_UNICODE #include <boost/spirit/include/qi.hpp> #include <string> #include <vector> #include <map> using namespace std; namespace qi = boost::spirit::qi; namespace ascii = boost::spirit::ascii; typedef pair<wstring,wstring> WordMeaningType; typedef vector<WordMeaningType> WordMeaningsType; typedef pair<wstring
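Independent of Spirit's unicode namespace, the core idea is matching code points in the CJK ranges rather than relying on ASCII character classes. A minimal illustration of the range test (Python; the range below covers only the basic CJK Unified Ideographs block, a simplifying assumption):

```python
def is_cjk(ch):
    # CJK Unified Ideographs, U+4E00..U+9FFF (basic block only).
    return 0x4E00 <= ord(ch) <= 0x9FFF

assert all(is_cjk(c) for c in "中文")
assert not is_cjk("a")
```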

UTF-16 to UTF-8 conversion in JavaScript

╄→尐↘猪︶ㄣ submitted on 2019-12-18 06:03:32
Question: I have Base64 encoded data that is in UTF-16. I am trying to decode the data, but most libraries only support UTF-8. I believe I have to drop the null bytes but I am unsure how. Currently I am using David Chambers's Polyfill for Base64, but I have also tried other libraries such as phpjs.org, none of which support UTF-16. One thing to point out: on Chrome the atob method works without problem, on Firefox I get the results described here, and in IE I am only returned the first character. Any help is
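Those "null bytes" are the high bytes of UTF-16LE code units for ASCII text; rather than dropping them, decode the Base64 to raw bytes and then interpret those bytes as UTF-16. The round trip sketched in Python (in JS, the equivalent of the final step would be a UTF-16LE text decoder applied to the bytes atob produces):

```python
import base64

original = "héllo"
payload = base64.b64encode(original.encode("utf-16-le"))

# Base64 -> raw bytes -> UTF-16LE text; no byte-dropping needed.
decoded = base64.b64decode(payload).decode("utf-16-le")
assert decoded == original
```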