utf-16

Converting from utf-16 to utf-8 in Python 3

℡╲_俬逩灬. submitted on 2019-12-19 11:24:54
Question: I'm programming in Python 3 and I've hit a small problem I can't find any reference to on the net. As far as I understand, the default string is utf-16, but I must work with utf-8, and I can't find the command that will convert from the default one to utf-8. I'd appreciate your help very much. Answer 1: In Python 3 there are two different datatypes that matter when you are working with string manipulation. First there is the string class, an object that represents unicode code points.
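For reference, a Python 3 str is a sequence of Unicode code points rather than bytes in any particular encoding; converting to utf-8 is a single encode() call. A minimal sketch:

```python
# A Python 3 str holds Unicode code points; encode() produces bytes
# in the requested encoding, and decode() goes the other way.
text = "héllo ☃"
utf8_bytes = text.encode("utf-8")
assert isinstance(utf8_bytes, bytes)
assert utf8_bytes.decode("utf-8") == text
```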

How to reverse a string that contains surrogate pairs

人走茶凉 submitted on 2019-12-19 08:52:37
Question: I have written this method to reverse a string: public string Reverse(string s) { if(string.IsNullOrEmpty(s)) return s; TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s); var elements = new List<char>(); while (enumerator.MoveNext()) { var cs = enumerator.GetTextElement().ToCharArray(); if (cs.Length > 1) { elements.AddRange(cs.Reverse()); } else { elements.AddRange(cs); } } elements.Reverse(); return string.Concat(elements); } Now, I don't want to start a discussion
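In Python, unlike the UTF-16-based C# string above, a str already exposes whole code points, so astral characters survive a plain [::-1]; combining marks are the remaining hazard. A simplified sketch that keeps a base character together with its trailing marks (this is not full UAX #29 grapheme clustering, just an approximation):

```python
import unicodedata

def reverse_graphemes(s):
    # Group each base character with any combining marks that
    # follow it, then reverse the list of groups.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))
```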

How to get a reliable unicode character count in Python?

孤人 submitted on 2019-12-19 01:26:11
Question: Google App Engine uses Python 2.5.2, apparently with UCS4 enabled. But the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) to the datastore, when you retrieve it, you get '\U0001d10c' (length 1). I'm trying to count the number of unicode characters in the string in a way that gives the same result before and after storing it. So I'm trying to normalize the string (from u'\ud834\udd0c' to '\U0001d10c') as soon as I receive it, before calculating its length
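One way to get a consistent count is to re-join any surrogate pairs back into single code points before measuring the length. In Python 3 this can be sketched with the surrogatepass error handler, which lets lone surrogates pass through the UTF-16 codec so the decoder can pair them up (note this raises if the string contains an unpaired surrogate):

```python
def real_length(s):
    # Merge UTF-16 surrogate pairs into single code points,
    # then count the resulting code points.
    merged = s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return len(merged)
```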

Valid Locale Names

谁都会走 submitted on 2019-12-18 11:48:53
Question: How do you find valid locale names? I am currently using Mac OS X, but information about other platforms would also be useful. #include <fstream> #include <iostream> int main(int argc,char* argv[]) { try { std::wifstream data; data.imbue(std::locale("en_US.UTF-16")); data.open("Plop"); } catch(std::exception const& e) { std::cout << "Exception: " << e.what() << "\n"; throw; } } % g++ main.cpp % ./a.out Exception: locale::facet::_S_create_c_locale name not valid Abort Answer 1: This page says: The
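Locale names are platform-defined; on POSIX systems `locale -a` lists the installed ones, and Python's locale module carries a table of common aliases. A small sketch for probing what a given system accepts (the specific names available vary by OS):

```python
import locale

# locale_alias maps lowercase alias names to canonical locale strings.
assert "en_us" in locale.locale_alias

def try_locale(name):
    # Returns True if the C library accepts this locale name.
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, "C")

# The "C" locale is guaranteed to exist everywhere.
assert try_locale("C")
```

Note that locale codesets are typically byte-oriented, which is why names like en_US.UTF-16 are rejected; a std::wifstream reads bytes through the locale's codecvt facet, so en_US.UTF-8 is the usual choice.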

UCS-2 and SQL Server

懵懂的女人 submitted on 2019-12-18 08:27:47
Question: While researching options for storing mostly-English-but-sometimes-not data in a SQL Server database that can potentially be quite large, I'm leaning toward storing most string data as UTF-8 encoded. However, Microsoft chose UCS-2 for reasons that I don't fully understand, which is causing me to second-guess that leaning. The documentation for SQL Server 2012 does show how to create a UTF-8 UDT, but the decision for UCS-2 presumably pervades SQL Server. Wikipedia (which interestingly notes
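The practical difference between UCS-2 and UTF-16 only shows up outside the Basic Multilingual Plane: UTF-16 spends two 16-bit units (a surrogate pair) on such characters, while UCS-2 simply has no representation for them. A sketch of the unit counts, using Python only because it can encode to UTF-16 directly:

```python
def utf16_units(s):
    # Number of 16-bit code units UTF-16 needs for s.
    return len(s.encode("utf-16-le")) // 2

assert utf16_units("€") == 1   # U+20AC, inside the BMP
assert utf16_units("𝄞") == 2   # U+1D11E, needs a surrogate pair
```

SQL Server's NVARCHAR length limits count these 16-bit units, which is why a supplementary character consumes two.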

How to convert UTF-8 encoded std::string to UTF-16 std::string

断了今生、忘了曾经 submitted on 2019-12-18 07:12:36
Question: How can I convert a UTF-8 encoded std::string to a UTF-16 std::string? Is it possible? And no, I can't use std::wstring in my case. Windows, MSVC-11.0. Answer 1: How about trying it like this: std::string s = u8"Your string"; // #include <codecvt> std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert; std::u16string u16 = convert.from_bytes(s); std::string u8 = convert.to_bytes(u16); Also check this for UTF-to-UTF conversion. From the docs: The specialization codecvt
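The byte-level transformation that wstring_convert performs is a decode-then-re-encode round trip; the data flow can be sketched outside C++ like this (Python shown purely to illustrate; the string content is arbitrary):

```python
utf8_bytes = "привет".encode("utf-8")

# UTF-8 -> code points -> UTF-16 (little-endian, no BOM): the same
# round trip that from_bytes/to_bytes perform on the C++ side.
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")

assert utf16_bytes.decode("utf-16-le").encode("utf-8") == utf8_bytes
```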

How to use Boost Spirit to parse Chinese(unicode utf-16)?

让人想犯罪 __ submitted on 2019-12-18 06:59:08
Question: My program does not recognize Chinese. How can I use Spirit to recognize Chinese? I use wstring and have converted it to utf-16. Here is my header file: #pragma once #define BOOST_SPIRIT_UNICODE #include <boost/spirit/include/qi.hpp> #include <string> #include <vector> #include <map> using namespace std; namespace qi = boost::spirit::qi; namespace ascii = boost::spirit::ascii; typedef pair<wstring,wstring> WordMeaningType; typedef vector<WordMeaningType> WordMeaningsType; typedef pair<wstring
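Independent of Spirit's unicode namespace, the core idea is matching code points in the CJK ranges rather than relying on ASCII character classes. A minimal illustration of the range test (Python; the range below covers only the basic CJK Unified Ideographs block, a simplifying assumption):

```python
def is_cjk(ch):
    # CJK Unified Ideographs, U+4E00..U+9FFF (basic block only).
    return 0x4E00 <= ord(ch) <= 0x9FFF

assert all(is_cjk(c) for c in "中文")
assert not is_cjk("a")
```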

UTF-16 to UTF-8 conversion in JavaScript

╄→尐↘猪︶ㄣ submitted on 2019-12-18 06:03:32
Question: I have Base64 encoded data that is in UTF-16. I am trying to decode the data, but most libraries only support UTF-8. I believe I have to drop the null bytes but I am unsure how. Currently I am using David Chambers's Polyfill for Base64, but I have also tried other libraries such as phpjs.org, none of which support UTF-16. One thing to point out: on Chrome the atob method works without problem, on Firefox I get the results described here, and in IE I am only returned the first character. Any help is
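Those "null bytes" are the high bytes of UTF-16LE code units for ASCII text; rather than dropping them, decode the Base64 to raw bytes and then interpret those bytes as UTF-16. The round trip sketched in Python (in JS, the equivalent of the final step would be a UTF-16LE text decoder applied to the bytes atob produces):

```python
import base64

original = "héllo"
payload = base64.b64encode(original.encode("utf-16-le"))

# Base64 -> raw bytes -> UTF-16LE text; no byte-dropping needed.
decoded = base64.b64decode(payload).decode("utf-16-le")
assert decoded == original
```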