utf-16

adding backslash to fix character encoding in ruby string

蹲街弑〆低调 submitted on 2019-12-06 13:20:39
I'm sure this is very easy but I'm getting tied in a knot with all these backslashes. I have some data that I'm scraping (politely) from a website. Occasionally a sentence comes to me looking something like this: "u00a362 000? you must be joking", which should of course be "£62 000? you must be joking". A short test in irb deciphered it:

ruby-1.9.2-p180 :001 > string = "u00a3"
 => "u00a3"
ruby-1.9.2-p180 :002 > string = "\u00a3"
 => "£"

Of course: add a backslash and it will be decoded. I created the following with the help of this question: puts str.gsub('u00', '\\u00'), which resulted in \u00a3
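A more general fix than prefixing a backslash is to decode every stray uXXXX sequence back to the character it names. A minimal sketch of that idea in Python (the question itself is Ruby, where gsub with a block would do the same; the function name here is hypothetical):

```python
import re

def decode_stray_escapes(s):
    # Replace bare "uXXXX" sequences (the scraper dropped the backslash)
    # with the character they encode. Assumes exactly 4 hex digits follow.
    return re.sub(r'u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)), s)

print(decode_stray_escapes("u00a362 000? you must be joking"))
# £62 000? you must be joking
```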

MD5 of a UTF-16LE string (without BOM and 0-byte end) in C#

寵の児 submitted on 2019-12-06 12:13:26
Question: I've got the following problem: I need to create a method which generates an MD5 hash of a string. This string is, for example, "1234567z-äbc" (yes, with the umlaut). The actual MD5 hash of this one is 935fe44e659beb5a3bb7a4564fba0513. The MD5 hash which I need is (100% sure) 9e224a41eeefa284df7bb0f26c2913e2. My documentation says it has to be a UTF-16LE conversion without BOM and without a 0-byte at the end of the string. The problem is that conversion. I have got a working example in Javascript, but
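In C#, Encoding.Unicode.GetBytes already yields UTF-16LE bytes without a BOM or trailing NUL, so hashing those bytes with System.Security.Cryptography.MD5.Create() should satisfy the spec. The same computation, sketched in Python for illustration:

```python
import hashlib

def md5_utf16le(s):
    # "utf-16-le" emits no BOM and no trailing NUL, matching the spec.
    return hashlib.md5(s.encode("utf-16-le")).hexdigest()

print(md5_utf16le("1234567z-äbc"))
```

Note the result differs from the MD5 of the UTF-8 encoding, which is the usual source of mismatches like the one described above.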

URL encode ASCII/UTF16 characters

假装没事ソ submitted on 2019-12-06 12:03:30
I'm trying to URL-encode some strings, but I'm having problems with the methods provided by the .NET framework. For instance, I'm trying to encode strings that contain the 'â' character. According to w3schools, for instance, I would expect this character to be encoded as '%E2' (and a PHP system I must call expects this too...). I tried using these methods:

System.Web.HttpUtility.UrlEncode("â");
System.Web.HttpUtility.UrlPathEncode("â");
Uri.EscapeUriString("â");
Uri.EscapeDataString("â");

However, they all encode this character as %C3%A2. I suppose this has something to do with the fact that
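The %C3%A2 form is the percent-encoding of the two UTF-8 bytes of 'â'; the single-byte %E2 the PHP side expects is a Latin-1 (ISO-8859-1) encoding. The difference is easy to see with Python 3's urllib.parse.quote, which takes an explicit encoding parameter:

```python
from urllib.parse import quote

# Percent-encode 'â' under two different byte encodings.
print(quote("â", encoding="latin-1"))  # %E2   (what the PHP system expects)
print(quote("â", encoding="utf-8"))    # %C3%A2 (what the .NET helpers emit)
```

So the fix on the .NET side is to percent-encode the Latin-1 bytes rather than the UTF-8 bytes, e.g. via HttpUtility.UrlEncode overloads that accept an Encoding.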

Bug with Python UTF-16 output and Windows line endings?

筅森魡賤 submitted on 2019-12-06 08:11:42
With this code (test.py):

import sys
import codecs
sys.stdout = codecs.getwriter('utf-16')(sys.stdout)
print "test1"
print "test2"

Then I run it as: test.py > test.txt

In Python 2.6 on Windows 2000, I'm finding that the newline characters are being output as the byte sequence \x0D\x0A\x00, which of course is wrong for UTF-16. Am I missing something, or is this a bug? Try this:

import sys
import codecs
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output
    def write(self, s):
        self
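The corruption happens because the codec writes "\n" as \x0A\x00, and Windows text mode then inserts \x0D before the \x0A, splitting a UTF-16 code unit. On Python 3 the same idea needs no msvcrt tricks: wrap the binary stream and disable newline translation. A sketch against an in-memory buffer instead of stdout:

```python
import io

buf = io.BytesIO()
# newline="" disables newline translation, so "\n" is encoded as
# \x0A\x00 and never expanded to \x0D\x0A by the text layer.
out = io.TextIOWrapper(buf, encoding="utf-16-le", newline="")
out.write("test1\n")
out.flush()
print(buf.getvalue())  # b't\x00e\x00s\x00t\x001\x00\n\x00'
```

For real console output, pass sys.stdout.buffer in place of buf.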

Struggling to convert vector<char> to wstring

别等时光非礼了梦想. submitted on 2019-12-06 07:44:32
I need to convert UTF-16 text to UTF-8. The actual conversion code is simple:

std::wstring in(...);
std::string out = boost::locale::conv::utf_to_utf<char, wchar_t>(in);

However, the issue is that the UTF-16 is read from a file and it may or may not contain a BOM. My code needs to be portable (the minimum is Windows/OSX/Linux). I'm really struggling to figure out how to create a wstring from the byte sequence. EDIT: this is not a duplicate of the linked question, as in that question the OP needs to convert a wide string into an array of bytes, and I need to convert the other way around. You should not
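The BOM sniffing itself is just a two-byte check at the front of the buffer; the same branching works over a std::vector<char> in C++ before calling utf_to_utf. A sketch in Python (the no-BOM fallback to little-endian is an assumption; pick whichever endianness your inputs actually use):

```python
def decode_utf16_bytes(data):
    # Sniff the byte-order mark, then decode the rest accordingly.
    if data.startswith(b"\xff\xfe"):
        return data[2:].decode("utf-16-le")
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    # No BOM: assume little-endian (an assumption, not a rule).
    return data.decode("utf-16-le")

print(decode_utf16_bytes(b"\xff\xfeh\x00i\x00"))  # hi
print(decode_utf16_bytes(b"\xfe\xff\x00h\x00i"))  # hi
```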

C++ unicode UTF-16 encoding

[亡魂溺海] submitted on 2019-12-06 06:36:18
Question: I have a wide char string L"hao123--我的上网主页", and it must be encoded to "hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I was told that the encoded string is a special "%uNNNN" format for encoding Unicode UTF-16 code points. This website tells me it's JavaScript escapes. But I don't know how to encode it with C++. Is there any library to get this to work, or can you give me some tips? Thanks, my friends! Answer 1: Embedding Unicode in string literals is generally not a good idea and is not
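The encoding is mechanical: walk the UTF-16 code units and escape every non-ASCII unit as \uNNNN (characters above U+FFFF become two escapes, one per surrogate). A sketch in Python rather than C++ (a loop over a std::u16string is identical; swap "\\u" for "%u" to get the JavaScript escape() form):

```python
def escape_utf16_units(s):
    # Emit each non-ASCII UTF-16 code unit as \uNNNN, leaving
    # ASCII characters untouched. Surrogate pairs yield two escapes.
    units = s.encode("utf-16-be")
    out = []
    for i in range(0, len(units), 2):
        cp = (units[i] << 8) | units[i + 1]
        out.append(chr(cp) if cp < 0x80 else "\\u%04X" % cp)
    return "".join(out)

print(escape_utf16_units("hao123--我的上网主页"))
```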

how to convert utf8 to std::string?

廉价感情. submitted on 2019-12-06 06:28:49
I am working on code which receives a cpprest SDK response containing a base64-encoded payload which is JSON. Here is my code snippet:

typedef std::wstring string_t; // defined in basic_types.h in the cpprest lib
void demo() {
    http_response response;
    // code to handle response ...
    json::value output = response.extract_json();
    string_t payload = output.at(L"payload").as_string();
    vector<unsigned char> base64_encoded_payload = conversions::from_base64(payload);
    std::string utf8_payload(base64_encoded_payload.begin(), base64_encoded_payload.end());
    // in the debugger I see the Japanese chars are
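If the decoded bytes are UTF-8, the std::string built from the byte vector already holds valid UTF-8; a debugger that renders std::string as ANSI will just display mojibake. A round-trip sketch in Python with a hypothetical Japanese payload, showing the bytes survive intact:

```python
import base64

# Hypothetical payload standing in for the cpprest response contents.
payload = base64.b64encode("日本語テキスト".encode("utf-8"))
raw = base64.b64decode(payload)      # the raw bytes, as from from_base64
text = raw.decode("utf-8")           # the bytes were valid UTF-8 all along
print(text)  # 日本語テキスト
```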

How to convert from utf-16 to utf-32 on Linux with std library?

六眼飞鱼酱① submitted on 2019-12-06 06:03:50
On MSVC, converting UTF-16 to UTF-32 is easy with C++11's codecvt_utf16 locale facet. But in GCC (gcc (Debian 4.7.2-5) 4.7.2) this new feature seemingly hasn't been implemented yet. Is there a way to perform such a conversion on Linux without iconv (preferably using the conversion tools of the std library)? Decoding UTF-16 into UTF-32 is extremely easy. You may want to detect at compile time the libc version you're using, and deploy your conversion routine if you detect a broken libc (one without the functions you need). Inputs: a pointer to the source UTF-16 data ( char16_t * , ushort * , -- for
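The hand-rolled routine the answer describes boils down to one loop: copy BMP code units through, and combine each high/low surrogate pair into a single code point. Sketched in Python for clarity (a C++ loop over char16_t* is structurally identical):

```python
def utf16_to_utf32(units):
    # units: a sequence of 16-bit code units; returns a list of code points.
    cps, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) \
                and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine a high/low surrogate pair into one code point.
            cps.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            cps.append(u)
            i += 1
    return cps

print([hex(c) for c in utf16_to_utf32([0x0041, 0xD83D, 0xDE00])])  # ['0x41', '0x1f600']
```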

How can I use the Linux sed command to process a little-endian UTF-16 file

☆樱花仙子☆ submitted on 2019-12-06 04:54:28
Question: I am working on an application involving Windows RDP. I've hit a problem trying to use the sed command to replace the IP address string directly in the .rdp file: after executing this command, the original .rdp file is garbled.

sed -i "s/address:s:.*/address:s:$(cat check-free-ip.to.rdpzhitong.rdp)/" rdpzhitong.rdp

I find that the file's format is little-endian UTF-16 Unicode. Can I still use the sed command to replace the text in the file correctly? Or is there another method to process this

UTF16 hex to text

风格不统一 submitted on 2019-12-06 04:36:45
I have a UTF-16 hex representation such as "0633064406270645", which is "سلام" in the Arabic language. I would like to convert it to its text equivalent. Is there a straightforward way to do that in PostgreSQL? I can convert from UTF-8 bytes as below; unfortunately it seems UTF-16 is not supported. Any ideas on how to do it in PostgreSQL? Worst case, I will write a function.

SELECT convert_from(decode(E'D8B3D984D8A7D985', 'hex'), 'UTF8');
-- "سلام"
SELECT convert_from(decode(E'0633064406270645', 'hex'), 'UTF16');
-- ERROR: invalid source encoding name "UTF16"

That's right, Postgres
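Outside the database the decode is a one-liner, which a small PL/Python or client-side helper could wrap. The hex string here is UTF-16BE code units without a BOM:

```python
hex_utf16 = "0633064406270645"
# Pair up the hex digits into bytes, then decode as big-endian UTF-16.
text = bytes.fromhex(hex_utf16).decode("utf-16-be")
print(text)  # سلام
```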