utf-16

adding backslash to fix character encoding in ruby string

蹲街弑〆低调 submitted on 2019-12-06 13:20:39
I'm sure this is very easy but I'm getting tied in a knot with all these backslashes. I have some data that I'm scraping (politely) from a website. Occasionally a sentence comes to me looking something like this: "u00a362 000? you must be joking", which should of course be "£62 000? you must be joking". A short test in irb deciphered it:

ruby-1.9.2-p180 :001 > string = "u00a3"
 => "u00a3"
ruby-1.9.2-p180 :002 > string = "\u00a3"
 => "£"

Of course: add a backslash and it will be decoded. I created the following with the help of this question: puts str.gsub('u00', '\\u00'), which resulted in \u00a3
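A more general fix than prefixing a backslash is to decode every stray uXXXX sequence back to the character it names. A minimal sketch of that idea in Python (the question itself is Ruby, where gsub with a block would do the same; the function name here is hypothetical):

```python
import re

def decode_stray_escapes(s):
    # Replace bare "uXXXX" sequences (the scraper dropped the backslash)
    # with the character they encode. Assumes exactly 4 hex digits follow.
    return re.sub(r'u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)), s)

print(decode_stray_escapes("u00a362 000? you must be joking"))
# £62 000? you must be joking
```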

MD5 of a UTF-16LE string (without BOM and 0-byte end) in C#

寵の児 submitted on 2019-12-06 12:13:26
Question: I've got the following problem: I need to create a method which generates an MD5 hash of a string. This string is, for example, "1234567z-äbc" (yes, with the umlaut). The actual MD5 hash of this one is 935fe44e659beb5a3bb7a4564fba0513. The MD5 hash which I need is (100% sure) 9e224a41eeefa284df7bb0f26c2913e2. My documentation says it has to be a UTF-16LE conversion without BOM and without a 0-byte at the end of the string. The problem is that conversion. I have got a working example in Javascript, but
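In C#, Encoding.Unicode.GetBytes already yields UTF-16LE bytes without a BOM or trailing NUL, so hashing those bytes with System.Security.Cryptography.MD5.Create() should satisfy the spec. The same computation, sketched in Python for illustration:

```python
import hashlib

def md5_utf16le(s):
    # "utf-16-le" emits no BOM and no trailing NUL, matching the spec.
    return hashlib.md5(s.encode("utf-16-le")).hexdigest()

print(md5_utf16le("1234567z-äbc"))
```

Note the result differs from the MD5 of the UTF-8 encoding, which is the usual source of mismatches like the one described above.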

URL encode ASCII/UTF16 characters

假装没事ソ submitted on 2019-12-06 12:03:30
I'm trying to URL-encode some strings, but I'm having problems with the methods provided by the .NET framework. For instance, I'm trying to encode strings that contain the 'â' character. According to w3schools, for instance, I would expect this character to be encoded as '%E2' (and a PHP system I must call expects this too...). I tried using these methods:

System.Web.HttpUtility.UrlEncode("â");
System.Web.HttpUtility.UrlPathEncode("â");
Uri.EscapeUriString("â");
Uri.EscapeDataString("â");

However, they all encode this character as %C3%A2. I suppose this has something to do with the fact that
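The %C3%A2 form is the percent-encoding of the two UTF-8 bytes of 'â'; the single-byte %E2 the PHP side expects is a Latin-1 (ISO-8859-1) encoding. The difference is easy to see with Python 3's urllib.parse.quote, which takes an explicit encoding parameter:

```python
from urllib.parse import quote

# Percent-encode 'â' under two different byte encodings.
print(quote("â", encoding="latin-1"))  # %E2   (what the PHP system expects)
print(quote("â", encoding="utf-8"))    # %C3%A2 (what the .NET helpers emit)
```

So the fix on the .NET side is to percent-encode the Latin-1 bytes rather than the UTF-8 bytes, e.g. via HttpUtility.UrlEncode overloads that accept an Encoding.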

Bug with Python UTF-16 output and Windows line endings?

筅森魡賤 submitted on 2019-12-06 08:11:42
With this code (test.py):

import sys
import codecs
sys.stdout = codecs.getwriter('utf-16')(sys.stdout)
print "test1"
print "test2"

Then I run it as: test.py > test.txt

In Python 2.6 on Windows 2000, I'm finding that the newline characters are being output as the byte sequence \x0D\x0A\x00, which of course is wrong for UTF-16. Am I missing something, or is this a bug? Try this:

import sys
import codecs
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output
    def write(self, s):
        self
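The corruption happens because the codec writes "\n" as \x0A\x00, and Windows text mode then inserts \x0D before the \x0A, splitting a UTF-16 code unit. On Python 3 the same idea needs no msvcrt tricks: wrap the binary stream and disable newline translation. A sketch against an in-memory buffer instead of stdout:

```python
import io

buf = io.BytesIO()
# newline="" disables newline translation, so "\n" is encoded as
# \x0A\x00 and never expanded to \x0D\x0A by the text layer.
out = io.TextIOWrapper(buf, encoding="utf-16-le", newline="")
out.write("test1\n")
out.flush()
print(buf.getvalue())  # b't\x00e\x00s\x00t\x001\x00\n\x00'
```

For real console output, pass sys.stdout.buffer in place of buf.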

Struggling to convert vector<char> to wstring

别等时光非礼了梦想. submitted on 2019-12-06 07:44:32
I need to convert UTF-16 text to UTF-8. The actual conversion code is simple:

std::wstring in(...);
std::string out = boost::locale::conv::utf_to_utf<char, wchar_t>(in);

However, the issue is that the UTF-16 is read from a file and it may or may not contain a BOM. My code needs to be portable (the minimum is Windows/OSX/Linux). I'm really struggling to figure out how to create a wstring from the byte sequence. EDIT: this is not a duplicate of the linked question, as in that question the OP needs to convert a wide string into an array of bytes, and I need to convert the other way around. You should not
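The BOM sniffing itself is just a two-byte check at the front of the buffer; the same branching works over a std::vector<char> in C++ before calling utf_to_utf. A sketch in Python (the no-BOM fallback to little-endian is an assumption; pick whichever endianness your inputs actually use):

```python
def decode_utf16_bytes(data):
    # Sniff the byte-order mark, then decode the rest accordingly.
    if data.startswith(b"\xff\xfe"):
        return data[2:].decode("utf-16-le")
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    # No BOM: assume little-endian (an assumption, not a rule).
    return data.decode("utf-16-le")

print(decode_utf16_bytes(b"\xff\xfeh\x00i\x00"))  # hi
print(decode_utf16_bytes(b"\xfe\xff\x00h\x00i"))  # hi
```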

C++ unicode UTF-16 encoding

[亡魂溺海] submitted on 2019-12-06 06:36:18
Question: I have a wide char string L"hao123--我的上网主页", and it must be encoded to "hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I was told that the encoded string is a special "%uNNNN" format for encoding Unicode UTF-16 code points. This website tells me it's JavaScript escapes. But I don't know how to encode it with C++. Is there any library to get this to work, or can you give me some tips? Thanks, my friends! Answer 1: Embedding Unicode in string literals is generally not a good idea and is not
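The encoding is mechanical: walk the UTF-16 code units and escape every non-ASCII unit as \uNNNN (characters above U+FFFF become two escapes, one per surrogate). A sketch in Python rather than C++ (a loop over a std::u16string is identical; swap "\\u" for "%u" to get the JavaScript escape() form):

```python
def escape_utf16_units(s):
    # Emit each non-ASCII UTF-16 code unit as \uNNNN, leaving
    # ASCII characters untouched. Surrogate pairs yield two escapes.
    units = s.encode("utf-16-be")
    out = []
    for i in range(0, len(units), 2):
        cp = (units[i] << 8) | units[i + 1]
        out.append(chr(cp) if cp < 0x80 else "\\u%04X" % cp)
    return "".join(out)

print(escape_utf16_units("hao123--我的上网主页"))
```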

how to convert utf8 to std::string?

廉价感情. submitted on 2019-12-06 06:28:49
I am working on code which receives a cpprest SDK response containing a base64-encoded payload which is JSON. Here is my code snippet:

typedef std::wstring string_t; // defined in basic_types.h in the cpprest lib
void demo() {
    http_response response;
    // code to handle response ...
    json::value output = response.extract_json();
    string_t payload = output.at(L"payload").as_string();
    vector<unsigned char> base64_encoded_payload = conversions::from_base64(payload);
    std::string utf8_payload(base64_encoded_payload.begin(), base64_encoded_payload.end());
    // in the debugger I see the Japanese chars are
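If the decoded bytes are UTF-8, the std::string built from the byte vector already holds valid UTF-8; a debugger that renders std::string as ANSI will just display mojibake. A round-trip sketch in Python with a hypothetical Japanese payload, showing the bytes survive intact:

```python
import base64

# Hypothetical payload standing in for the cpprest response contents.
payload = base64.b64encode("日本語テキスト".encode("utf-8"))
raw = base64.b64decode(payload)      # the raw bytes, as from from_base64
text = raw.decode("utf-8")           # the bytes were valid UTF-8 all along
print(text)  # 日本語テキスト
```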

How to convert from utf-16 to utf-32 on Linux with std library?

六眼飞鱼酱① submitted on 2019-12-06 06:03:50
On MSVC, converting UTF-16 to UTF-32 is easy with C++11's codecvt_utf16 locale facet. But in GCC (gcc (Debian 4.7.2-5) 4.7.2) this new feature seemingly hasn't been implemented yet. Is there a way to perform such a conversion on Linux without iconv (preferably using the conversion tools of the std library)? Decoding UTF-16 into UTF-32 is extremely easy. You may want to detect at compile time the libc version you're using, and deploy your conversion routine if you detect a broken libc (one without the functions you need). Inputs: a pointer to the source UTF-16 data ( char16_t * , ushort * , -- for
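The hand-rolled routine the answer describes boils down to one loop: copy BMP code units through, and combine each high/low surrogate pair into a single code point. Sketched in Python for clarity (a C++ loop over char16_t* is structurally identical):

```python
def utf16_to_utf32(units):
    # units: a sequence of 16-bit code units; returns a list of code points.
    cps, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) \
                and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine a high/low surrogate pair into one code point.
            cps.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            cps.append(u)
            i += 1
    return cps

print([hex(c) for c in utf16_to_utf32([0x0041, 0xD83D, 0xDE00])])  # ['0x41', '0x1f600']
```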

How can I use the Linux sed command to process a little-endian UTF-16 file

☆樱花仙子☆ submitted on 2019-12-06 04:54:28
Question: I am working on an application involving Windows RDP. I've hit a problem trying to use the sed command to replace the IP address string directly in the .rdp file: after executing this command, the original .rdp file is garbled.

sed -i "s/address:s:.*/address:s:$(cat check-free-ip.to.rdpzhitong.rdp)/" rdpzhitong.rdp

I find that the file's format is little-endian UTF-16 Unicode. Can I still use the sed command to replace the text in the file correctly? Or is there another method to process this

UTF16 hex to text

风格不统一 submitted on 2019-12-06 04:36:45
I have a UTF-16 hex representation such as "0633064406270645", which is "سلام" in the Arabic language. I would like to convert it to its text equivalent. Is there a straightforward way to do that in PostgreSQL? I can convert from UTF-8 bytes as below; unfortunately it seems UTF-16 is not supported. Any ideas on how to do it in PostgreSQL? Worst case, I will write a function.

SELECT convert_from(decode(E'D8B3D984D8A7D985', 'hex'), 'UTF8');
-- "سلام"
SELECT convert_from(decode(E'0633064406270645', 'hex'), 'UTF16');
-- ERROR: invalid source encoding name "UTF16"

That's right, Postgres
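Outside the database the decode is a one-liner, which a small PL/Python or client-side helper could wrap. The hex string here is UTF-16BE code units without a BOM:

```python
hex_utf16 = "0633064406270645"
# Pair up the hex digits into bytes, then decode as big-endian UTF-16.
text = bytes.fromhex(hex_utf16).decode("utf-16-be")
print(text)  # سلام
```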