utf-32

Print char32_t to console

余生长醉 submitted on 2019-12-08 06:20:54
Question: How can I print (cout / wcout / ...) a char32_t to the console in C++11? The following code prints hex values: u32string s2 = U"Добрый день"; for(auto x:s2){ wcout<<(char32_t)x<<endl; }
Answer 1: First, I don't think wcout is supposed to print anything other than char and wchar_t as characters, and char32_t is neither. Here's a sample program that prints individual wchar_t values: #include <iostream> using namespace std; int main() { wcout << (wchar_t)0x41 << endl; return 0; } Output (ideone): A Currently, it's

No UTF-32 big-endian in C#?

放肆的年华 submitted on 2019-12-07 02:05:41
Question: In C#, Encoding.UTF32 is UTF-32 little-endian, Encoding.BigEndianUnicode is UTF-16 big-endian, and Encoding.Unicode is UTF-16 little-endian, but I can't find one for UTF-32 big-endian. I'm developing a simple text viewer and don't think many documents are encoded in UTF-32 big-endian, but I want to be prepared for that too, just in case. Doesn't C# support UTF-32 big-endian? BTW, Java supports it.
Answer 1: It does support big-endian UTF-32. Just create the encoding yourself using the overloaded

Is it possible to convert a string containing “high” unicode chars to an array consisting of dec values derived from utf-32 (“real”) codes?

我只是一个虾纸丫 submitted on 2019-12-05 15:27:59
Please, look at this script operating on a (theoretically possible) string: <!doctype html> <html> <head> <meta charset="utf-8"> <title></title> <script src="jquery.js"></script> <script> $(function () { $("#click").click(function () { var txt = $('#high-unicode').text(); var codes = ''; for (var i = 0; i < txt.length; i++) { if (i > 0) codes += ','; codes += txt.charCodeAt(i); } alert(codes); }); }); </script> </head> <body> <span id="click">click</span><br /> <span id="high-unicode">𝑥<!-- mathematical italic small x -->󳇠<!-- some char from Supplementary Private Use Area-A -->A<!-- char A --
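As a language-neutral illustration (written in Python rather than the JavaScript of the question), the sketch below combines UTF-16 code units, which is what charCodeAt reports, into full code points, i.e. the decimal UTF-32 values the title asks about. The function name and the sample input are illustrative assumptions, not part of the original post.

```python
def utf16_units_to_code_points(units):
    """Combine UTF-16 code units into Unicode code points (UTF-32 values)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate encodes one code point.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through unchanged).
            code_points.append(unit)
            i += 1
    return code_points


# Code units for U+1D465 (mathematical italic small x) followed by "A".
units = [0xD835, 0xDC65, 0x0041]
print(utf16_units_to_code_points(units))  # prints [119909, 65]
```

An unpaired surrogate is passed through as-is here; a stricter version might raise an error instead.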

No UTF-32 big-endian in C#?

拥有回忆 submitted on 2019-12-05 06:44:12
Question: In C#, Encoding.UTF32 is UTF-32 little-endian, Encoding.BigEndianUnicode is UTF-16 big-endian, and Encoding.Unicode is UTF-16 little-endian, but I can't find one for UTF-32 big-endian. I'm developing a simple text viewer and don't think many documents are encoded in UTF-32 big-endian, but I want to be prepared for that too, just in case. Doesn't C# support UTF-32 big-endian? BTW, Java supports it.
Answer: It does support big-endian UTF-32. Just create the encoding yourself using the overloaded constructor: Encoding e = new UTF32Encoding(true /*bigEndian*/, true /*byteOrderMark*/); The encodings
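To make the byte-order difference concrete, here is a small Python sketch (not the C# API from the answer) that encodes the same character as UTF-32 big-endian and little-endian, then decodes big-endian data that starts with a byte order mark. The variable names are illustrative.

```python
import codecs

text = "A"  # U+0041

big = text.encode("utf-32-be")     # most significant byte first
little = text.encode("utf-32-le")  # least significant byte first
print(big)     # prints b'\x00\x00\x00A'
print(little)  # prints b'A\x00\x00\x00'

# With a byte order mark in front, the generic "utf-32" decoder
# detects the byte order from the BOM and strips it.
with_bom = codecs.BOM_UTF32_BE + big
decoded = with_bom.decode("utf-32")
print(decoded)  # prints A
```

A text viewer could use the same idea: inspect the first four bytes for a UTF-32 BOM before choosing a decoder.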

How can I convert UTF-16 to UTF-32 in java?

六眼飞鱼酱① submitted on 2019-12-04 11:48:27
Question: I have looked for solutions, but there doesn't seem to be much on this topic. I have found solutions that suggest: String unicodeString = new String("utf8 here"); byte[] bytes = unicodeString.getBytes("UTF8"); String converted = new String(bytes,"UTF16"); for converting from UTF-8 to UTF-16. However, Java doesn't handle "UTF32", which makes this solution unviable. Does anyone know another way to achieve this?
Answer: Java does handle UTF-32. Try this test: byte[] a = "1".getBytes("UTF-32"); System.out.println(a.length); It will show that the array's length is 4.
After searching I got this to work: public
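The same decode-then-encode idea can be sketched in Python (used here only for illustration, not as a replacement for the Java code in the post). The function name is hypothetical, and the byte order is fixed to big-endian with no BOM so the result for "1" is 4 bytes, matching the length reported in the answer.

```python
def utf16_be_to_utf32_be(data: bytes) -> bytes:
    """Re-encode big-endian UTF-16 bytes as big-endian UTF-32 bytes (no BOM)."""
    text = data.decode("utf-16-be")   # surrogate pairs become single code points
    return text.encode("utf-32-be")   # every code point becomes exactly 4 bytes


utf16 = "1".encode("utf-16-be")
utf32 = utf16_be_to_utf32_be(utf16)
print(len(utf16), len(utf32))  # prints 2 4
```

Note that Python's plain "utf-32" codec prepends a BOM when encoding, which is why the explicit "utf-32-be" name is used here.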

Does std::wstring support UTF-16 and UTF-32 on Windows?

天涯浪子 submitted on 2019-12-04 11:05:08
Question: I'm learning about Unicode and have a few questions that I'm hoping to get answered.
1) I've read that on Linux, a std::wstring is 4 bytes, while on Windows, it's 2 bytes. Does this mean that internal support on Linux is UTF-32 while on Windows it is UTF-16?
2) Is the use of std::wstring very similar to the std::string interface?
3) Does VC++ offer support for using a 4-byte std::wstring?
4) Do you have to change compiler options if you use std::wstring?
As a side note, I came across a string library for working with UTF-8 which has a very similar interface to std::string which provides familiar

How to Convert UTF-16 to UTF-32 and Print the Resulting wchar_t in C?

耗尽温柔 submitted on 2019-12-04 03:44:59
Question: I'm trying to print out a string of UTF-16 characters. I posted this question a while back and the advice given was to convert to UTF-32 using iconv and print it as a string of wchar_t. I've done some research, and managed to code the following: // *c is the pointer to the characters (UTF-16) I'm trying to print // sz is the size in bytes of the input I'm trying to print iconv_t icv; char in_buf[sz]; char* in; size_t in_sz; char out_buf[sz * 2]; char* out; size_t out_sz; icv = iconv_open("UTF

How do I create a string with a surrogate pair inside of it?

♀尐吖头ヾ submitted on 2019-12-03 16:10:57
Question: I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair which will actually cause the string reversal to fail. How does one actually go about creating a string with a surrogate pair in it so that I can see the failure myself?
Answer 1: The term "surrogate pair" refers to a means of encoding Unicode characters
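As a rough illustration of why naive reversal fails, the Python sketch below computes the surrogate pair for a code point above U+FFFF and then reverses a list of UTF-16 code units. The helper name and the choice of U+1D11E (musical symbol G clef) are assumptions made for the example; they do not come from the original answer.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# UTF-16 code units for U+1D11E followed by "A".
units = [*to_surrogate_pair(0x1D11E), ord("A")]
print([hex(u) for u in units])           # prints ['0xd834', '0xdd1e', '0x41']

# Reversing unit by unit puts the low surrogate before the high one,
# which is no longer a valid surrogate pair.
reversed_units = units[::-1]
print([hex(u) for u in reversed_units])  # prints ['0x41', '0xdd1e', '0xd834']
```

Any string that contains such a character, whether written with an escape sequence or pasted literally, should reproduce the failure when reversed one code unit at a time.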

What Character Encoding is best for multinational companies

元气小坏坏 submitted on 2019-12-03 15:50:49
Question: If you had a website that was to be translated into every language in the world, and therefore had a database with all these translations, what character encoding would be best? UTF-128? If so, do all browsers understand the chosen encoding? Is character encoding straightforward to implement or are there hidden factors? Thanks in advance.
Answer 1: If you want to support a variety of languages for web content, you should use an encoding that covers the entire Unicode range. The best choice for this

What Character Encoding is best for multinational companies

拈花ヽ惹草 submitted on 2019-12-03 06:08:26
Question: If you had a website that was to be translated into every language in the world, and therefore had a database with all these translations, what character encoding would be best? UTF-128? If so, do all browsers understand the chosen encoding? Is character encoding straightforward to implement or are there hidden factors? Thanks in advance.
Answer: If you want to support a variety of languages for web content, you should use an encoding that covers the entire Unicode range. The best choice for this purpose is UTF-8. UTF-8 is the preferred encoding for the web; from the HTML5 draft standard: Authors are
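UTF-8, UTF-16 and UTF-32 can all represent every Unicode code point; they differ in how many bytes each character takes. The Python sketch below, with sample characters chosen only for illustration, prints the encoded size of a few characters in each encoding.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le"))


# Latin A, Cyrillic de, CJK "sun", and musical symbol G clef.
samples = ["A", "\u0434", "\u65e5", "\U0001D11E"]
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# prints:
# U+0041 (1, 2, 4)
# U+0434 (2, 2, 4)
# U+65E5 (3, 2, 4)
# U+1D11E (4, 4, 4)
```

UTF-32 always uses 4 bytes per code point, while UTF-8 uses 1 to 4, which keeps ASCII-heavy content such as HTML markup compact.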