unicode

Is there any way to avoid showing “xn--” for IDN domains?

馋奶兔 submitted on 2021-02-20 19:00:00
Question: If I use a domain such as www.äöü.com, is there any way to avoid it being displayed as www.xn--4ca0bs.com in users' browsers? Domains such as www.xn--4ca0bs.com cause a lot of confusion for average internet users, I guess.
Answer 1: This is entirely up to the browser. In fact, IDNs are pretty much a browser-only technology. Domain names cannot contain non-ASCII characters, so the actual domain name is always the Punycode-encoded xn--... form. It's up to the browser to prettify this, but many
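
The mapping between the Unicode form and the xn-- form mentioned in the answer can be reproduced with Python 3's built-in "idna" codec. This is only a sketch: that codec follows the older IDNA 2003 rules, and the variable names are illustrative.

```python
# Convert between the Unicode and ASCII (Punycode) forms of a hostname
# using Python 3's built-in "idna" codec (IDNA 2003 rules).
unicode_host = "www.äöü.com"

# Each non-ASCII label is Punycode-encoded and prefixed with "xn--".
ascii_host = unicode_host.encode("idna")
print(ascii_host)  # b'www.xn--4ca0bs.com'

# Decoding reverses the mapping, roughly what a browser does when it
# chooses to display the Unicode form instead of the xn-- form.
round_trip = ascii_host.decode("idna")
assert round_trip == unicode_host
```

The ASCII form is what is actually registered and resolved; whether the Unicode form is shown is the browser's decision, as the answer says.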

Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'

耗尽温柔 submitted on 2021-02-20 09:40:33
Question: I am using Python 2.7.3. Can anybody explain the difference between the literals '\u0391' and u'\u0391', and the different ways they are echoed in the REPL below (especially the extra backslash added to a1)?

    >>> a1='\u0391'
    >>> a1
    '\\u0391'
    >>> type(a1)
    <type 'str'>
    >>>
    >>> a2=u'\u0391'
    >>> a2
    u'\u0391'
    >>> type(a2)
    <type 'unicode'>
    >>>

Answer 1: You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string. A Python 2 Unicode literal (u'some text')
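
The same distinction can be illustrated in Python 3, with the caveat that this is an analogue and not the Python 2.7 session from the question: a raw string stands in for the Python 2 byte string, because in both cases \u is not interpreted as an escape. The variable names are illustrative.

```python
# Python 3 sketch (the question uses Python 2.7, where '...' is a byte string).
# A raw string keeps the backslash, mirroring what the Python 2 str literal holds.
escaped = r"\u0391"   # six characters: \ u 0 3 9 1
alpha = "\u0391"      # one character: GREEK CAPITAL LETTER ALPHA

print(len(escaped))   # 6
print(len(alpha))     # 1

# Interpreting the escape explicitly turns the six characters into one.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == alpha)  # True
```

The extra backslash in the echoed value is only the repr escaping a literal backslash; the string itself contains a single one.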

issue with encoding when importing json into Postgres

廉价感情. submitted on 2021-02-20 04:09:50
Question: I'm using pandas, and exporting data as JSON like this:

    import pandas as pd
    df = pd.DataFrame({'a': ['Têst']})
    df.to_json(orient='records', lines=True)
    > u'{"a":"T\\u00east"}'

This makes sense, since we have the Unicode character 00ea prefixed with \u, and it is escaped with \ when converted to JSON. But then I import the JSON strings into Postgres with COPY:

    buffer = cStringIO.StringIO()
    buffer.write(df.to_json(orient='records', lines=True))
    buffer.seek(0)
    with connection.cursor() as cursor:
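
The \u00ea sequence in the exported string is ordinary JSON escaping of a non-ASCII character. A minimal sketch with the standard-library json module (used here in place of pandas, so the separators differ slightly from the to_json output) shows both the escaped and the unescaped form:

```python
import json

record = {"a": "Têst"}

# By default json.dumps escapes non-ASCII characters as \uXXXX sequences.
escaped = json.dumps(record)
print(escaped)  # {"a": "T\u00east"}

# ensure_ascii=False keeps the character itself in the output.
literal = json.dumps(record, ensure_ascii=False)
assert "ê" in literal

# Both forms parse back to the same value.
assert json.loads(escaped) == json.loads(literal) == record
```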

Official repository of Unicode character names

吃可爱长大的小学妹 submitted on 2021-02-19 08:46:08
Question: There are a few ways to get the list of all Unicode characters' names: for example, using the Python module unicodedata, as explained in "List of unicode character names", or using the website https://unicode.org/charts/charindex.html, but the list there is incomplete, and you have to open and parse PDFs to find the names. But what is the official source / repository of all Unicode character names? (Such that if a new character is added, the list is updated, so I'm looking for the initial source for these
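
For reference, the unicodedata approach mentioned in the question looks roughly like the sketch below. Note that the module reflects the Unicode database version bundled with the interpreter, so it is a snapshot and not the upstream source the question asks about. The code point range is an arbitrary choice for illustration.

```python
import unicodedata

# Look up a character's name and go back from the name to the character.
name = unicodedata.name("\u0391")
print(name)  # GREEK CAPITAL LETTER ALPHA
char = unicodedata.lookup(name)

# List names for a small, arbitrarily chosen range of code points.
# Code points without a name yield the default value "" and are skipped.
names = {}
for cp in range(0x0391, 0x0394):
    n = unicodedata.name(chr(cp), "")
    if n:
        names[cp] = n

# Version of the Unicode database bundled with this interpreter.
print(unicodedata.unidata_version)
```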

What is a realistic maximum number of unicode combining characters?

江枫思渺然 submitted on 2021-02-19 07:44:22
Question: I'm looking for the maximum number of Unicode combining characters that appear after a non-combining one in realistic natural text. I know that in Unicode text there can be an arbitrary number of combining characters placed anywhere in the text. However, I am writing a specialized application that has to operate under constrained resources, and because of that and other technical reasons, displaying an arbitrary number of combining characters after a non-combining one is not an option. However I would still
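
One way to explore the question empirically is to measure the longest run of combining marks in sample text. The sketch below assumes that "combining character" means any code point in a Unicode Mark category (Mn, Mc, Me); the function name max_combining_run is hypothetical.

```python
import unicodedata


def max_combining_run(text: str) -> int:
    """Return the longest run of consecutive combining marks in text."""
    longest = current = 0
    for ch in text:
        # General categories Mn, Mc and Me are the Unicode "Mark" categories.
        if unicodedata.category(ch).startswith("M"):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# "a" followed by two marks, then "b" followed by one mark.
print(max_combining_run("a\u0301\u0323b\u0301"))  # 2
```

Running this over a representative corpus gives an observed upper bound for that corpus, which is not the same as a guarantee for all natural text.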

Python2: Using .decode with errors='replace' still returns errors

让人想犯罪 __ submitted on 2021-02-19 06:13:31
Question: So I have a message which is read from a file of unknown encoding. I want to send it to a webpage for display. I've grappled a lot with UnicodeErrors, have gone through many Q&As on StackOverflow, and think I have a decent understanding of how Unicode and encodings work. My current code looks like this:

    try:
        return message.decode(encoding='utf-8')
    except:
        try:
            return message.decode(encoding='latin-1')
        except:
            try:
                print("Unable to entirely decode in latin or utf-8, will replace error characters
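
For comparison, a lenient decode in Python 3 can be sketched as follows. This is not the asker's Python 2 code, and decode_lenient is a hypothetical helper name; it only illustrates what errors="replace" does and why a Latin-1 decode cannot fail.

```python
def decode_lenient(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, replacing undecodable bytes."""
    # errors="replace" substitutes U+FFFD for each undecodable sequence
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="replace")


# Valid UTF-8 decodes unchanged.
assert decode_lenient(b"caf\xc3\xa9") == "caf\u00e9"

# A lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD.
assert decode_lenient(b"caf\xe9") == "caf\ufffd"

# Latin-1 maps every byte value to a code point, so this decode cannot fail.
assert b"caf\xe9".decode("latin-1") == "caf\u00e9"
```

Because the Latin-1 decode always succeeds, any fallback placed after it in a chain like the one in the question is never reached through a decoding failure.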

How to read a UTF-16 text file in C++17

不打扰是莪最后的温柔 submitted on 2021-02-19 05:57:06
Question: I am very new to C++. I want to read a UTF-16 text file in C++17 in Visual Studio 2019. I have tried several methods from the internet (including StackOverflow), but none of them worked, and some of them didn't compile (I think they only support older compilers). I am trying to achieve this without using any third-party libraries. This reads a text file, but it has some weird characters and spaces between each letter.

    // open file for reading
    std::wifstream istrm(filename, std::ios::binary);
    if (