unicode-normalization

VBA String Normalization (via WinAPI)

北城以北 submitted on 2021-02-08 08:20:17
Question: I'm new to writing VBA code that calls WinAPI functions. What encoding does the WinAPI normalization function (NormalizeString) work with? I would expect UTF-16, but the following does not work. The character count seems to be calculated incorrectly, and the attempt to actually create a normalized string simply crashes Access.

    'normFormEnum
    'not random numbers, but from ...
    'https://msdn.microsoft.com/en-us/library/windows/desktop/dd319094(v=vs.85).aspx
    'for use in calling the Win
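
As a rough cross-check on the character counts discussed in this question: NormalizeString takes wide (UTF-16) strings and measures buffer lengths in UTF-16 code units. The sketch below is not the asker's VBA code. It uses Python's `unicodedata` module and two hypothetical helpers, `utf16_units` and `normalized_utf16_units`, to show what length a correct call should report for a given input, assuming Python's normalization data agrees with the Windows implementation for these characters.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to store text (no BOM)."""
    return len(text.encode("utf-16-le")) // 2


def normalized_utf16_units(text: str, form: str = "NFC") -> int:
    """UTF-16 length of text after normalizing it to the given form."""
    return utf16_units(unicodedata.normalize(form, text))


decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
clef = "\U0001D11E"      # MUSICAL SYMBOL G CLEF, outside the BMP

# The decomposed pair shrinks from 2 code units to 1 under NFC.
print(utf16_units(decomposed), normalized_utf16_units(decomposed))
# A single non-BMP character still needs 2 code units (a surrogate pair).
print(utf16_units(clef))
```

Comparing these numbers with what the VBA call returns may help tell an encoding mismatch (for example, a string passed as ANSI) apart from a genuine normalization difference.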

Why do LATIN SMALL LETTER DOTLESS I, COMBINING DOT ABOVE not get normalized to “i” in NFC form?

僤鯓⒐⒋嵵緔 submitted on 2021-01-28 04:42:40
Question: Example in Python:

    >>> import unicodedata
    >>> s = 'ı̇'
    >>> len(s)
    2
    >>> list(s)
    ['ı', '̇']
    >>> print(", ".join(map(unicodedata.name, s)))
    LATIN SMALL LETTER DOTLESS I, COMBINING DOT ABOVE
    >>> normalized = unicodedata.normalize('NFC', s)
    >>> print(", ".join(map(unicodedata.name, normalized)))
    LATIN SMALL LETTER DOTLESS I, COMBINING DOT ABOVE

As you can see, NFC normalization does not compose the dotless i and the combining dot into a normal i. Is there a rationale for this? Is this an oversight? Or is it not included because NFC
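
One way to see why nothing happens here: NFC only recomposes pairs that appear as a canonical decomposition in the Unicode Character Database. The following sketch (constant names are illustrative) checks the relevant entries with `unicodedata`. The dotless i has no decomposition at all, whereas the capital İ decomposes to `I` + U+0307 and therefore does compose.

```python
import unicodedata

DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOT_ABOVE = "\u0307"      # COMBINING DOT ABOVE

# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(DOTLESS_I)))
# The capital letter maps to "0049 0307", i.e. 'I' + COMBINING DOT ABOVE.
print(unicodedata.decomposition(CAPITAL_I_DOT))

# NFC composes the pair that has a canonical mapping...
composed = unicodedata.normalize("NFC", "I" + DOT_ABOVE)
# ...and leaves the dotless i sequence untouched.
unchanged = unicodedata.normalize("NFC", DOTLESS_I + DOT_ABOVE)

print(composed == CAPITAL_I_DOT)
print(unchanged == DOTLESS_I + DOT_ABOVE)
```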

get all unicode variations of a latin character

天大地大妈咪最大 submitted on 2021-01-27 09:11:13
Question: E.g., for the character "a", I want to get a string (list of chars) like "aàáâãäåāăą" (I'm not sure whether that example list is complete), basically all Unicode characters with names matching "Latin Small Letter A with *". Is there a generic way to get this? I'm asking for Python, but if the answer is more generic, that is fine too, although I would appreciate a Python code snippet in any case. Python >=3.5 is fine. But I guess you need to have access to the Unicode database, e.g. the Python module
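
A minimal sketch of one possible approach, assuming "variation" means "canonically decomposes to the base letter plus combining marks". The helper name `latin_variants` is hypothetical. It scans every code point and keeps those whose NFD form starts with the base character.

```python
import unicodedata


def latin_variants(base: str) -> str:
    """Return base plus all characters whose NFD form starts with base."""
    found = []
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        # Keep only characters that really decompose and begin with base.
        if decomposed != ch and decomposed[0] == base:
            found.append(ch)
    return base + "".join(found)


variants = latin_variants("a")
print(variants)
```

This misses letters that have no decomposition (for example, letters with a stroke), so matching on `unicodedata.name()` may be a better fit if the goal is literally "names starting with LATIN SMALL LETTER A WITH".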

How does Unicode conversion to ASCII know to map Ł to L

自古美人都是妖i submitted on 2020-05-16 04:35:28
Question: I was surprised to find that no Unicode normalization of the Ł character maps it to something like L + combining stroke. That was my best explanation for why Ł gets mapped to L rather than ? when converting from a Unicode-capable encoding to ASCII or to a code page that doesn't have the Ł character. How does it work otherwise? Does the standard define fallback characters?

Source: https://stackoverflow.com/questions/58674948/how-does-unicode-conversion-to-ascii-know-to-map-%c5%81-to-l
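
The observation in the question can be checked directly: Ł has no decomposition mapping, so no normalization form turns it into L. Converters that do produce L therefore have to rely on something outside normalization, such as their own fallback ("best fit") tables. The sketch below imitates that with a hypothetical two-entry table, `FALLBACKS`, and a hypothetical helper, `to_ascii`. It does not reproduce the table of any particular converter.

```python
import unicodedata

L_STROKE = "\u0141"  # LATIN CAPITAL LETTER L WITH STROKE

# Hypothetical fallback table, standing in for a converter's best-fit data.
FALLBACKS = {"\u0141": "L", "\u0142": "l"}


def to_ascii(text: str, fallbacks=FALLBACKS) -> str:
    """Convert text to ASCII: fallback table first, then NFKD, then '?'."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        elif ch in fallbacks:
            out.append(fallbacks[ch])
        else:
            # Drop combining marks left over after decomposition.
            kept = "".join(
                c for c in unicodedata.normalize("NFKD", ch) if c.isascii()
            )
            out.append(kept or "?")
    return "".join(out)


print(repr(unicodedata.decomposition(L_STROKE)))  # empty: nothing to strip
print(to_ascii("Łódź"))
print(to_ascii("Ł", fallbacks={}))
```

With the table, `Łódź` becomes `Lodz`. Without it, Ł falls through to `?`, which is the behaviour the question expected from normalization alone.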

Python Inconsistent Special Character Storage In String

谁都会走 submitted on 2020-03-24 02:44:56
Question: Version is Python 3.7. I've just found out that Python sometimes stores the character ñ in a string with multiple representations, and I'm completely at a loss as to why or how to deal with it. I'm not sure of the best way to show this issue, so I'm just going to show some code output. I have two strings, s1 and s2, both set to 'Dan Peña'. They are both of type string. I can run the code:

    print(s1 == s2)   # prints False
    print(len(s1))    # prints 8
    print(len(s2))    # prints 9
    print(type(s1))   #
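
A likely explanation, sketched under the assumption that one string holds the precomposed ñ (U+00F1) and the other holds `n` followed by COMBINING TILDE (U+0303): the two render identically but differ in length and compare unequal until both are normalized to the same form.

```python
import unicodedata

s1 = "Dan Pe\u00f1a"    # precomposed ñ
s2 = "Dan Pen\u0303a"   # 'n' + COMBINING TILDE

# Same appearance, different code point sequences.
print(s1 == s2, len(s1), len(s2))

# Normalizing both sides to NFC makes them comparable.
n1 = unicodedata.normalize("NFC", s1)
n2 = unicodedata.normalize("NFC", s2)
print(n1 == n2, len(n1), len(n2))
```

Normalizing at the boundary where text enters the program (file reads, form input, database loads) usually avoids having to do this at every comparison.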

SASL password normalization

不打扰是莪最后的温柔 submitted on 2020-01-17 06:38:28
Question: This is a very simple question: can you normalize a password for me, because I can't understand how it works? The password is "IDoMdGuFE9S0". What does it look like in "normalized" form? It contains only alphanumeric ASCII characters. Will the result be equal to the original? PS: Sorry for my bad English.

Answer 1: I'm assuming that by "normalized" you mean processed with SASLprep. In the case of "IDoMdGuFE9S0", the output is the same as the input (it is fully ASCII, with no control sequences or U+00AD). If you're
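
To make the answer's claim concrete, here is a simplified sketch of the SASLprep steps (RFC 4013) built on Python's standard `stringprep` tables. The function name `saslprep_sketch` is hypothetical, and the sketch deliberately omits the bidirectional-text checks and the unassigned-code-point check. It also uses the current Unicode data for NFKC rather than the Unicode 3.2 data the profile is defined against, so treat it as an illustration, not a conforming implementation.

```python
import stringprep
import unicodedata

# Prohibited-output tables named by the SASLprep profile.
PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(text: str) -> str:
    """Map, normalize (NFKC), and check text roughly as SASLprep does."""
    mapped = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            continue  # "mapped to nothing", e.g. U+00AD SOFT HYPHEN
        # Non-ASCII spaces are mapped to an ordinary space.
        mapped.append(" " if stringprep.in_table_c12(ch) else ch)
    normalized = unicodedata.normalize("NFKC", "".join(mapped))
    for ch in normalized:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


print(saslprep_sketch("IDoMdGuFE9S0"))
print(saslprep_sketch("I\u00adX"))
```

The password from the question passes through unchanged, while the soft hyphen in the second input is removed.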
