diacritics | 易学教程

The encoding that Notepad++ just calls “ANSI”, does anyone know what to call it for Ruby?

I have a bunch of .txt files that Notepad++ says (in its "Encoding" drop-down menu) are "ANSI". They contain German characters [äöüß], which display fine in Notepad++. But they don't show up correctly in irb when I read them with File.read 'this is a German text example.txt'. So does anyone know what argument I should give Encoding.default_external=? (I'm assuming that would be the solution, right?) With 'utf-8' or 'cp850', it reads the "ANSI" file containing "äöüß" as "\xE4\xF6\xFC\xDF"... (Please don't hesitate to mention apparently "obvious" things in your answers; I'm pretty much as newbish as you can…
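Notepad++'s "ANSI" label refers to the legacy Windows code page of the machine, which on a German or Western European Windows install is usually Windows-1252 (an assumption about the asker's setup). The bytes irb printed, \xE4\xF6\xFC\xDF, are exactly ä, ö, ü, ß in that code page, so decoding them as Windows-1252 recovers the text. The sketch is in Python for illustration; in Ruby the same encoding is named "Windows-1252" (alias "CP1252"), e.g. File.read(path, encoding: "Windows-1252"). The helper read_ansi_file is hypothetical.

```python
# Assumption: "ANSI" here means Windows-1252 (Python codec name "cp1252").
raw = b"\xe4\xf6\xfc\xdf"  # the bytes irb showed for "äöüß"

text = raw.decode("cp1252")  # interpret the bytes as Windows-1252
print(text)  # äöüß

# Re-encode as UTF-8 if the rest of the program expects UTF-8 bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f'


def read_ansi_file(path):
    # Hypothetical helper: open a Notepad++ "ANSI" file with an explicit decoder
    # instead of relying on the platform default encoding.
    with open(path, encoding="cp1252") as f:
        return f.read()
```

Naming the encoding explicitly per file avoids depending on a global default such as Encoding.default_external, which affects every file the program reads.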

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

I've read about how Zalgo text works, and I'm looking to learn how chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that need to be either: a) stripped, assuming chat participants use only languages that don't require combining marks (i.e. you could write "fiancé" with a combining mark, but you'd be a bit Zalgo'ed yourself if you insisted on doing so); or b) reduced to a maximum of 8 consecutive…
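Option b) can be sketched by counting runs of combining marks and dropping any beyond a limit. This sketch treats nonspacing (Mn) and enclosing (Me) marks as "combining"; that category choice, the default limit of 8 taken from the question, and the helper name limit_combining_marks are assumptions. Spacing marks (Mc) are left alone because scripts such as Devanagari rely on them.

```python
import unicodedata


def limit_combining_marks(text, max_run=8):
    """Keep at most max_run consecutive nonspacing/enclosing marks (Mn, Me)."""
    out = []
    run = 0  # length of the current run of combining marks
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_run:
                continue  # drop marks beyond the limit
        else:
            run = 0  # any other character ends the run
        out.append(ch)
    return "".join(out)


zalgo = "e" + "\u0301" * 20  # "e" followed by twenty combining acute accents
print(len(limit_combining_marks(zalgo)))  # 9
```

Running unicodedata.normalize("NFC", text) first folds sequences like "e" + U+0301 into a single precomposed "é", so ordinary accented words rarely count toward the limit; option a) corresponds to max_run=0.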

removing accent and special characters [duplicate]

Possible duplicate: What is the best way to remove accents in a Python unicode string? / Python and character normalization

I would like to remove accents, turn all characters to lowercase, and delete any numbers and special characters. Example: Frédér8ic@ --> frederic

Proposal:

    import unicodedata

    def remove_accents(data):
        return ''.join(x for x in unicodedata.normalize('NFKD', data)
                       if unicodedata.category(x)[0] == 'L').lower()

Is there any better way to do this?

Abhijit: A possible solution would be

    def remove_accents(data):
        return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.printable…
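The proposal already yields "frederic" for the example, because NFKD splits "é" into "e" plus a combining accent and the accent is not a letter. Note that filtering on string.printable instead keeps digits and "@", so it would not remove the "8" or the "@". A minimal variant, assuming full Unicode case folding is wanted (the name normalize_name is hypothetical), swaps lower() for casefold(), which also maps "ß" to "ss":

```python
import unicodedata


def normalize_name(data):
    # NFKD splits accented letters into base letter + combining mark (é -> e + U+0301).
    decomposed = unicodedata.normalize("NFKD", data)
    # Combining marks, digits and punctuation are not alphabetic, so isalpha() drops them.
    letters = "".join(ch for ch in decomposed if ch.isalpha())
    # casefold() is a more aggressive lower(): "ß" becomes "ss".
    return letters.casefold()


print(normalize_name("Frédér8ic@"))  # frederic
print(normalize_name("Straße"))  # strasse
```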

ToAscii/ToUnicode in a keyboard hook destroys dead keys

It seems that if you call ToAscii() or ToUnicode() while in a global WH_KEYBOARD_LL hook, and a dead-key is pressed, it will be 'destroyed'. For example, say you've configured your input language in Windows as Spanish, and you want to type an accented letter á in a program. Normally, you'd press the single-quote key (the dead key), then the letter "a", and then on the screen an accented á would be displayed, as expected. But this doesn't work if you call ToAscii() or ToUnicode() in a low-level keyboard hook function. It seems that the dead key is destroyed, and so no accented letter á shows up
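The root cause is that dead-key translation is stateful: the pending accent is stored between key presses and consumed by whichever call translates next. The following toy model (hypothetical, not the Win32 API, and not a reproduction of Windows's exact output) shows how an extra translation call in a hook steals that state from the application. The dead-key map and all names are assumptions for illustration.

```python
import unicodedata

# Assumption: ' acts as the acute-accent dead key, as on a Spanish layout.
DEAD_KEYS = {"'": "\u0301"}


class ToyKeyboardState:
    """Toy stand-in for the dead-key state that ToUnicode-style translation keeps."""

    def __init__(self):
        self.pending = None  # accent waiting for the next key

    def translate(self, key):
        if self.pending is None:
            if key in DEAD_KEYS:
                self.pending = DEAD_KEYS[key]  # remember the accent, emit nothing yet
                return ""
            return key
        mark, self.pending = self.pending, None  # consume the pending accent
        # Compose base + accent; NFC turns "a" + U+0301 into the single character "á".
        return unicodedata.normalize("NFC", key + mark)


def type_keys(keys, hook_translates):
    """Return what the application sees for a sequence of key presses."""
    state = ToyKeyboardState()  # one state shared by the hook and the application
    output = []
    for key in keys:
        if hook_translates:
            state.translate(key)  # the hook's call mutates the shared state
        output.append(state.translate(key))  # the application's own translation
    return "".join(output)


print(type_keys(["'", "a"], hook_translates=False))  # á
print(type_keys(["'", "a"], hook_translates=True))  # stray accent, then a plain a
```

In the second run the hook's call stores the accent and the application's call on the same key consumes it, so the following "a" arrives unaccented.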

MySQL - Why are COLLATION rules ignored by LIKE operator for German ß character

I'm running the following SELECT statements on MySQL 5.0.88 with utf8 charset and utf8_unicode_ci collation:

    SELECT * FROM table WHERE surname = 'abcß';
    +----+----------+---------+
    | id | forename | surname |
    +----+----------+---------+
    |  1 | a        | abcß    |
    |  2 | b        | abcss   |
    +----+----------+---------+

    SELECT * FROM table WHERE surname LIKE 'abcß';
    +----+----------+---------+
    | id | forename | surname |
    +----+----------+---------+
    |  1 | a        | abcß    |
    +----+----------+---------+

According to http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html the German…
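The MySQL manual's LIKE documentation states that, per the SQL standard, LIKE matches on a per-character basis and can therefore give different results from =. The difference can be sketched in Python, with casefold() standing in for utf8_unicode_ci's rule that "ß" equals "ss" (a simplification; real collations use weight tables, and the function names are hypothetical):

```python
def collation_key(s):
    # Stand-in for utf8_unicode_ci: case-insensitive, and "ß" folds to "ss".
    return s.casefold()


def equals(a, b):
    # "=" compares whole strings, so a one-to-two expansion like ß -> ss can match.
    return collation_key(a) == collation_key(b)


def like_no_wildcards(a, b):
    # LIKE compares character by character, so the single character "ß"
    # can never line up with the two characters "ss".
    return len(a) == len(b) and all(
        collation_key(x) == collation_key(y) for x, y in zip(a, b)
    )


print(equals("abcß", "abcss"))  # True
print(like_no_wildcards("abcß", "abcss"))  # False
```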

Custom HTTP header value - trying to pass umlaut characters

I am using Node.js and Express.js 3.x. In one of our authorization headers we pass the username. Some of our usernames contain umlaut characters such as ü, ö, ä and the like. For usernames with only 'normal' characters, everything works fine. But when a user like jörg tries to make a request, the server doesn't recognize the umlaut character in the header. Trying to simulate the problem, I created some tests that set the username header with the umlaut character. These tests pass; they are able to pass…
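HTTP header values are only reliably ASCII; RFC 7230 notes that non-ASCII octets in header values were historically treated as ISO-8859-1. A common workaround (an assumption here, not the thread's answer) is to percent-encode the value on the client and decode it on the server; in Node.js the equivalents are encodeURIComponent and decodeURIComponent. The Python sketch shows both the likely mojibake and the round trip:

```python
from urllib.parse import quote, unquote

username = "jörg"

# What can go wrong: UTF-8 bytes interpreted as ISO-8859-1 (Latin-1) become mojibake.
garbled = username.encode("utf-8").decode("latin-1")
print(garbled)  # jÃ¶rg

# Workaround: percent-encode the UTF-8 bytes so the header value is pure ASCII...
header_value = quote(username)
print(header_value)  # j%C3%B6rg

# ...and decode it again on the server side.
print(unquote(header_value))  # jörg
```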

Regex accent insensitive?

I need a regex in a C# program. I have to capture the name of a file with a specific structure. I used the \w character class, but the problem is that this class doesn't match any accented character. How can I handle this? I don't want to just put the most common accented letters in my pattern, because in theory any accent can be put on any letter. So I thought there might be a syntax to say we want accent-insensitive matching (or a class that takes accents into account), or a Regex option that makes matching accent-insensitive. Do you know of anything like this? Thank you very much. Case-insensitive works for me in…
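One general technique, not a built-in regex option, is to strip diacritics from both the pattern and the text before matching: decompose to NFD and drop the combining marks. In .NET the same idea uses string.Normalize(NormalizationForm.FormD) and filters characters whose CharUnicodeInfo.GetUnicodeCategory is NonSpacingMark. The Python sketch below uses hypothetical names; note that match positions refer to the stripped text, which can be shorter than the original.

```python
import re
import unicodedata


def strip_marks(s):
    # NFD splits "é" into "e" + U+0301; dropping combining marks leaves the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_search(pattern, text, flags=re.IGNORECASE):
    # Match on accent-free copies of both pattern and text (case-insensitive by default).
    return re.search(strip_marks(pattern), strip_marks(text), flags)


print(accent_insensitive_search(r"resume_\d+", "Résumé_2012.txt") is not None)  # True
```

As a side note, in Python 3 the \w class of a str pattern already matches accented letters such as "é".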

Why does string.Compare seem to handle accented characters inconsistently?

If I execute the following statement:

    string.Compare("mun", "mün", true, CultureInfo.InvariantCulture)

the result is -1, indicating that "mun" sorts before "mün". However, if I execute this statement:

    string.Compare("Muntelier, Schweiz", "München, Deutschland", true, CultureInfo.InvariantCulture)

I get 1, indicating that "Muntelier, Schweiz" should go last. Is this a bug in the comparison? Or, more likely, is there a rule I should be taking into account when sorting strings containing accented characters?

The reason this is an issue is that I'm sorting a list and then doing a manual…
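Both results are consistent with multi-level collation, the model behind the Unicode Collation Algorithm and culture-aware comparisons generally: base letters are compared first across the whole string, and accents only break ties. "mun" and "mün" have identical base letters, so the accent decides; "Muntelier" and "München" already differ at the fourth base letter ("t" vs "c"), so the ü never matters. The Python sketch below is a simplified two-level model (hypothetical names; the tie-breaker compares raw decomposed code points rather than real accent weights):

```python
import unicodedata


def _decomposed(s):
    # Case-insensitive, with accents split off as combining marks (ü -> u + U+0308).
    return unicodedata.normalize("NFD", s.casefold())


def _base_letters(s):
    # Primary level: the letters with all combining marks removed.
    return "".join(ch for ch in _decomposed(s) if not unicodedata.combining(ch))


def compare(a, b):
    """Two-level comparison: base letters first, accents only as a tie-breaker."""
    key_a = (_base_letters(a), _decomposed(a))
    key_b = (_base_letters(b), _decomposed(b))
    return (key_a > key_b) - (key_a < key_b)


print(compare("mun", "mün"))  # -1
print(compare("Muntelier, Schweiz", "München, Deutschland"))  # 1
```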

python : working with german umlaut

    months = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"]
    print months[2].decode("utf-8")

Printing months[2] fails with:

    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid data

Please help me get rid of this!

Answer: Did you add an encoding declaration at the beginning of your source file?

    # -*- coding: utf-8 -*-

Are you sure you are working in UTF-8? Nevertheless, I would recommend defining

    months = [u"Januar", u"Februar", u"März", u"April", u"Mai", u"Juni", u"Juli", u"August", u"September", u"Oktober", u"November", u…
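The error message suggests the "ä" in the source was stored as a single Latin-1/Windows-1252 byte (0xE4) while the code asked for UTF-8; that cause is an assumption, but it reproduces the same kind of failure. In Python 3, str literals are already Unicode and source files default to UTF-8, so no .decode() call is needed at all:

```python
months = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
          "August", "September", "Oktober", "November", "Dezember"]

# Python 3: str is already Unicode text, so just print it.
print(months[2])  # März

# Reproducing the original problem: Latin-1 bytes decoded as if they were UTF-8.
latin1_bytes = "März".encode("latin-1")  # the "ä" becomes the single byte 0xE4
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err.reason)

# Decoding with the codec that actually produced the bytes works.
print(latin1_bytes.decode("latin-1"))  # März
```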

MacOSX: how to disable accented characters input

I'm using Eclipse Juno on Mac OS X Lion and have an issue that drives me mad (in Xcode and AppCode everything works OK). I often type a single quote/apostrophe and then move the caret. But in the Mac version of Eclipse, the quote I type is highlighted with an orange marker (it looks like the Mac smart quotes feature), and when I move the caret, the quote disappears! I tried

    defaults write NSGlobalDomain AutomaticQuoteSubstitutionEnabled -bool false

to disable smart quotes globally and restarted the computer, but this doesn't help. I also tried to find something in the Eclipse preferences related to "quote", "smart",…