diacritics

The encoding that Notepad++ just calls “ANSI”, does anyone know what to call it for Ruby?

你离开我真会死。 posted on 2019-11-29 01:09:58
I have a bunch of .txt files that Notepad++ says (in its drop-down "Encoding" menu) are "ANSI". They have German characters in them, [äöüß], which display fine in Notepad++. But they don't show up right in irb when I File.read 'this is a German text example.txt' them. So does anyone know what argument I should give Encoding.default_external= ? (I'm assuming that'd be the solution, right?) When it is set to 'utf-8' or 'cp850', it reads the "ANSI" file with "äöüß" in it as "\xE4\xF6\xFC\xDF"... (Please don't hesitate to mention apparently "obvious" things in your answers; I'm pretty much as newbish as you can
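
The bytes quoted in the question, "\xE4\xF6\xFC\xDF", are what äöüß look like in a single-byte Western European code page. Here is a minimal sketch in Python, assuming the files come from a Western European Windows setup where Notepad++'s "ANSI" means Windows-1252 (the excerpt does not confirm this). Ruby knows the same encoding under the name Windows-1252 (alias CP1252).

```python
# Bytes quoted in the question: "äöüß" as stored in the "ANSI" file.
raw = b"\xE4\xF6\xFC\xDF"

# Assumption: "ANSI" here means Windows-1252 (Western European Windows).
text = raw.decode("cp1252")
print(text)

# The same bytes are not valid UTF-8, which is why they show up as escapes.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

If the files came from a machine with a different system code page, the right name would differ, so it is worth checking the result against a known word first.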

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

▼魔方 西西 posted on 2019-11-29 00:57:18
Question: I've read about how Zalgo text works, and I'm looking to learn how chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that needs to either: a) be stripped, assuming chat participants are to use only languages that don't require combining marks (i.e. you could write "fiancé" with a combining mark, but you'd be a bit Zalgo'ed yourself if you insisted on doing so); or b) be reduced to a maximum of 8 consecutive
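
Option b) could be sketched roughly as follows in Python. This is an illustration only: it treats every character whose Unicode general category starts with "M" (Mn, Mc, Me) as a combining mark, which is an assumption and not necessarily the "complete set" the question asks about. The function name and the `max_marks` parameter are made up for the example.

```python
import unicodedata

def limit_combining_marks(text: str, max_marks: int = 8) -> str:
    """Keep at most max_marks consecutive combining characters."""
    out = []
    run = 0  # length of the current run of combining characters
    for ch in text:
        # Assumption: general categories Mn/Mc/Me count as combining marks.
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

zalgo = "Z" + "\u0351" * 20 + "algo"
cleaned = limit_combining_marks(zalgo, max_marks=2)
print(len(zalgo), len(cleaned))
```

Setting `max_marks=0` gives option a), stripping every mark; a legitimate "fiancé" written with a single combining acute passes through unchanged at the default limit.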

removing accent and special characters [duplicate]

匆匆过客 posted on 2019-11-28 23:49:48
Possible Duplicate:
What is the best way to remove accents in a python unicode string?
Python and character normalization

I would like to remove accents, turn all characters to lowercase, and delete any numbers and special characters.

Example: Frédér8ic@ --> frederic

Proposal:

    def remove_accents(data):
        return ''.join(x for x in unicodedata.normalize('NFKD', data) if \
            unicodedata.category(x)[0] == 'L').lower()

Is there any better way to do this?

Abhijit
A possible solution would be

    def remove_accents(data):
        return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.printable
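
For reference, the asker's proposal becomes runnable once the import is added. This is the same logic in Python 3 form, not a different technique: NFKD decomposition splits "é" into "e" plus a combining accent, and the category filter keeps only letters.

```python
import unicodedata

def remove_accents(data: str) -> str:
    """Drop accents, digits and punctuation, then lowercase what is left."""
    decomposed = unicodedata.normalize("NFKD", data)
    # Keep only characters whose general category starts with "L" (letters);
    # combining accents (Mn), digits (Nd) and punctuation are filtered out.
    letters = (ch for ch in decomposed if unicodedata.category(ch)[0] == "L")
    return "".join(letters).lower()

print(remove_accents("Frédér8ic@"))
```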

ToAscii/ToUnicode in a keyboard hook destroys dead keys

☆樱花仙子☆ posted on 2019-11-28 23:30:24
It seems that if you call ToAscii() or ToUnicode() while in a global WH_KEYBOARD_LL hook, and a dead-key is pressed, it will be 'destroyed'. For example, say you've configured your input language in Windows as Spanish, and you want to type an accented letter á in a program. Normally, you'd press the single-quote key (the dead key), then the letter "a", and then on the screen an accented á would be displayed, as expected. But this doesn't work if you call ToAscii() or ToUnicode() in a low-level keyboard hook function. It seems that the dead key is destroyed, and so no accented letter á shows up

MySQL - Why are COLLATION rules ignored by LIKE operator for German ß character

强颜欢笑 posted on 2019-11-28 12:06:17
I'm running the following select statements on MySQL 5.0.88 with utf8 charset and utf8_unicode_ci collation:

SELECT * FROM table WHERE surname = 'abcß';
+----+----------+---------+
| id | forename | surname |
+----+----------+---------+
| 1  | a        | abcß    |
| 2  | b        | abcss   |
+----+----------+---------+

SELECT * FROM table WHERE surname LIKE 'abcß';
+----+----------+---------+
| id | forename | surname |
+----+----------+---------+
| 1  | a        | abcß    |
+----+----------+---------+

According to http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html the German

Custom HTTP header value - trying to pass umlaut characters

天大地大妈咪最大 posted on 2019-11-28 10:47:56
Question: I am using Node.js and Express.js 3.x. As one of our authorization headers we are passing in the username. Some of our usernames contain umlaut characters: ü, ö, ä and the like. For usernames with just 'normal' characters, all works fine. But when a jörg tries to make a request, the server doesn't recognize the umlaut character in the header. Trying to simulate the problem I: Created some tests that set the username header with the umlaut character. These tests pass, they are able to pass
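
The excerpt stops before any diagnosis, so the cause in this setup is unknown. One common workaround for non-ASCII header values is to percent-encode the UTF-8 bytes on the client and decode them on the server. Here is a minimal sketch in Python; the header name `X-Username` and both helper functions are hypothetical. In Node.js the corresponding built-ins would be `encodeURIComponent` and `decodeURIComponent`.

```python
from urllib.parse import quote, unquote

HEADER_NAME = "X-Username"  # hypothetical header name, not from the question

def encode_header_value(username: str) -> str:
    """Percent-encode the UTF-8 bytes so the header value is plain ASCII."""
    return quote(username, safe="")

def decode_header_value(value: str) -> str:
    """Reverse the percent-encoding on the receiving side."""
    return unquote(value)

headers = {HEADER_NAME: encode_header_value("jörg")}
print(headers[HEADER_NAME])
print(decode_header_value(headers[HEADER_NAME]))
```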

Regex accent insensitive?

為{幸葍}努か posted on 2019-11-28 10:06:49
I need a Regex in a C# program. I have to capture a file name with a specific structure. I used the \w char class, but the problem is that this class doesn't match any accented char. How can I do this? I don't want to put just the most used accented letters in my pattern, because in theory any accent can appear on any letter. So I thought there is maybe a syntax to say we want a case-insensitive match (or a class which takes accents into account), or a "Regex" option which allows me to be case insensitive. Do you know something like this? Thank you very much. Case-insensitive works for me in
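
One way to get accent-insensitive matching without listing every accented letter is to strip the accents from the input before applying a plain pattern. The sketch below shows the idea in Python rather than C#; the file-name structure in `pattern` is a made-up example, not the asker's.

```python
import re
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical file-name structure: letters, an underscore, digits, ".txt".
pattern = re.compile(r"[a-z]+_\d+\.txt", re.IGNORECASE)

name = "Résumé_2019.txt"
match = pattern.fullmatch(strip_accents(name))
print(match is not None)
```

The trade-off is that the match is made against the stripped string, so any captured groups contain the unaccented text rather than the original file name.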

Why does string.Compare seem to handle accented characters inconsistently?

a 夏天 posted on 2019-11-28 09:51:10
If I execute the following statement: string.Compare("mun", "mün", true, CultureInfo.InvariantCulture) The result is '-1', indicating that 'mun' has a lower numeric value than 'mün'. However, if I execute this statement: string.Compare("Muntelier, Schweiz", "München, Deutschland", true, CultureInfo.InvariantCulture) I get '1', indicating that 'Muntelier, Schweiz' should go last. Is this a bug in the comparison? Or, more likely, is there a rule I should be taking into account when sorting strings containing accented characters? The reason this is an issue is, I'm sorting a list and then doing a manual

python : working with german umlaut

只愿长相守 posted on 2019-11-28 09:49:24
months = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"]
print months[2].decode("utf-8")

Printing months[2] fails with UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid data. Please help me get rid of this!

Did you add an encoding declaration at the beginning of your source file? # -*- coding: utf-8 -*- Are you sure you are working in UTF-8? Nevertheless, I would recommend defining months = [u"Januar", u"Februar", u"März", u"April", u"Mai", u"Juni", u"Juli", u"August", u"September", u"Oktober", u"November", u
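
The error position (bytes 1-2 of "März") is consistent with the "ä" being stored as a single non-UTF-8 byte. The Python 3 sketch below reproduces the failure under the assumption that the source file was saved as Latin-1; the question itself does not state the file's encoding.

```python
# Assumption: the source file was saved as Latin-1, so "ä" is the single byte 0xE4.
latin1_bytes = "März".encode("latin-1")  # b'M\xe4rz'

try:
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # 0xE4 announces a multi-byte sequence that never arrives

print(error is not None)

# Decoding with the encoding the bytes were actually written in works.
print(latin1_bytes.decode("latin-1"))

# Bytes that really are UTF-8 round-trip without trouble.
print("März".encode("utf-8").decode("utf-8"))
```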

MacOSX: how to disable accented characters input

隐身守侯 posted on 2019-11-28 08:54:17
I'm using Eclipse Juno on MacOSX Lion and have an issue that drives me really mad (in Xcode and AppCode everything works OK). I often type a single quote/apostrophe and then move the caret. But in this Mac version of Eclipse, the quote is highlighted with an orange marker as I type (it seems to be the Mac smart quotes feature), and when I move the caret, the quote disappears! I tried defaults write NSGlobalDomain AutomaticQuoteSubstitutionEnabled -bool false to disable smart quotes globally and restarted the computer, but this doesn't help. I also tried to find something in the Eclipse preferences related to "quote", "smart",