diacritics

Should I use accented characters in URLs?

随声附和 submitted on 2019-11-27 06:57:13
When one creates web content in languages other than English, the problem of search-engine-optimized and user-friendly URLs arises. I'm wondering whether it is best practice to use de-accented letters in URLs, risking that some words have completely different meanings with and without certain accents, or whether it is better to stick to non-English characters where appropriate, sacrificing the readability of those URLs in less advanced environments (e.g. MSIE, view source). "Exotic" letters could appear anywhere: in titles of documents, in tags, in user names, etc., so they're
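To make the trade-off concrete, here is a minimal Python sketch of what keeping the accents means on the wire: the non-ASCII characters are sent as percent-encoded UTF-8 bytes and decode back to the original text. The helper names `encode_segment` and `decode_segment` and the sample title are assumptions made for illustration, not part of the question.

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment as UTF-8 (safe="" also encodes '/')."""
    return quote(segment, safe="")


def decode_segment(encoded: str) -> str:
    """Reverse the percent-encoding back to readable text."""
    return unquote(encoded)


title = "t\u00e9l\u00e9phone"  # "téléphone", a hypothetical page title
encoded = encode_segment(title)
print(encoded)                  # t%C3%A9l%C3%A9phone
print(decode_segment(encoded))  # the accented title again
```

The encoded form is what shows up in "view source" and in clients that do not render internationalized URLs, which is the readability cost the question mentions. De-accenting avoids that cost but is lossy, since the original spelling cannot be recovered from the URL.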

Why can't I use accented characters next to a word boundary?

筅森魡賤 submitted on 2019-11-27 06:51:26
Question: I'm trying to make a dynamic regex that matches a person's name. It works without problems on most names, until I ran into accented characters at the end of the name. Example: Some Fancy Namé The regex I've used so far is: /\b(Fancy Namé|Namé)\b/i Used like this: "Goal: Some Fancy Namé. Awesome.".replace(/\b(Fancy Namé|Namé)\b/i, '<a href="#">$1</a>'); This simply won't match. If I replace the é with an e, it matches just fine. If I try to match a name such as "Some Fancy Naméa", it works just
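The behaviour described is consistent with a `\b` that only treats ASCII letters, digits, and underscore as word characters, which is how JavaScript defines `\w`: an é followed by a period is then two non-word characters in a row, so there is no boundary between them. The Python sketch below reproduces the effect with `re.ASCII` and contrasts it with Python's default Unicode-aware matching. The function name `link_names` is invented for this example.

```python
import re


def link_names(text, names, ascii_only=False):
    """Wrap the first matching name in a link, requiring word boundaries."""
    alternation = "|".join(re.escape(name) for name in names)
    flags = re.IGNORECASE | (re.ASCII if ascii_only else 0)
    return re.sub(
        rf"\b({alternation})\b",
        r'<a href="#">\1</a>',
        text,
        count=1,
        flags=flags,
    )


text = "Goal: Some Fancy Nam\u00e9. Awesome."  # "Namé" with a precomposed é
names = ["Fancy Nam\u00e9", "Nam\u00e9"]

print(link_names(text, names))                   # name is linked
print(link_names(text, names, ascii_only=True))  # text comes back unchanged
```

With ASCII-only word characters, "Naméa" still matches because the final "a" is a word character next to the period. In an engine without a Unicode-aware `\b`, a common workaround is to replace `\b` with explicit lookarounds or with the punctuation and whitespace the names are expected to sit between.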

MySQL - Why are COLLATION rules ignored by LIKE operator for German ß character

烈酒焚心 submitted on 2019-11-27 06:50:18
Question: I'm running the following select statements on MySQL 5.0.88 with utf8 charset and utf8_unicode_ci collation:

SELECT * FROM table WHERE surname = 'abcß';
+----+----------+---------+
| id | forename | surname |
+----+----------+---------+
|  1 | a        | abcß    |
|  2 | b        | abcss   |
+----+----------+---------+

SELECT * FROM table WHERE surname LIKE 'abcß';
+----+----------+---------+
| id | forename | surname |
+----+----------+---------+
|  1 | a        | abcß    |
+----+----------+---------+
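One plausible explanation, which the truncated excerpt does not confirm, is that `=` compares whole strings under a collation in which ß expands to "ss", while LIKE matches character by character, so a single character can never equal two. The Python sketch below is only an analogy for that distinction and not MySQL's implementation. It relies on the fact that `str.casefold()` expands ß to "ss"; both function names are made up for this example.

```python
def equal_with_expansion(a: str, b: str) -> bool:
    """Compare whole strings after folding; casefold() turns 'ß' into 'ss'."""
    return a.casefold() == b.casefold()


def equal_per_character(a: str, b: str) -> bool:
    """Compare position by position; lengths must match, so 'ß' != 'ss'."""
    return len(a) == len(b) and all(
        x.casefold() == y.casefold() for x, y in zip(a, b)
    )


sharp_s = "abc\u00df"  # "abcß"
print(equal_with_expansion(sharp_s, "abcss"))  # True, like the = query
print(equal_per_character(sharp_s, "abcss"))   # False, like the LIKE query
print(equal_per_character(sharp_s, sharp_s))   # True
```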

Convert accented characters to their plain ascii equivalents

一曲冷凌霜 submitted on 2019-11-27 05:22:46
Question: I have to convert French characters into English in PHP. I've used the following code: iconv("utf-8", "ascii//TRANSLIT", $string); But the result for ËËË was "E"E"E . I don't need the double quotes and other extra characters; I want the output to be EEE . Is there any other method to convert French to English? Can you help me do this? Answer 1: The PHP manual's iconv introduction has a warning: Note that the iconv function on some systems may not work as you expect. In such case, it'd be a
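For comparison, here is the same goal sketched in Python rather than PHP, using Unicode normalization instead of iconv so the result does not depend on the platform's transliteration tables. The helper name `to_ascii` is an assumption made for this example.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Decompose accented letters, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u00cb\u00cb\u00cb"))  # "ËËË" -> EEE
print(to_ascii("caf\u00e9"))           # "café" -> cafe
```

This approach silently deletes characters that have no decomposition, such as ß or ø, so it suits accent stripping better than general transliteration.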

Why does string.Compare seem to handle accented characters inconsistently?

谁说胖子不能爱 submitted on 2019-11-27 03:15:03
Question: If I execute the following statement: string.Compare("mun", "mün", true, CultureInfo.InvariantCulture) The result is '-1', indicating that 'mun' has a lower numeric value than 'mün'. However, if I execute this statement: string.Compare("Muntelier, Schweiz", "München, Deutschland", true, CultureInfo.InvariantCulture) I get '1', indicating that 'Muntelier, Schweiz' should go last. Is this a bug in the comparison? Or, more likely, is there a rule I should be taking into account when sorting
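Both results are what a multi-level collation would produce: base letters are compared across the whole string first, and accents only break ties. The Python sketch below is a deliberately simplified two-level model of that idea, not the algorithm .NET uses. The names `strip_marks`, `two_level_key`, and `compare` are invented here.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def two_level_key(text: str):
    """Primary level: base letters only. Secondary level: letters with accents."""
    folded = text.casefold()
    return (strip_marks(folded), unicodedata.normalize("NFD", folded))


def compare(a: str, b: str) -> int:
    """Return -1, 0, or 1, in the style of string.Compare."""
    key_a, key_b = two_level_key(a), two_level_key(b)
    return (key_a > key_b) - (key_a < key_b)


print(compare("mun", "m\u00fcn"))  # -1: base letters tie, the accent decides
print(compare("Muntelier, Schweiz", "M\u00fcnchen, Deutschland"))  # 1: 't' > 'c'
```

In the second comparison the accent on ü never comes into play, because the strings already differ at the primary level on the fourth letter.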

python : working with german umlaut

我们两清 submitted on 2019-11-27 03:14:19
Question:

months = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"]
print months[2].decode("utf-8")

Printing months[2] fails with UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid data. Please help me get rid of this!

Answer 1: Did you add an encoding declaration at the beginning of your source file?

# -*- coding: utf-8 -*-

Answer 2: Are you sure you are working in UTF-8? Nevertheless, I would recommend defining months = [u"Januar", u
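The reported error is what one would see if the bytes for "März" were stored in a single-byte encoding such as Latin-1 and then decoded as UTF-8. That cause is a guess, since the excerpt does not say how the file was saved. The Python 3 sketch below shows the difference between the two byte sequences; `try_decode_utf8` is a hypothetical helper.

```python
def try_decode_utf8(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


march = "M\u00e4rz"  # "März"

utf8_bytes = march.encode("utf-8")      # b'M\xc3\xa4rz': ä takes two bytes
latin1_bytes = march.encode("latin-1")  # b'M\xe4rz': ä takes one byte

print(try_decode_utf8(utf8_bytes))    # decodes back to the original text
print(try_decode_utf8(latin1_bytes))  # None: 0xe4 is not valid UTF-8 here
```

In Python 3, string literals are already Unicode, so the `.decode()` call from the question is neither needed nor available on `str`.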

How to remove diacritics from text?

烂漫一生 submitted on 2019-11-27 01:12:47
I am making a Swedish website, and the Swedish letters are å, ä, and ö. I need to make a string entered by a user URL-safe with PHP. Basically, I need to convert all characters to underscores, all EXCEPT these: A-Z, a-z, 1-9. The Swedish letters should be converted like this: 'å' to 'a', 'ä' to 'a', and 'ö' to 'o' (just remove the dots above). The rest should become underscores, as I said. I'm not good at regular expressions, so I would appreciate the help, guys! Thanks. NOTE: NOT URLENCODE... I need to store it in a database, etc. etc.; urlencode won't work for me.

Jeremy L

// normalize data (remove
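The rule the question describes can be sketched as an explicit mapping followed by one regular expression. This is a Python illustration rather than the PHP the asker wants, and it is not the truncated answer's code. The names `SWEDISH_MAP` and `url_safe` are invented, and the digit range is assumed to be 0-9 although the question writes 1-9.

```python
import re
import unicodedata

# Explicit mapping for the three Swedish letters, in both cases.
SWEDISH_MAP = str.maketrans({
    "\u00e5": "a", "\u00e4": "a", "\u00f6": "o",  # å, ä, ö
    "\u00c5": "A", "\u00c4": "A", "\u00d6": "O",  # Å, Ä, Ö
})


def url_safe(text: str) -> str:
    """Map å/ä/ö to a/a/o, then turn every other non-alphanumeric into '_'."""
    composed = unicodedata.normalize("NFC", text)  # ensure single code points
    return re.sub(r"[^A-Za-z0-9]", "_", composed.translate(SWEDISH_MAP))


print(url_safe("R\u00e4ksm\u00f6rg\u00e5s & kaffe!"))  # Raksmorgas___kaffe_
```

An explicit table keeps the behaviour predictable: accented letters outside the table, such as é, become underscores instead of being silently de-accented.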

How to change diacritic characters to non-diacritic ones [duplicate]

半世苍凉 submitted on 2019-11-27 01:10:01
This question already has an answer here: How do I remove diacritics (accents) from a string in .NET? (19 answers)

I've found an answer on Stack Overflow about how to remove diacritic characters, but could you please tell me whether it is possible to change diacritic characters to non-diacritic ones? Oh, and I'm thinking about .NET (or something else if that's not possible).

CesarB

Copying from my own answer to another question: Instead of creating your own table, you could instead convert the text to normalization form D, where the characters are represented as a base character plus the diacritics (for instance, "á" will be
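The decomposition the answer describes can be inspected directly. The sketch below uses Python's `unicodedata` module rather than .NET, shows what normalization form D does to "á", and then removes the non-spacing marks. The function names `describe` and `remove_nonspacing_marks` are made up for illustration.

```python
import unicodedata


def describe(text: str):
    """List the Unicode names of the code points in the NFD form of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


def remove_nonspacing_marks(text: str) -> str:
    """Decompose, drop code points in category Mn, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))


print(describe("\u00e1"))
# ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']
print(remove_nonspacing_marks("\u00e1rv\u00edzt\u0171r\u0151"))  # arvizturo
```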

Java string searching ignoring accents

微笑、不失礼 submitted on 2019-11-27 01:02:08
I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents. The objects in question are basically Persons, and the strings I am trying to match are names. So for example if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the
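The filter described can be sketched by folding both the query and each candidate name to an accent-free, case-free form before the substring test. This is a Python illustration of the idea, not the Java code from the question; `fold` and `filter_names` are hypothetical names, and plain strings stand in for the Person objects.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower the case and remove non-spacing marks so 'Joáo' becomes 'joao'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def filter_names(names, query):
    """Keep the names whose folded form contains the folded query."""
    needle = fold(query)
    return [name for name in names if needle in fold(name)]


people = ["Jo\u00e1o Silva", "Maria Costa", "JOAO Pereira"]
print(filter_names(people, "Joao"))  # keeps the first and the third name
```

Folding at comparison time leaves the stored names untouched, so results can still be displayed with their original accents.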

How to protect against diacritics such as Zalgo text

假装没事ソ submitted on 2019-11-27 00:21:44
Question: The character pictured above was tweeted a few months ago by Mikko Hyppönen, a computer security expert known for his work on computer viruses and TED talks on computer security. Out of respect for SO, I will only post an image of it, but you get the idea. It's obviously not something you'd want spreading around your website and freaking out visitors. Upon further inspection, the character appears to be a letter of the Thai alphabet combined with over 87 diacritics (is there even a limit?!). This
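One defensive approach is to cap how many combining marks may follow a single base character and discard the rest. The Python sketch below assumes a cap of two marks and treats Unicode categories Mn and Me as "marks". The function name, the cap, and the category choice are all assumptions for illustration, and some legitimate text may need a higher limit.

```python
import unicodedata

MARK_CATEGORIES = ("Mn", "Me")  # non-spacing and enclosing marks


def limit_combining_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive marks after each base character."""
    result = []
    run = 0
    for ch in text:
        if unicodedata.category(ch) in MARK_CATEGORIES:
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        result.append(ch)
    return "".join(result)


zalgo = "\u0e01" + "\u0e49" * 87  # a Thai letter carrying 87 tone marks
print(len(zalgo), len(limit_combining_marks(zalgo)))  # 88 3
```

Ordinary accented text passes through unchanged, because it rarely stacks more than one or two marks on a base character.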