diacritics | 易学教程

Is there a way to use NSString stringByFoldingWithOptions to unfold the single French 'œ' character into 'oe'?

For a diacritics-agnostic full-text search feature, I use the following code to convert accented characters like é or Ö into their lowercase unaccented forms e and o: [[inputString stringByFoldingWithOptions: NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale: [NSLocale currentLocale]] lowercaseString]; This works. However, I found no way to convert special characters whose base form consists of multiple characters, like the French œ (as in "sœur") or the German ß (as in "Fluß"). I would like to convert them into oe and ss respectively. I found no flag for
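The multi-character folding asked about here can be illustrated outside Foundation. The following is a minimal Python sketch, not the NSString API: it assumes that case folding plus compatibility decomposition covers most letters, and it uses a small hand-written table (the hypothetical `EXTRA_FOLDS`) for ligatures such as œ that have no Unicode decomposition.

```python
import unicodedata

# Hypothetical table for letters that have no Unicode decomposition and
# therefore survive mark stripping; extend it for the languages you index.
EXTRA_FOLDS = {"œ": "oe", "æ": "ae"}


def fold_for_search(text):
    """Return a lowercase, accent-free key suitable for matching."""
    # casefold() lowercases and also expands ß to "ss".
    folded = text.casefold()
    # NFKD splits accented letters into base letter + combining marks
    # and folds width variants to their ordinary forms.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace the remaining ligatures from the table.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)
```

Applied to both the indexed text and the query, `fold_for_search("Sœur")` and `fold_for_search("soeur")` produce the same key, so the two compare equal.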

MongoDB diacriticInSensitive search not showing all accented (words with diacritic mark) rows as expected and vice-versa

I have a document collection with the following structure: uid, name. It has an index: db.Collection.createIndex({name: "text"}) It contains the following data: 1, iphone; 2, iphóne; 3, iphonë; 4, iphónë. When I do a text search for iphone, I get only two records, which is unexpected. Actual output: 1, iphone; 2, iphóne. If I search for iphonë with db.Collection.find( { $text: { $search: "iphonë"} } ); I get: 3, iphonë; 4, iphónë. But I am actually expecting the following output: db.Collection.find( { $text: { $search: "iphone"} } ); db.Collection.find( { $text: { $search:

Python: Convert Unicode to ASCII without errors for CSV file

I've been reading all the questions about converting Unicode to CSV in Python here on Stack Overflow and I'm still lost. Every time, I receive a "UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 12: ordinal not in range(128)". The code is: buffer=cStringIO.StringIO() writer=csv.writer(buffer, csv.excel) cr.execute(query, query_param) while (1): row = cr.fetchone() writer.writerow([s.encode('ascii','ignore') for s in row]) The value of row is (56, u"LIMPIADOR BA\xd1O 1'5 L"), where the value of \xd1 in the database is ñ, an n with a diacritical tilde used in Spanish. At first I
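A minimal sketch of one way to write such rows, assuming Python 3 (the snippet above is Python 2) and assuming that dropping the accents is acceptable for the output file. The helper names `to_ascii` and `rows_to_csv` are illustrative, not part of the original code. Non-string cells such as the integer 56 are passed through untouched, since they have no `encode` method.

```python
import csv
import io
import unicodedata


def to_ascii(value):
    """Reduce a string to ASCII; return any other type unchanged."""
    if not isinstance(value, str):
        return value
    # NFKD turns "Ñ" into "N" followed by a combining tilde.
    decomposed = unicodedata.normalize("NFKD", value)
    # Encoding with "ignore" then drops the combining tilde.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rows_to_csv(rows):
    """Serialize rows to CSV text using the Excel dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect=csv.excel)
    for row in rows:
        writer.writerow([to_ascii(cell) for cell in row])
    return buffer.getvalue()
```

If the accents should be preserved instead, Python 3's `csv` module can write the strings as they are to a file opened with an explicit encoding such as UTF-8.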

How can I make a regular expression which takes accented characters into account?

I have a JavaScript regular expression which basically finds two-letter words. The problem seems to be that it interprets accented characters as word boundaries. Indeed, it seems that: "A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W"." (Quoted from "AS3 RegExp to match words with boundry type characters in them".) And since \w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
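The effect described here can be reproduced in Python, whose `re.ASCII` flag restricts `\w` to `[a-zA-Z0-9_]`, the same set the question quotes for JavaScript. This is only an illustration of the boundary behaviour, not a JavaScript fix; the function name `two_letter_words` is made up for the example.

```python
import re


def two_letter_words(text, ascii_only=False):
    """Find two-letter words; ascii_only mimics an ASCII-only \\w."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\b\w{2}\b", text, flags=flags)


sample = "il est pr\u00e8s"  # "il est près"

# With an ASCII-only \w, "è" counts as a non-word character, so a word
# boundary appears in the middle of "près" and "pr" is reported as a word.
ascii_result = two_letter_words(sample, ascii_only=True)

# With Unicode-aware \w, "près" is one four-letter word and is skipped.
unicode_result = two_letter_words(sample)
```

Here `ascii_result` is `['il', 'pr']` while `unicode_result` is `['il']`.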

accent insensitive regex

My code: jQuery.fn.extend({ highlight: function(search){ var regex = new RegExp('(<[^>]*>)|('+ search.replace(/[.+]i/,"$0") +')','ig'); return this.html(this.html().replace(regex, function(a, b, c){ return (a.charAt(0) == '<') ? a : '<strong class="highlight">' + c + '</strong>'; })); } }); I want to highlight letters with accents, i.e. $('body').highlight("cao"); should highlight: [ção] OR [çÃo] OR [cáo] OR expre[cão]tion OR [Cáo]tion. How can I do that? The sole correct way to do this is to first run it through Unicode Normalization Form D, canonical decomposition. You then strip out any
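The decomposition idea from the answer can be sketched in Python on plain text (no HTML tag handling, unlike the jQuery code above). This is a sketch under stated assumptions: it matches combining marks with the range U+0300 to U+036F, which covers the common Latin diacritics but not every mark in Unicode, and the `highlight` function with its bracket markers is hypothetical.

```python
import re
import unicodedata

# Assumption: the Combining Diacritical Marks block is enough for the
# accents in question; other scripts use marks outside this range.
MARKS = "[\u0300-\u036f]*"


def strip_marks(text):
    """Decompose to NFD and drop every combining character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def highlight(text, search, open_tag="[", close_tag="]"):
    """Wrap accent-insensitive, case-insensitive matches of search."""
    base = strip_marks(search)
    if not base:
        return text
    # Each base letter may be followed by any number of combining marks.
    pattern = "".join(re.escape(ch) + MARKS for ch in base)
    decomposed = unicodedata.normalize("NFD", text)
    marked = re.sub(
        pattern,
        lambda m: open_tag + m.group(0) + close_tag,
        decomposed,
        flags=re.IGNORECASE,
    )
    # Recompose so the output uses precomposed letters again.
    return unicodedata.normalize("NFC", marked)
```

Matching against the decomposed text, rather than a stripped copy, keeps the original accents inside the highlighted span.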

MySQL REGEXP query - accent insensitive search

I'm looking to query a database of wine names, many of which contain accents (but not in a uniform way, so similar wines may be entered with or without accents). The basic query looks like this: SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugères[[:>:]]' which will return entries with 'Faugères' in the title, but not 'Faugeres'. SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugeres[[:>:]]' does the opposite. I had thought something like: SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faug[eèêéë]r[eèêéë]s[[:>:]]' might do the trick, but this only returns the

WPF WebBrowser and special characters like german “umlaute”

I use the WPF WebBrowser control in my app. I have a file (mht) which contains German umlauts (ä ö ü). I load this file with .Navigate(path), but the problem is that these characters are not shown correctly. How can I solve this? Best Regards, Thomas Gavin Jones This is very quirky. My solution was to put an explicit meta tag in my HTML file, "My Page.html": <meta http-equiv='Content-Type' content='text/html;charset=UTF-8'> Then, using the standard Web Browser .NET control, I created a URI object first: webBrowser1.Url = new Uri("My Page.html"); Then draw the page using the refresh

Code to strip diacritical marks using ICU

Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.: UnicodeString strip_diacritics( UnicodeString const &s ) { UnicodeString result; // ... return result; } Assume that s has already been normalized. Thanks. ICU lets you transliterate a string using a specific rule. My rule is NFD; [:M:] Remove; NFC: decompose, remove diacritics, recompose. The
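The three steps of the rule quoted above (decompose, remove characters in general category M, recompose) can be mirrored with Python's standard `unicodedata` module. This is a sketch of the same pipeline for illustration, not ICU code and not a substitute for the C++ answer.

```python
import unicodedata


def strip_diacritics(text):
    """Apply the equivalent of 'NFD; [:M:] Remove; NFC' to text."""
    # Step 1 (NFD): split each accented letter into base + marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2 ([:M:] Remove): drop characters whose general category
    # starts with "M" (Mn, Mc, Me), i.e. all mark characters.
    without_marks = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    # Step 3 (NFC): recompose whatever can still be composed.
    return unicodedata.normalize("NFC", without_marks)
```

Letters without a canonical decomposition, such as œ or ø, pass through this pipeline unchanged.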

Should all accented characters use html entities?

I am working with a large number of HTML files that are mostly encoded as UTF-8. There are accented characters galore, as many of the files are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IETester) the unconverted accented characters display properly. Should I be concerned with character display and convert them all to HTML entities just to be on the safe side? If the files are UTF-8 encoded, you should set the Content-Type header to be text/html; charset=UTF-8 and have an equivalent meta tag on the page: <meta http-equiv="Content

PHP convert foreign characters with accents

Hi, I'm trying to compare some text to the text in a database. In the database, any text with an accent is encoded as in HTML (i.e. &eacute;). When I compare the database text to my string it doesn't match, because my string just shows é. When I use the PHP function htmlentities to encode the string first, the é turns into Ã©. Weird? Using htmlspecialchars doesn't encode the é at all. How would you suggest I compare &eacute; to é, as well as all the other accented characters? You need to send in the correct charset to htmlentities. It looks like you're using UTF-8, but the default is ISO-8859-1. Change it
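The charset mismatch behind the Ã© can be shown in a few lines. This is a Python illustration of the effect, not PHP code: it assumes the string is UTF-8 and is being read as ISO-8859-1, as the answer suggests. It also sketches the opposite approach to the comparison, decoding the stored entities instead of encoding the input. Both helper names are hypothetical.

```python
import html
import unicodedata


def misread_utf8_as_latin1(text):
    """Show what UTF-8 bytes look like when decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def same_text(stored, typed):
    """Compare entity-encoded database text with a plain string."""
    # Turn named and numeric entities such as &eacute; back into characters.
    decoded = html.unescape(stored)
    # Normalize both sides so precomposed and decomposed forms compare equal.
    return (unicodedata.normalize("NFC", decoded)
            == unicodedata.normalize("NFC", typed))
```

The letter é is two bytes in UTF-8 (0xC3 0xA9), and those two bytes read as ISO-8859-1 are the characters Ã and ©, which is exactly the pair the question reports.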
