diacritics

Save Accents in MySQL Database

限于喜欢 submitted on 2019-11-28 06:20:40
I'm trying to save French accents in my database, but they aren't stored as they should be. For example, "é" is saved as "Ã©". I've set my files to "Unicode (utf-8)", and the fields in the DB use "utf8_general_ci", as does the DB itself. When I look at the data posted through AJAX with Firebug, I see the accent passed as "é", so it's correct. Thanks, and let me know if you need more info!

Olivier Kaisin Personally, I solved the same issue by adding this after the MySQL connection code: mysql_set_charset("utf8"); or for mysqli: mysqli_set_charset($conn, "utf8"); or the mysqli OOP
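The symptom in the question is typical of bytes written as UTF-8 and then read under a single-byte charset. A minimal Python sketch of that mismatch follows, assuming the connection defaulted to a Latin-1 style charset (the excerpt does not confirm this); the function names are hypothetical.

```python
def simulate_charset_mismatch(text):
    """Encode as UTF-8, then decode the same bytes as Latin-1 (the assumed mismatch)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Reverse the mismatch: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = simulate_charset_mismatch("\u00e9")  # U+00E9 is "é"
print(garbled)                   # two characters instead of one
print(repair_mojibake(garbled))  # the original single character
```

Repairing stored data this way only works if no bytes were lost on the way in; setting the connection charset, as the answer suggests, avoids the problem at the source.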

Is there a way to use NSString stringByFoldingWithOptions to unfold the single French 'œ' character into 'oe'?

爱⌒轻易说出口 submitted on 2019-11-28 05:10:47
Question: For a diacritics-agnostic full-text search feature, I use the following code to convert accented characters like é or Ö into their lowercase, unaccented forms e and o: [[inputString stringByFoldingWithOptions: NSCaseInsensitiveSearch + NSDiacriticInsensitiveSearch + NSWidthInsensitiveSearch locale: [NSLocale currentLocale]] lowercaseString]; This works. However, I found no way to convert special characters whose base form consists of multiple characters like the French œ (as in "sœur") or the
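Characters such as œ have no Unicode decomposition, so normalization-based folding leaves them untouched. One common workaround is an explicit replacement table applied before folding. The sketch below shows the idea in Python rather than Objective-C; the table is hypothetical and deliberately incomplete, not part of any Foundation API.

```python
import unicodedata

# Hypothetical, incomplete table of characters whose folded form spans
# several letters. Extend it for the languages you need to support.
MULTI_CHAR_FOLDS = str.maketrans({
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
    "\u00e6": "ae",  # æ
    "\u00c6": "AE",  # Æ
    "\u00df": "ss",  # ß
})


def expand_ligatures(text):
    """Replace each listed character with its multi-letter form."""
    return text.translate(MULTI_CHAR_FOLDS)


# Compatibility decomposition alone does not split the ligature.
print(unicodedata.normalize("NFKD", "\u0153") == "\u0153")
print(expand_ligatures("s\u0153ur"))
```

Run the expansion first and the diacritic folding afterwards, so both steps apply to the same string.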

How to protect against diacritics such as Zalgo text

狂风中的少年 submitted on 2019-11-28 04:20:47
The character pictured above was tweeted a few months ago by Mikko Hyppönen, a computer security expert known for his work on computer viruses and his TED talks on computer security. Out of respect for SO, I will only post an image of it, but you get the idea. It's obviously not something you'd want spreading around your website and freaking out visitors. Upon further inspection, the character appears to be a letter of the Thai alphabet combined with over 87 diacritics (is there even a limit?!). This got me thinking about security, localization, and how one might handle this sort of input. My
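One way to handle this kind of input is to cap the number of consecutive combining marks that may follow a base character. The following is a minimal Python sketch of that idea; the function name and the default cap of 2 are assumptions, and the cap should be tuned to the scripts you accept, since some legitimately stack more than one mark.

```python
import unicodedata


def limit_combining_marks(text, max_marks=2):
    """Drop combining marks beyond max_marks in any consecutive run."""
    result = []
    run = 0
    # NFC first, so ordinary accented letters become single code points
    # and do not count against the cap.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # excess mark: skip it
        else:
            run = 0
        result.append(ch)
    return "".join(result)


zalgo = "a" + "\u0301" * 10 + "b"
print(len(zalgo), len(limit_combining_marks(zalgo)))
```

This filters rather than rejects. Depending on the site, refusing such input outright may be the better policy.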

Python: Convert Unicode to ASCII without errors for CSV file

本小妞迷上赌 submitted on 2019-11-28 04:05:09
Question: I've been reading all the questions about converting Unicode to CSV in Python here on Stack Overflow and I'm still lost. Every time, I receive a "UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 12: ordinal not in range(128)"

    buffer=cStringIO.StringIO()
    writer=csv.writer(buffer, csv.excel)
    cr.execute(query, query_param)
    while (1):
        row = cr.fetchone()
        writer.writerow([s.encode('ascii','ignore') for s in row])

The value of row is (56, u"LIMPIADOR BA\xd1O 1'5 L")
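The row in the question mixes an integer with a Unicode string, so each cell has to be converted to text before it can be reduced to ASCII. The sketch below assumes Python 3 (the question's code is Python 2) and uses hypothetical helper names. It decomposes characters first, so that Ñ degrades to N instead of disappearing.

```python
import csv
import io
import unicodedata


def to_ascii(value):
    """Convert any cell to text, then drop whatever ASCII cannot represent."""
    decomposed = unicodedata.normalize("NFKD", str(value))
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rows_to_csv(rows):
    """Write rows to an in-memory CSV using the Excel dialect."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, dialect=csv.excel)
    for row in rows:
        writer.writerow([to_ascii(cell) for cell in row])
    return buffer.getvalue()


print(rows_to_csv([(56, "LIMPIADOR BA\xd1O 1'5 L")]))
```

If the consumer of the file can read UTF-8, writing the text unmodified with an explicit encoding keeps the original spelling and is usually preferable to stripping.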

How can I make a regular expression which takes accented characters into account?

被刻印的时光 ゝ submitted on 2019-11-28 04:04:35
Question: I have a JavaScript regular expression which basically finds two-letter words. The problem seems to be that it interprets accented characters as word boundaries. Indeed, it seems that: "A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W"." (from "AS3 RegExp to match words with boundry type characters in them") And since \w
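When "\b" is defined in terms of an ASCII-only "\w", one workaround is to replace it with lookarounds over an explicit letter class. Here is a minimal sketch of that technique in Python rather than the question's JavaScript; the letter class covers only ASCII and Latin-1 letters, which is an assumption chosen for illustration.

```python
import re
import unicodedata

# ASCII letters plus the Latin-1 accented letters (U+00C0 to U+00FF,
# skipping the multiplication and division signs at U+00D7 and U+00F7).
LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# Exactly two letters, with no letter immediately before or after.
TWO_LETTER_WORD = re.compile(
    "(?<![{0}])[{0}]{{2}}(?![{0}])".format(LETTER)
)


def two_letter_words(text):
    """Return all two-letter words, treating accented letters as letters."""
    # NFC so that each accented letter is a single code point.
    return TWO_LETTER_WORD.findall(unicodedata.normalize("NFC", text))


print(two_letter_words("il a \u00e9t\u00e9 l\u00e0 o\u00f9 je vis"))
```

The same pattern shape carries over to other regex engines, provided they support lookbehind.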

accent insensitive regex

僤鯓⒐⒋嵵緔 submitted on 2019-11-28 03:24:08
Question: My code:

    jQuery.fn.extend({
      highlight: function(search){
        var regex = new RegExp('(<[^>]*>)|('+ search.replace(/[.+]i/,"$0") +')','ig');
        return this.html(this.html().replace(regex, function(a, b, c){
          return (a.charAt(0) == '<') ? a : '<strong class="highlight">' + c + '</strong>';
        }));
      }
    });

I want to highlight letters with accents, i.e. $('body').highlight("cao"); should highlight: [ção] OR [çÃo] OR [cáo] OR expre[cão]tion OR [Cáo]tion. How can I do that?

Answer 1: The sole correct way to do this is
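Since the answer is cut off, here is one possible approach, sketched in Python rather than jQuery and not necessarily the one the answer goes on to describe: decompose the text so accents become separate combining marks, then let every letter of the search term be followed by any number of marks. The function names are hypothetical, and the sketch handles plain text only; it does not skip HTML tags as the question's code does.

```python
import re
import unicodedata

# Any run of marks from the Combining Diacritical Marks block.
COMBINING = "[\u0300-\u036f]*"


def strip_marks(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def highlight(text, search):
    """Wrap accent- and case-insensitive matches of search in <strong>."""
    if not search:
        return text
    pattern = "".join(re.escape(ch) + COMBINING for ch in strip_marks(search))
    regex = re.compile(pattern, re.IGNORECASE)
    marked = regex.sub(
        lambda m: '<strong class="highlight">' + m.group(0) + "</strong>",
        unicodedata.normalize("NFD", text),
    )
    # Recompose so the output uses precomposed letters again.
    return unicodedata.normalize("NFC", marked)


print(highlight("expre\u00e7\u00e3o C\u00e1o", "cao"))
```

Matching on the decomposed form avoids maintaining a table of accented variants for every letter.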

WPF WebBrowser and special characters like german “umlaute”

我只是一个虾纸丫 submitted on 2019-11-28 02:17:32
Question: I use the WPF WebBrowser control in my app. I have a file (mht) which contains German umlauts (ä ö ü). I load this file with .Navigate(path), but the problem is that these characters are not displayed correctly. How can I solve this? Best regards, Thomas

Answer 1: This is very quirky. My solution was to put an explicit meta tag in my HTML file - "My Page.html" <meta http-equiv='Content-Type' content='text/html;charset=UTF-8'> Then using the standard Web Browser .NET control I then created a URI

What are the unicode ranges for Hindi accented characters?

谁说我不能喝 submitted on 2019-11-28 01:22:48
I'm trying to gather a Unicode list of all the 'o'-like shapes in the Hindi character set. In fact, a list of any characters (in any language) that make use of separate characters to indicate an accent would be better. I intend to use this Unicode list in a RegExp. I've been trying to edit a list of character ranges by outputting them in an Input TextField, but editing this text causes weird issues (the keyboard cursor isn't placed on the correct character, selections suddenly disappear or warp incorrectly... in other words... HINDI HELL!) I've tried this with Notepad++ too, but although it was
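Rather than editing the characters by hand, the list can be generated from the Unicode character database. The Python sketch below collects every combining mark in the Devanagari block (U+0900 to U+097F) and emits it as an escaped character class; the helper names are hypothetical, and the exact set depends on the Unicode version bundled with your Python.

```python
import re
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F


def combining_marks(start, end):
    """Return the characters in [start, end] whose category is a Mark (Mn, Mc, Me)."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)).startswith("M")
    ]


def to_char_class(chars):
    """Build a regex character class using \\uXXXX escapes only."""
    return "[" + "".join("\\u%04X" % ord(ch) for ch in chars) + "]"


marks = combining_marks(DEVANAGARI_START, DEVANAGARI_END)
char_class = to_char_class(marks)
print(len(marks))
print(bool(re.fullmatch(char_class, "\u094b")))  # DEVANAGARI VOWEL SIGN O
```

Because the output contains only ASCII escapes, it can be pasted into an editor without the cursor and selection problems described above.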

Code to strip diacritical marks using ICU

折月煮酒 submitted on 2019-11-28 01:18:21
Question: Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.:

    UnicodeString strip_diacritics( UnicodeString const &s ) {
      UnicodeString result;
      // ...
      return result;
    }

Assume that s has already been normalized. Thanks.

Answer 1: ICU lets you transliterate a
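The answer is cut off, so the following is not ICU code. It is a minimal Python standard-library sketch of the usual decompose, remove marks, recompose pipeline, reusing the question's function name for illustration only.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose, drop every mark character, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    return unicodedata.normalize("NFC", without_marks)


print(strip_diacritics("\u00e9t\u00e9 \u00fcber"))
```

Removing every mark is appropriate for Latin-script search keys, but it also strips vowel signs in scripts such as Devanagari or Thai, so restrict the filter if the input is multilingual.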

Should all accented characters use html entities?

允我心安 submitted on 2019-11-27 23:45:14
Question: I am working with a large number of HTML files that are mostly encoded as utf-8. There are accented characters galore, as many are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IE tester) the unconverted accented characters display properly. Should I be concerned with character display and convert them all to HTML entities just to be on the safe side?

Answer 1: If the files are UTF-8 encoded, you should set the Content-Type