diacritics

Converting combining diacritics to simple utf

試著忘記壹切 submitted on 2019-12-08 20:21:18
Question: I have a problem when inserting a string into a database due to some encoding issues. The string source is an external RSS feed. In a web browser it looks OK. Even in the debugger the text appears to be OK. If I copy the string to Notepad, the result is also OK. But in Notepad++ it was possible to see that the string uses combining characters. If I change to ANSI, both combined characters appear, e.g. á is displayed as a´ (in Notepad++ it is like having two chars, one over the other. I can even select ... half of the
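
A common way to handle this before the database insert is to normalize the feed text to Unicode Normalization Form C (NFC), which recombines a base letter and its combining mark into one precomposed code point wherever Unicode defines one. The Python sketch below assumes the database column can store those precomposed characters; sequences with no precomposed form stay as several code points even after NFC. `to_composed` is a hypothetical helper name, and .NET exposes the same operation as `String.Normalize(NormalizationForm.FormC)`.

```python
import unicodedata


def to_composed(text: str) -> str:
    """Return text in Unicode Normalization Form C (NFC).

    NFC merges a base letter plus a combining mark (e.g. 'a' + U+0301)
    into the single precomposed code point where one exists (U+00E1).
    """
    return unicodedata.normalize("NFC", text)


decomposed = "a\u0301"            # 'a' followed by COMBINING ACUTE ACCENT
composed = to_composed(decomposed)
print(len(decomposed), len(composed))   # 2 1
print(composed == "\u00e1")             # True
```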

Python 3 regex with diacritics and ligatures

拈花ヽ惹草 submitted on 2019-12-08 17:33:46
Question: Names in the form "Ceasar, Julius" are to be split into First_name Julius and Surname Ceasar. Names may contain diacritics (á à é ..) and ligatures (æ, ø). This code seems to work OK in Python 3.3: import re def doesmatch(pat, str): try: yup = re.search(pat, str) print('Firstname {0} lastname {1}'.format(yup.group(2), yup.group(1))) except AttributeError: print('no match for {0}'.format(str)) s = 'Révèrberë, Harry' t = 'Åapö, Renée' u = 'C3po, Robby' v = 'Mærsk, Efraïm' w = 'MacDønald, Ron' x =
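
In Python 3, `\w` in a `str` pattern already matches letters from any script, but it also matches digits and the underscore. The sketch below uses the character class `[^\W\d_]` ("word character that is not a digit or underscore") so that only letters count, assuming that a name like 'C3po' should be rejected (the question's intended pattern is cut off). It normalizes to NFC first because a decomposed combining accent such as U+0301 is not itself a word character. `NAME` and `split_name` are hypothetical names.

```python
import re
import unicodedata

# [^\W\d_] = a word character that is not a digit or underscore,
# i.e. a letter in any script when matching str patterns in Python 3.
NAME = re.compile(r"^\s*([^\W\d_]+),\s*([^\W\d_]+)\s*$")


def split_name(full_name: str):
    """Split 'Surname, Firstname' into (firstname, surname), or return None."""
    # Normalize so decomposed input ('e' + U+0301) matches like precomposed 'é'.
    m = NAME.match(unicodedata.normalize("NFC", full_name))
    if m is None:
        return None
    return m.group(2), m.group(1)


for s in ["Révèrberë, Harry", "Åapö, Renée", "C3po, Robby",
          "Mærsk, Efraïm", "MacDønald, Ron"]:
    print(s, "->", split_name(s))
```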

python: open and read a file containing germanic umlaut as unicode

随声附和 submitted on 2019-12-08 10:00:44
Question: I have written my program to read words from a text file, enter them into an SQLite database, and also treat them as strings. But I need to enter some words containing German umlauts: äöüß. Here is a prepared piece of code. I tried both # -*- coding: iso-8859-15 -*- and # -*- coding: utf-8 -*-, with no difference(!): # -*- coding: iso-8859-15 -*- import sqlite3 dbname = 'sampledb.db' filename ='text.txt' con = sqlite3.connect(dbname) cur = con.cursor() cur.execute('''create table IF NOT EXISTS
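
The `# -*- coding: ... -*-` line only tells Python how to decode string literals in the `.py` source itself; it has no effect on files the program opens, which is consistent with both declarations behaving the same. A minimal Python 3 sketch, assuming the input file is saved as UTF-8 (pass `encoding="iso-8859-15"` instead if it was saved that way): it writes a stand-in `text.txt` with hypothetical sample words to a temporary directory, reads it back with an explicit encoding, and stores the words in an in-memory SQLite database rather than `sampledb.db`.

```python
import os
import sqlite3
import tempfile

# Hypothetical sample words; in the question they come from text.txt.
words = ["Äpfel", "Öl", "Übermaß", "süß"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")
    # Create a stand-in input file, saved as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")

    # Decode the file explicitly; the source coding line does not do this.
    with open(path, encoding="utf-8") as f:
        read_back = [line.strip() for line in f if line.strip()]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS words (word TEXT)")
# Parameterized insert: sqlite3 stores Python str values as Unicode text.
con.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in read_back])
stored = [row[0] for row in con.execute("SELECT word FROM words ORDER BY rowid")]
con.close()
print(stored)
```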

How to handle Combining Diacritical Marks with UnicodeUtils?

笑着哭i submitted on 2019-12-08 07:44:56
Question: I am trying to insert spaces into a string of IPA characters, e.g. to turn ɔ̃wɔ̃tɨ into ɔ̃ w ɔ̃ t ɨ. Using split/join was my first thought: s = ɔ̃w̃ɔtɨ s.split('').join(' ') #=> ̃ ɔ w ̃ ɔ p t ɨ As I discovered by examining the results, letters with diacritics are in fact encoded as two characters. After some research I found the UnicodeUtils module and used the each_grapheme method: UnicodeUtils.each_grapheme(s) {|g| g + ' '} #=> ɔ ̃w ̃ɔ p t ɨ This worked fine, except for the inverted breve
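
The idea behind grapheme splitting can be sketched in Python (rather than the question's Ruby) by attaching every combining mark to the preceding base character. This is a simplification of full Unicode grapheme clusters (UAX #29), not a replacement for them. Because the question is cut off, it is an assumption that "inverted breve" means the IPA tie bar U+0361 COMBINING DOUBLE INVERTED BREVE, which spans two letters (as in t͡s); the sketch therefore also keeps the letter after a double diacritic (U+035C to U+0362) in the same group. `clusters` and `DOUBLE_DIACRITICS` are hypothetical names. Ruby 2.5 and later also provides `String#grapheme_clusters`, which may be worth trying first.

```python
import unicodedata
from typing import List

# Double diacritics (U+035C..U+0362) visually span two base letters.
DOUBLE_DIACRITICS = {chr(cp) for cp in range(0x035C, 0x0363)}


def clusters(text: str) -> List[str]:
    """Group base characters with their combining marks (simplified UAX #29)."""
    out: List[str] = []
    for ch in text:
        attach = bool(out) and (
            unicodedata.combining(ch) != 0          # ch is a combining mark
            or out[-1][-1] in DOUBLE_DIACRITICS     # previous mark spans two letters
        )
        if attach:
            out[-1] += ch
        else:
            out.append(ch)                          # start a new group
    return out


s = "\u0254\u0303w\u0254\u0303t\u0268"   # ɔ̃wɔ̃tɨ written with escapes
print(" ".join(clusters(s)))
print(" ".join(clusters("t\u0361sa")))  # tie bar keeps t and s together
```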

How to replace accented characters by their HTML representation

我的未来我决定 submitted on 2019-12-08 06:38:28
Question: I would like to transform strings like "grégou" into their HTML-encoded form, e.g. "gr&eacute;gou". I temporarily wrote some code that manually changes the most common accents, but I would like code that transforms each accented character into its HTML equivalent. Does anyone have an idea? :) PS: I tried something, but it does not work ... C# code: public static MvcHtmlString MyEncode(this HtmlHelper htmlHelper, string text) { StringBuilder builder = new StringBuilder(); Byte[] bArray; HttpUtility.HtmlEncode(text); bArray = System.Text.Encoding
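
Rather than maintaining a hand-written accent table, the general approach is to loop over the characters and emit a character reference for everything outside ASCII. The sketch below is in Python, not the question's C#: it looks up named HTML 4 entities in the standard library's `html.entities.codepoint2name` and falls back to a numeric reference such as `&#7869;` when no name exists. It also runs `html.escape` first so that `&` and `<` are escaped too, which is a design choice beyond what the question asks. `to_html_entities` is a hypothetical name.

```python
import html
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference."""
    escaped = html.escape(text, quote=False)   # handle &, <, > first
    parts = []
    for ch in escaped:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)                    # plain ASCII stays as-is
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")   # e.g. &eacute;
        else:
            parts.append(f"&#{cp};")            # numeric fallback
    return "".join(parts)


print(to_html_entities("gr\u00e9gou"))   # gr&eacute;gou
```

If numeric references are acceptable, `text.encode("ascii", "xmlcharrefreplace")` produces them (e.g. `&#233;` for é) without any lookup table.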

German characters display in TextView

a 夏天 submitted on 2019-12-08 06:07:06
Question: In my Android application I am trying to display German text. The characters ö ä ü ß cannot be displayed in a TextView. If anyone has an idea how to set the font or how to display these characters, let me know. I am receiving the data from services. Answer 1: I think you should read Wikipedia: Character_encoding as well as The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Answer 2: My problem is solved. I added DocumentBuilder db = dbf
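
The reading suggested in Answer 1 comes down to this: bytes from a service must be decoded with the charset they were actually encoded in. The Python sketch below (standing in for the Android Java code, whose fix in Answer 2 is truncated) shows the two typical symptoms, assuming the service uses either UTF-8 or ISO-8859-1: UTF-8 bytes decoded as Latin-1 turn each umlaut into two characters, while Latin-1 bytes decoded as UTF-8 become replacement characters. On Android the corresponding step would be passing the charset explicitly, for example to `InputStreamReader`.

```python
german = "ö ä ü ß"

# Case 1: service sends UTF-8, client decodes as Latin-1 (ISO-8859-1).
utf8_bytes = german.encode("utf-8")
wrong = utf8_bytes.decode("latin-1")     # each umlaut becomes two characters
right = utf8_bytes.decode("utf-8")       # correct charset restores the text

# Case 2: service sends Latin-1, client decodes as UTF-8.
latin1_bytes = german.encode("latin-1")
garbled = latin1_bytes.decode("utf-8", errors="replace")   # U+FFFD per byte

print(wrong)
print(right)
print(garbled)
```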

Chrome and Safari handle UTF-8 differently than Firefox and IE when the UTF-8 character is not hard-coded?

跟風遠走 submitted on 2019-12-08 04:09:30
Question: Note: the question title was changed as discussed in this meta Q&A. I'm using the jQuery bassistance autocomplete plugin with the accent plugin, so I have accent-free autocomplete. The accent map is like this: map={'À':'A', 'İ':'I'}; I have problems with the character İ (Turkish uppercase I with dot). After I remove accents and convert to lowercase, I have this code: ("İstanbul").indexOf("is") Firefox and IE give 0, but Chrome and Safari give -1. charCodeAt(0) gives the same result in all browsers
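
A likely explanation is Unicode's default (non-Turkish) full case mapping: U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two code points, `i` followed by U+0307 COMBINING DOT ABOVE, so the lowercased string no longer contains `is`. Python's `str.lower()` applies the same mapping, which makes it a convenient place to show the effect. The sketch assumes an accent-insensitive search is the goal and folds text by lowercasing first and then dropping combining marks; `fold` is a hypothetical name. Applying the accent map before lowercasing is the other way to avoid the problem.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase, then drop combining marks to get an accent-insensitive key."""
    lowered = text.lower()                     # 'İ' -> 'i' + U+0307
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


city = "\u0130stanbul"                 # 'İstanbul'
print(len(city), len(city.lower()))    # 8 9
print(city.lower().find("is"))         # -1
print(fold(city).find("is"))           # 0
```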

Swift: remove diacritics from Arabic

蓝咒 submitted on 2019-12-08 00:08:41
Question: I am trying to remove diacritics from Arabic text. For example, I need to convert this َب to this ب. Here is my code: if (text != "") { for char in text! { print(char) print(char.unicodeScalars.first?.value) if allowed.contains("\(char)"){ newText.append(char) } } self.textView.text = text! } else { // TODO : // show an alert print("uhhh no way") } I have tried these solutions, but with no luck: How to remove diacritics from a String in Swift? NSString : easy way to remove UTF-8 accents from a
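
Arabic short vowels and other harakat (for example FATHA U+064E or SHADDA U+0651) are separate code points with the Unicode general category Mn (nonspacing mark), so they can be filtered out directly without any normalization step. The Python sketch below shows the idea; in Swift 5 the analogous filter would, roughly, keep the `unicodeScalars` whose `properties.generalCategory` is not `.nonspacingMark`. Note that this also removes non-Arabic nonspacing marks, and that the snippet in the question assigns `text!` rather than `newText` to the text view, so a filtered result would not be shown there. `strip_arabic_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_arabic_diacritics(text: str) -> str:
    """Remove nonspacing marks (general category 'Mn'), e.g. Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


word = "\u0643\u064e\u062a\u064e\u0628\u064e"   # kataba written with three fathas
print(strip_arabic_diacritics(word))
```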

Umlauts in gnuplot command line

拜拜、爱过 submitted on 2019-12-07 22:58:31
Question: I want to plot surface data in gnuplot (I'm new to gnuplot and found nothing in the docs or via Google that worked). For a start it works fine with splot "heightfield.dat". The problem I have is the path to the file: it contains an umlaut (a Ü). I cannot change into this folder with cd or plot with a path like this. Of course I just changed the Ü to a U to make it work, but this is kind of a dirty hack. The problem is that I cannot even type it. When I type Ü it is replaced

Python raw_input: odd behavior with strings containing accents

亡梦爱人 submitted on 2019-12-07 16:53:58
Question: I'm writing a program that asks the user for input that contains accents. The user input string is tested to see if it matches a string declared in the program. As you can see below, my code is not working: code # -*- coding: utf-8 -*- testList = ['má'] myInput = raw_input('enter something here: ') print myInput, repr(myInput) print testList[0], repr(testList[0]) print myInput in testList output in Eclipse with PyDev: enter something here: má m√° 'm\xe2\x88\x9a\xc2\xb0' má 'm\xc3\xa1' False
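
The bytes in the output are diagnostic: `'m\xe2\x88\x9a\xc2\xb0'` is exactly what you get when `má` is encoded as UTF-8, those bytes are misread as Mac Roman (0xC3 becomes √ and 0xA1 becomes °), and the result is encoded as UTF-8 again. That suggests, as an assumption, that the Eclipse console and the program disagree about the input encoding; setting the run configuration's console encoding to UTF-8 is the usual thing to check. The Python 3 sketch below (not the question's Python 2) reproduces the mix-up, reverses it, and adds NFC normalization before comparing, since `á` can also arrive decomposed. `same_text` is a hypothetical helper.

```python
import unicodedata

typed = "m\u00e1"                     # 'má' as the user intended it
utf8_bytes = typed.encode("utf-8")    # b'm\xc3\xa1'

# Reproduce the bug: UTF-8 bytes decoded as Mac Roman.
misread = utf8_bytes.decode("mac_roman")
print(misread, misread.encode("utf-8"))   # m√° b'm\xe2\x88\x9a\xc2\xb0'

# Repair: undo the wrong decode, then decode with the right codec.
repaired = misread.encode("mac_roman").decode("utf-8")
print(repaired == typed)                  # True


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization ('a' + U+0301 equals 'á')."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text("ma\u0301", typed))       # True
```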