Python: replace French letters with English

白昼怎懂夜的黑 submitted on 2020-08-05 08:05:08

Question


I would like to replace all the French letters within words with their ASCII equivalents.

letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

for x in letters:
   for a in x:
        a = a.replace('é', 'e')
        a = a.replace('à', 'a')
        a = a.replace('è', 'e')
        a = a.replace('ù', 'u')
        a = a.replace('â', 'a')
        a = a.replace('ê', 'e')
        a = a.replace('î', 'i')
        a = a.replace('ô', 'o')
        a = a.replace('û', 'u')
        a = a.replace('ç', 'c')

print(letters[0][0])

However, this code prints é. How can I make this work?


Answer 1:


May I suggest you consider using translation tables.

translationTable = str.maketrans("éàèùâêîôûç", "eaeuaeiouc")

test = "Héllô Càèùverâêt Jîôûç"
test = test.translate(translationTable)
print(test)

This will print Hello Caeuveraet Jiouc. Pardon my French.
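
The two-string form of str.maketrans maps one character to exactly one character. As a rough sketch, the dictionary form also lets a single character map to several, which helps with ligatures such as œ. The mapping and the names FRENCH_TO_ASCII and to_ascii below are illustrative assumptions, not part of the original answer.

```python
# Hypothetical mapping for illustration; extend it with the characters you need.
FRENCH_TO_ASCII = str.maketrans({
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "à": "a", "â": "a",
    "î": "i", "ï": "i",
    "ô": "o",
    "ù": "u", "û": "u", "ü": "u",
    "ç": "c",
    "œ": "oe", "æ": "ae",  # one character may map to several
})

def to_ascii(text):
    """Translate the mapped characters; everything else passes through unchanged."""
    return text.translate(FRENCH_TO_ASCII)

print(to_ascii("cœur façade élève"))  # coeur facade eleve
```

Characters that are missing from the table, including uppercase accented letters, are left as they are.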




Answer 2:


You can also use unidecode. Install it with pip install unidecode.
Then do:

from unidecode import unidecode

s = "Héllô Càèùverâêt Jîôûç ïîäüë"
s = unidecode(s)
print(s)  # Hello Caeuveraet Jiouc iiaue

The result will be the same string, but with the French characters converted to their ASCII equivalents: Hello Caeuveraet Jiouc iiaue




Answer 3:


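The replace method returns a new string with the character replaced; it does not modify the original string.

In your code, this return value is never stored back into the list.

Assigning it to the loop variable, as in a = a.replace('é', 'e'), only rebinds a; the element inside letters stays unchanged.

You also need to store that output in the list so you can print it at the end.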

Edit: This post explains how variables within loops are accessed
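
One way to store the results, sketched here under the assumption that building a new nested list is acceptable, is a list comprehension. The names replacements and replace_all are made up for this example.

```python
letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

# Hypothetical lookup table covering the characters from the question.
replacements = {
    'é': 'e', 'à': 'a', 'è': 'e', 'ù': 'u', 'â': 'a',
    'ê': 'e', 'î': 'i', 'ô': 'o', 'û': 'u', 'ç': 'c',
}

def replace_all(s):
    """Apply every replacement and return the resulting string."""
    for accented, plain in replacements.items():
        s = s.replace(accented, plain)
    return s

# Build a new nested list instead of rebinding the loop variable.
letters = [[replace_all(a) for a in x] for x in letters]

print(letters[0][0])  # e
```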




Answer 4:


Here's another solution, using the low-level Unicode module from the standard library, unicodedata.

In Unicode, a character like 'ù' is a composite character, made of the character 'u' and another character called 'COMBINING GRAVE ACCENT', which is basically the '̀'. Using the decomposition function in unicodedata, one can obtain the code points (in hex) of these two parts.

>>> import unicodedata as ud
>>> ud.decomposition('ù')
'0075 0300'
>>> chr(0x0075)
'u'
>>> chr(0x0300)
'̀'

Therefore, to retrieve 'u' from 'ù', we can first split the decomposition string, then use the built-in int function to convert the first part (see this thread for converting a hex string to an integer), and then get the character using the chr function.

import unicodedata as ud

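def get_ascii_char(c):
    """Return the base character of c, or c itself if it has no decomposition."""
    s = ud.decomposition(c)
    if s == '':  # an indecomposable character yields ''
        return c
    # The first field is the base character's code point in hex.
    code = int(s.split()[0], 16)
    return chr(code)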
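
As a variation on the same idea for whole strings, the sketch below swaps in a different technique: it normalizes the text to NFD, which splits each accented letter into its base letter plus combining marks, and then drops the marks. The function name strip_accents is an assumption made for this example.

```python
import unicodedata

def strip_accents(text):
    """Decompose the text (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining returns 0 for characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Héllô Càèùverâêt Jîôûç"))  # Hello Caeuveraet Jiouc
```

Characters without a canonical decomposition, such as œ, pass through unchanged, so the result is not guaranteed to be pure ASCII.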

I'm new to the Unicode representation and utilities in Python. If anyone has any suggestions for improving this piece of code, I'll be very happy to learn from them!

Cheers!



Source: https://stackoverflow.com/questions/41004941/python-replace-french-letters-with-english
