How to remove accents from values in columns?

-上瘾入骨i 2020-12-13 07:19

How do I change the special characters to the usual alphabet letters? This is my dataframe:

In [56]: cities
Out[56]:

Table Code  Country         Year                


        
4 answers
  • 2020-12-13 07:56

    Use this code:

    df['Country'] = df['Country'].str.replace(u"Å", "A")
    df['City'] = df['City'].str.replace(u"ë", "e")
    

    You then have to repeat this for every special character and every column.
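If you go the one-character-at-a-time route, a translation table saves you from chaining many `replace` calls. A minimal sketch in plain Python 3; `ACCENT_MAP` and `replace_accents` are made-up names for illustration, and the mapping only covers the two characters from the question:

```python
# Hypothetical mapping: extend it with whatever characters occur in your data.
ACCENT_MAP = {
    "\u00c5": "A",  # Å
    "\u00eb": "e",  # ë
}

# str.maketrans turns the dict into a table keyed by code point.
TABLE = str.maketrans(ACCENT_MAP)


def replace_accents(text):
    """Replace every mapped character in a single pass over the string."""
    return text.translate(TABLE)


print(replace_accents("Durr\u00ebs \u00c5land Islands"))
```

In pandas the same table should work with `df['Country'].str.translate(TABLE)`, so only the mapping has to be maintained.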

  • 2020-12-13 08:13

    This is for Python 2.7. For converting to ASCII you might want to try:

    import unicodedata
    
    unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii','ignore')
    'Durres Aland Islands'
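On Python 3 the same call returns a `bytes` object (`b'Durres Aland Islands'`), so one more decode step is needed to get a `str` back. A small sketch, with `to_ascii` being a helper name chosen here for illustration:

```python
import unicodedata


def to_ascii(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    # NFKD splits "ë" into "e" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # encode() drops the combining marks; decode() turns bytes back into str.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Durr\u00ebs \u00c5land Islands"))
```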
    
  • 2020-12-13 08:14

    The pandas method is to use the vectorised str.normalize combined with str.decode and str.encode:

    In [60]:
    df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    
    Out[60]:
    0    Aland Islands
    1    Aland Islands
    2          Albania
    3          Albania
    4          Albania
    Name: Country, dtype: object
    

    So to do this for all str dtypes:

    In [64]:
    cols = df.select_dtypes(include=['object']).columns  # np.object was removed in NumPy 1.24
    df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
    df
    
    Out[64]:
       Table Code        Country    Year       City      Value
    0         240  Aland Islands  2014.0  MARIEHAMN  11437.0 1
    1         240  Aland Islands  2010.0  MARIEHAMN  5829.5  1
    2         240        Albania  2011.0     Durres   113249.0
    3         240        Albania  2011.0     TIRANA   418495.0
    4         240        Albania  2011.0     Durres    56511.0
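One caveat with the encode/decode round trip: a letter that has no Unicode decomposition, such as "ø" or "ß", is not reduced to a base letter but dropped entirely by `errors='ignore'`. If that matters for your data, a variation is to remove only the combining marks and leave everything else alone. A sketch in plain Python 3, where `strip_marks` is an illustrative name and not part of pandas:

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks (accents) but keep all other characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns 0 for base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_only(text):
    """The encode/decode approach from the answer above, for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


for name in ["Durr\u00ebs", "Troms\u00f8"]:
    print(name, "|", strip_marks(name), "|", ascii_only(name))
```

It could be applied per column with something like `df[col].map(strip_marks)`, assuming the column holds only strings. Note that the result is no longer guaranteed to be pure ASCII.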
    
  • 2020-12-13 08:15

    An example for a pandas Series, using the third-party unidecode package:

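    import unidecode

    def remove_accents(a):
        # Decode UTF-8 byte strings (Python 2 str); a Python 3 str is already text.
        text = a.decode('utf-8') if isinstance(a, bytes) else a
        return unidecode.unidecode(text)

    df['column'] = df['column'].apply(remove_accents)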
    

    In this case the input is decoded from UTF-8 and unidecode transliterates it to ASCII.
