Convert array of string (category) to array of int from a pandas dataframe

孤城傲影 2020-12-14 11:25

I am trying to do something very similar to that previous question, but I get an error. I have a pandas DataFrame containing features and a label, and I need to do some conversion to se

4 Answers
  •  暗喜 (original poster)
    2020-12-14 12:06

    The previous answers are outdated, so here is a solution for mapping strings to numbers that works with version 0.18.1 of Pandas.

    For a Series:

    In [1]: import pandas as pd
    In [2]: s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
                           'touching', 'single', 'nuclei'])
    In [3]: s_enc = pd.factorize(s)
    In [4]: s_enc[0]
    Out[4]: array([0, 1, 2, 3, 1, 0, 2])
    In [5]: s_enc[1]
    Out[5]: Index([u'single', u'touching', u'nuclei', u'dusts'], dtype='object')
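
The two values returned by `pd.factorize` can also be unpacked directly and used to map the integer codes back to the original strings. The following is a minimal sketch rather than part of the original answer; the variable names (`codes`, `uniques`, `decoded`) are illustrative, and it assumes a pandas version in which `factorize` accepts the `sort` argument.

```python
import pandas as pd

s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
               'touching', 'single', 'nuclei'])

# codes: integer array; uniques: Index of labels in order of first appearance
codes, uniques = pd.factorize(s)

# Indexing the uniques with the codes recovers the original strings
decoded = pd.Series(uniques[codes], index=s.index)

# sort=True assigns codes by sorted label order instead of first appearance
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
```

With the default settings the codes depend on the order in which labels first appear, so `sort=True` may be the safer choice when the same encoding has to be reproduced on reordered data.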
    

    For a DataFrame:

    In [1]: import pandas as pd
    In [2]: df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei', 
                           'dusts', 'touching', 'single', 'nuclei']})
    In [3]: catenc = pd.factorize(df['labels'])
    In [4]: catenc
    Out[4]: (array([0, 1, 2, 3, 1, 0, 2]), 
            Index([u'single', u'touching', u'nuclei', u'dusts'],
            dtype='object'))
    In [5]: df['labels_enc'] = catenc[0]
    In [6]: df
    Out[6]:
             labels  labels_enc
        0    single           0
        1  touching           1
        2    nuclei           2
        3     dusts           3
        4  touching           1
        5    single           0
        6    nuclei           2
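
As an alternative to `pd.factorize`, the column can be converted to the `category` dtype and its integer codes read from the `.cat` accessor. This is a different technique from the one in the answer above, shown here only as a hedged sketch; the column name `labels_cat` and the helper dictionary `code_to_label` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei',
                              'dusts', 'touching', 'single', 'nuclei']})

# Categories are sorted lexically by default, so these codes differ
# from the first-appearance codes that pd.factorize produces
cat = df['labels'].astype('category')
df['labels_cat'] = cat.cat.codes

# Keep the code -> label mapping so the integers can be decoded later
code_to_label = dict(enumerate(cat.cat.categories))
```

Mapping the codes through the dictionary, for example `df['labels_cat'].map(code_to_label)`, should give back the original strings.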
    
