Convert categorical data in pandas dataframe

予麋鹿 2020-11-27 10:01

I have a DataFrame whose columns have the following dtypes (there are too many columns to list them all):

col1        int64
col2        int64
col3        category
col4        category
col5        categ         


        
10 Answers
  • 2020-11-27 10:26

    This works for me:

    import pandas

    pandas.factorize(['B', 'C', 'D', 'B'])[0]  # [0] selects the integer codes
    

    Output:

    [0, 1, 2, 0]
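
A minimal sketch of what else factorize returns, in case it helps: besides the codes it also gives back the unique labels, and by default a missing value is encoded as -1. The sample labels below are hypothetical, and sort=True is shown only as an optional variation.

```python
import pandas as pd

# Hypothetical sample labels; None stands in for a missing value.
labels = pd.Series(['B', 'C', None, 'B', 'A'])

# Default: codes follow the order in which each label first appears.
codes, uniques = pd.factorize(labels)

# sort=True: codes follow the sorted order of the labels instead.
sorted_codes, sorted_uniques = pd.factorize(labels, sort=True)

print(codes, list(uniques))
print(sorted_codes, list(sorted_uniques))
```

Whether appearance order or sorted order is preferable depends on whether the codes need to be stable across different datasets.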
    
  • 2020-11-27 10:29

    You can do it with less code, like this:

    import pandas as pd

    f = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab'), 'col3': list('ababb')})

    # Cast each column to category, then keep only its integer codes
    f['col1'] = f['col1'].astype('category').cat.codes
    f['col2'] = f['col2'].astype('category').cat.codes
    f['col3'] = f['col3'].astype('category').cat.codes

    f
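
Since the question mentions too many columns to handle one by one, here is a rough sketch of the same cat.codes idea applied to every category column at once. It assumes the columns already have the category dtype, as in the question; the frame and the names df and cat_columns are made up for illustration. Unlike the snippet above, it leaves the integer column col1 untouched.

```python
import pandas as pd

# Hypothetical frame mirroring the question: one integer column, two category columns.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': pd.Categorical(list('abcab')),
    'col3': pd.Categorical(list('ababb')),
})

# Pick up every column whose dtype is category, however many there are.
cat_columns = df.select_dtypes(include='category').columns

# Replace each of them with its integer codes; other columns are left alone.
for name in cat_columns:
    df[name] = df[name].cat.codes

print(df.dtypes)
print(df)
```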
    

  • 2020-11-27 10:32

    What I do is replace the values directly, like this:

    df['col'] = df['col'].replace(to_replace=['category_1', 'category_2', 'category_3'], value=[1, 2, 3])
    

    This way, if the col column holds categorical values, each one is replaced by the corresponding number.
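
A closely related variation, sketched here with Series.map rather than replace: the label-to-number pairs are written once as a dict, which can be easier to keep in sync than two parallel lists. The frame, the mapping dict, and the col_code column are all hypothetical names for illustration.

```python
import pandas as pd

# Hypothetical data using the same labels as the snippet above.
df = pd.DataFrame({'col': ['category_1', 'category_3', 'category_2', 'category_1']})

# Hypothetical label-to-number mapping, written out once as a dict.
mapping = {'category_1': 1, 'category_2': 2, 'category_3': 3}

# map looks each label up in the dict and stores the number in a new column.
df['col_code'] = df['col'].map(mapping)

print(df)
```

One difference worth knowing: with map, a label that is missing from the dict becomes NaN, whereas replace leaves unmatched values as they are.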

  • 2020-11-27 10:34

    If your only concern is that you would be creating an extra column and deleting it later, just don't create a new column in the first place:

    import pandas as pd

    dataframe = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab'), 'col3': list('ababb')})
    # Older pandas only: overwrite col3 with its integer codes
    dataframe.col3 = pd.Categorical.from_array(dataframe.col3).codes
    

    You are done. Since Categorical.from_array is deprecated (and has been removed from newer pandas releases), call Categorical directly instead:

    dataframe.col3 = pd.Categorical(dataframe.col3).codes
    

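    If you also need the mapping between codes and labels, there is an even better way: factorize returns the codes together with an index of the unique labels. It is shown here on col2, which still holds its original labels and contains a "c":

    dataframe.col2, mapping_index = pd.Series(dataframe.col2).factorize()
    

    Check the result:

    print(dataframe)
    print(mapping_index.get_loc("c"))  # the code assigned to label "c"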
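
To illustrate both directions of that mapping, here is a small sketch (the variable names are hypothetical): get_loc goes from label to code, while indexing or take on the returned index goes from code back to label.

```python
import pandas as pd

# Same values as col2 in the frame above.
labels = pd.Series(list('abcab'))

# codes is an integer array; uniques is an Index of the distinct labels.
codes, uniques = labels.factorize()

# Label -> code.
code_of_c = uniques.get_loc('c')

# Code -> label, for a single code and for the whole column.
label_of_2 = uniques[2]
decoded = uniques.take(codes)

print(code_of_c, label_of_2, list(decoded))
```

Keeping uniques around is what makes the encoding reversible; the codes alone do not carry the labels.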
    