Pandas - make a column dtype object or Factor

北荒 2020-12-24 05:09

In pandas, how can I convert a column of a DataFrame to dtype object? Or, better yet, to a factor? (For R users: what is the Python equivalent of as.factor()?)

3 answers
  • 2020-12-24 05:25

    There is also the pd.factorize function:

    # use the df data from @herrfz
    
    In [150]: pd.factorize(df.b)
    Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))
    In [152]: df['c'] = pd.factorize(df.b)[0]
    
    In [153]: df
    Out[153]: 
       a       b  c
    0  1     yes  0
    1  2      no  1
    2  3     yes  0
    3  4      no  1
    4  5  absent  2
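A minimal sketch of pd.factorize on its own, assuming a recent pandas release; the Series simply reuses the values of column b above. It shows the (codes, uniques) pair, the sort=True option, and the -1 code given to missing values:

```python
import pandas as pd

# Same values as column b in the answers above
s = pd.Series(["yes", "no", "yes", "no", "absent"])

# codes: position of each value in `uniques` (uniques are in order of first appearance)
codes, uniques = pd.factorize(s)
print(codes.tolist())   # [0, 1, 0, 1, 2]
print(list(uniques))    # ['yes', 'no', 'absent']

# sort=True orders the uniques lexically, similar to R's default factor levels
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes.tolist())   # [2, 1, 2, 1, 0]
print(list(sorted_uniques))    # ['absent', 'no', 'yes']

# Taking uniques at the codes recovers the original values
restored = uniques.take(codes)
print(list(restored) == s.tolist())  # True

# Missing values get the sentinel code -1 and do not appear in uniques
na_codes, na_uniques = pd.factorize(pd.Series(["yes", None, "no"]))
print(na_codes.tolist())  # [0, -1, 1]
```

Unlike astype('category'), factorize hands back plain integer codes plus the lookup table, which is convenient when the codes are all you need (for example as model input).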
    
  • 2020-12-24 05:27

    You can use the astype method to cast a Series (one column):

    df['col_name'] = df['col_name'].astype(object)
    

    Or the entire DataFrame:

    df = df.astype(object)
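A short sketch of both casts, assuming a made-up frame with numeric columns. The values are untouched; only the dtype changes. Passing a dict to astype converts just the named columns:

```python
import pandas as pd

# Hypothetical example frame with numeric columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# Cast one column: same values, dtype becomes object
df["a"] = df["a"].astype(object)
print(df["a"].dtype)       # object
print(df["a"].tolist())    # [1, 2, 3]

# Cast every column at once
df_obj = df.astype(object)
print((df_obj.dtypes == object).all())  # True

# A dict of {column: dtype} changes only the listed columns
df_mixed = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]}).astype({"b": object})
print(df_mixed["a"].dtype, df_mixed["b"].dtype)  # int64 object
```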
    

    Update

    Since pandas 0.15, you can use the category dtype for a Series/column:

    df['col_name'] = df['col_name'].astype('category')
    

    Note: pd.Factor has been deprecated and removed in favor of pd.Categorical.
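A minimal sketch of the category dtype, assuming pandas 0.24 or later (for the top-level pd.CategoricalDtype). Inferred categories come out sorted, much like R's factor levels, and an explicit ordered CategoricalDtype plays the role of factor(..., levels = ..., ordered = TRUE):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": ["yes", "no", "yes", "no", "absent"]})

# Closest analogue to R's as.factor(): inferred categories are sorted
df["b"] = df["b"].astype("category")
print(list(df["b"].cat.categories))  # ['absent', 'no', 'yes']
print(df["b"].cat.codes.tolist())    # [2, 1, 2, 1, 0]

# Explicit, ordered levels (the level order here is just an illustration)
levels = pd.CategoricalDtype(["no", "absent", "yes"], ordered=True)
ranked = df["b"].astype(levels)
print(ranked.cat.codes.tolist())     # [2, 0, 2, 0, 1]

# Ordered categoricals support comparisons by level position
print((ranked > "no").tolist())      # [True, False, True, False, True]
```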

  • 2020-12-24 05:31

    Factor and Categorical are the same thing, as far as I know: I think it was initially called Factor and later renamed to Categorical. To convert a column to a Categorical, you can build one with pandas.Categorical and take its integer codes. (Older pandas versions spelled this pandas.Categorical.from_array(df.b).labels; both from_array and .labels have since been removed.)

    In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})
    
    In [28]: df
    Out[28]: 
       a       b
    0  1     yes
    1  2      no
    2  3     yes
    3  4      no
    4  5  absent
    
    In [29]: df['c'] = pd.Categorical(df.b).codes
    
    In [30]: df
    Out[30]: 
       a       b  c
    0  1     yes  2
    1  2      no  1
    2  3     yes  2
    3  4      no  1
    4  5  absent  0
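For reference, a sketch of the same conversion with the pd.Categorical constructor, assuming a current pandas version (where .codes is the attribute holding the integer codes). Categorical.from_codes reverses the mapping, and value_counts gives per-level counts similar to R's table() on a factor:

```python
import pandas as pd

values = ["yes", "no", "yes", "no", "absent"]

# Build a Categorical directly; categories are inferred and sorted
cat = pd.Categorical(values)
print(list(cat.categories))  # ['absent', 'no', 'yes']
print(cat.codes.tolist())    # [2, 1, 2, 1, 0]  (matches column c above)

# from_codes goes the other way: integer codes + categories -> Categorical
rebuilt = pd.Categorical.from_codes(cat.codes, categories=cat.categories)
print(list(rebuilt) == values)  # True

# Counts per category, in category order
print(pd.Series(cat).value_counts().sort_index().tolist())  # [1, 2, 2]
```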
    