pandas.factorize on an entire data frame

青春惊慌失措 2020-12-07 15:13

pandas.factorize encodes input values as an enumerated type or categorical variable.
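To make the single-column case concrete, here is a minimal sketch using a hypothetical Series. pd.factorize returns the integer codes together with the distinct values in order of first appearance, and indexing those distinct values with the codes undoes the encoding:

```python
import pandas as pd

# Hypothetical example column of string labels
s = pd.Series(["b", "a", "b", "c"])

# codes: one integer per row; uniques: the distinct labels, in order of first appearance
codes, uniques = pd.factorize(s)
print(codes)  # [0 1 0 2]

# Reverse mapping: look each code up in uniques to recover the original labels
restored = uniques[codes]
print(list(restored))  # ['b', 'a', 'b', 'c']
```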

But how can I easily and efficiently convert many columns of a data frame? What

3 answers
  •  小蘑菇 (OP)
    2020-12-07 15:47

    I would like to redirect you to my answer here: https://stackoverflow.com/a/32011969/1694714

    Old answer

    Another readable solution to this problem, when you want the categories to stay consistent across the resulting DataFrame, is to use replace:

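    def categorise(df):
        # Map every distinct value in the whole frame to an integer code (order of first appearance)
        categories = {k: v for v, k in enumerate(df.stack().unique())}
        # Replace each cell with its code, so a label gets the same code in every column
        return df.replace(categories)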
    

    It performs slightly worse than the example by @jezrael, but it is easier to read. It might also scale better to bigger datasets; I can do some proper testing if anyone is interested.
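For comparison, here is a minimal sketch of a different technique that also keeps the codes consistent across the whole frame: factorize all cells in one call, then reshape. It is not the replace-based approach above and not necessarily what the linked answer does. The example frame is hypothetical, and the sketch assumes there are no missing values (pd.factorize codes NaN as -1 by default). As a side benefit, the uniques array gives the reverse mapping directly.

```python
import pandas as pd

# Hypothetical example frame with string labels spread over several columns
df = pd.DataFrame({"x": ["a", "b", "a"], "y": ["c", "a", "b"]})

# Flatten row by row and factorize once, so each label gets one code across all columns.
# Assumes no missing values: pd.factorize would code NaN as -1.
codes, uniques = pd.factorize(df.to_numpy().ravel())
encoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(encoded.values.tolist())  # [[0, 1], [2, 0], [0, 2]]

# Reverse mapping: index uniques with the codes and restore the original shape
decoded = pd.DataFrame(uniques[codes].reshape(df.shape), index=df.index, columns=df.columns)
print(decoded.values.tolist() == df.values.tolist())  # True
```

Because both this sketch and categorise number labels in order of first appearance when reading the frame row by row, they should produce the same codes for a frame without missing values.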
