Why use pandas.assign rather than simply initialize a new column?

余生分开走 2020-12-30 00:13

I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always gotten

2 answers
  •  野趣味 (original poster)
    2020-12-30 01:11

    The difference concerns whether you wish to modify an existing frame, or create a new frame while maintaining the original frame as it was.

    In particular, DataFrame.assign returns a new object that holds a copy of the original data with the requested changes applied; the original frame remains unchanged.

    In your particular case:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})

    Now suppose you wish to create a new frame in which A is 1 everywhere, without destroying df. Then you could use .assign:

    >>> new_df = df.assign(A=1)
    

    If you do not need to keep the original values, then df["A"] = 1 is clearly more appropriate. This also explains the speed difference: by necessity, .assign must copy the data, while assignment with [...] modifies the frame in place and does not.
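
To make the contrast concrete, here is a minimal sketch that runs both approaches on the same frame and records what happens to the original. It assumes pandas and NumPy are installed; the flag names (original_kept, copy_changed, original_overwritten) are illustrative only and do not come from the question.

```python
import numpy as np
import pandas as pd

# Same frame as in the answer above.
df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# .assign builds and returns a new frame; df is left untouched.
new_df = df.assign(A=1)
original_kept = df["A"].tolist() == list(range(1, 11))
copy_changed = bool((new_df["A"] == 1).all())

# Item assignment overwrites column A in df itself.
df["A"] = 1
original_overwritten = bool((df["A"] == 1).all())

print(original_kept, copy_changed, original_overwritten)
```

Because .assign hands back a new frame, it also fits naturally into a chain of method calls, whereas df["A"] = 1 is a statement that mutates df and returns nothing.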
