I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always gotten
The difference is whether you want to modify an existing frame or create a new frame while leaving the original as it was. In particular, DataFrame.assign returns a new object that holds a copy of the original data with the requested changes; the original frame remains unchanged.
In your particular case:
>>> df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
Now suppose you wish to create a new frame in which A is everywhere 1 without destroying df. Then you could use .assign:
>>> new_df = df.assign(A=1)
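As a quick check that the original really is left alone, here is a minimal self-contained sketch. It rebuilds the same frame as above with explicit imports; the printed checks are my own illustration and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above
df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# assign returns a new DataFrame; df itself is not modified
new_df = df.assign(A=1)

print(new_df is df)                  # the result is a different object
print(df["A"].tolist())              # original column A still holds 1..10
print(new_df["A"].unique())          # column A in the new frame is all 1
print(df["B"].equals(new_df["B"]))   # untouched column B has the same values
```

Because the result is a separate frame, later edits to new_df do not show up in df.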
If you do not wish to maintain the original values, then clearly df["A"] = 1 will be more appropriate. This also explains the speed difference: by necessity, .assign must copy the data, while [...] does not.
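If you want to see the difference on your own machine, a rough timing sketch along these lines should do. The frame size, the repeat count, and the helper names with_assign and in_place are arbitrary choices of mine for illustration. The numbers will vary with hardware and pandas version (newer versions with Copy-on-Write may defer some copying), so treat it as a way to measure rather than a guaranteed result.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame size, chosen only for illustration
n = 200_000
big = pd.DataFrame({"A": np.arange(n), "B": np.random.randn(n)})


def with_assign():
    # Builds and returns a new frame on every call; big is left as it was
    return big.assign(A=1)


def in_place():
    # Overwrites column A on the existing frame
    big["A"] = 1


t_assign = timeit.timeit(with_assign, number=20)
t_inplace = timeit.timeit(in_place, number=20)

print(f"assign:   {t_assign:.4f} s for 20 calls")
print(f"in place: {t_inplace:.4f} s for 20 calls")
```

Note that after in_place has run, big["A"] is 1 everywhere and the old values are gone, which is exactly the trade-off described above.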