Vectorized update to pandas DataFrame?

Submitted by 隐身守侯 on 2019-12-10 22:43:18

Question


I have a dataframe in which I'd like to update a column with values from an array. The array has a different length from the dataframe, but I have the indices of the dataframe rows that I'd like to update.

I can do this with a loop over the rows (below), but I expect there is a much more efficient, vectorized way to do it; I just can't get the syntax right.

In the example below I fill the column with NaN and then assign through the indices in a loop.

df['newcol'] = np.nan

# Assign one value per target row; .loc avoids chained indexing
for i, value in zip(update_idx, new_values):
    df.loc[i, 'newcol'] = value

Answer 1:


If you already have a list of indices, you can use loc to perform label (row) selection and pass the new column name in the same call. Rows that are not selected are assigned NaN:

df.loc[update_idx, 'newcol'] = new_values

Example:

In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df

Out[4]:
   a         b
a  0  1.800300
b  1  0.351843
c  2  0.278122
d  3  1.387417
e  4  1.202503

In [5]:    
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df

Out[5]:
   a         b   c
a  0  1.800300 NaN
b  1  0.351843   0
c  2  0.278122 NaN
d  3  1.387417   1
e  4  1.202503   2
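
Applied to the names used in the question, the same idea might look like the sketch below. The data here is hypothetical and stands in for the asker's df, update_idx and new_values. It assumes update_idx holds row labels and new_values has one entry per label, in the same order. The column is filled with NaN first, as in the question. The second assignment is a position-based variant for the case where update_idx holds row positions instead; with a default integer index the two give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's objects (not from the original post)
df = pd.DataFrame({'a': np.arange(6)})
update_idx = [1, 3, 4]                      # rows to update
new_values = np.array([10.0, 20.0, 30.0])   # one value per row, same order

# Label-based: update_idx is interpreted as index labels
df['newcol'] = np.nan
df.loc[update_idx, 'newcol'] = new_values

# Position-based variant: update_idx is interpreted as row positions
df['newcol_pos'] = np.nan
df.iloc[update_idx, df.columns.get_loc('newcol_pos')] = new_values
```

Both assignments happen in a single call, so no Python-level loop over the rows is needed.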


Source: https://stackoverflow.com/questions/34426247/vectorized-update-to-pandas-dataframe
