Question
I am trying to update the first N rows of each group in a multi-index dataframe, but I was having a bit of trouble finding a solution, so I thought I'd create a post for it.
The example code is as follows:
# Imports
import numpy as np
import pandas as pd
# Set Up Data Frame
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A','B','A','B','A','B','A','B']
# Set Index
df.set_index(['CATEGORY','DATE'],inplace=True)
df.sort_index(inplace=True)
# Get First Two Rows of Each Category
df.groupby(level=0).apply(lambda x: x.iloc[0:2])
# Set The Value of Column 'C' Equal to Zero
# ???
So I was able to get as far as selecting the rows using "iloc", but after that I'm not sure how to set column "C" equal to zero. Feels like maybe I'm going about this the wrong way though. Any help would be greatly appreciated. Thanks!
Answer 1:
How about this - first define a function that takes a dataframe, and replaces the first x records with a specified value.
def replace_first_x(group_df, x, value):
    group_df.iloc[:x, :] = value
    return group_df
Then pass that function to the groupby object with apply.
In [97]: df.groupby(level=0).apply(lambda df: replace_first_x(df, 2, 9999))
Out[97]:
                               A            B            C            D
CATEGORY DATE
A        2000-01-01  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-03  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-05     1.590503     0.948911    -0.268071     0.622280
         2000-01-07    -0.493866     1.222231     0.125037     0.071064
B        2000-01-02  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-04  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-06     1.663430    -1.170716     2.044815    -2.081035
         2000-01-08     1.593104     0.108531    -1.381218    -0.517312
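Note that the function above overwrites every column in the first x rows, whereas the question asked only for column 'C' to be set to zero. A minimal sketch of a more targeted alternative, assuming the frame is built and sorted as in the question: number the rows within each group with cumcount and assign through a boolean mask. The names n and first_n are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values are random).
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
df = df.set_index(['CATEGORY', 'DATE']).sort_index()

n = 2  # hypothetical name: how many leading rows to update per category

# cumcount() numbers the rows inside each group starting at 0,
# so "position < n" is True for the first n rows of every category.
first_n = df.groupby(level=0).cumcount() < n

# Use the mask positionally and assign to column 'C' only;
# columns A, B and D are left untouched.
df.loc[first_n.to_numpy(), 'C'] = 0

print(df)
```

This avoids apply altogether and modifies df in place, so it only fits when changing the original frame is acceptable.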
Answer 2:
Typically, whenever you have to change values, rather than just pick them, a lambda function alone is not enough: a lambda body must be a single expression, and an assignment such as group.iloc[0:2] = 99 is a statement, so it needs a regular function.
A very boiled down way to proceed is
def replace_first(group):
    group.iloc[0:2] = 99
    return group
and then just do
In[144]: df.groupby(level=0).apply(replace_first)
Out[144]:
                             A          B          C          D
CATEGORY DATE
A        2000-01-01  99.000000  99.000000  99.000000  99.000000
         2000-01-03  99.000000  99.000000  99.000000  99.000000
         2000-01-05   0.458031   1.959409   0.622295   0.959019
         2000-01-07   0.934521  -2.016685   1.046456   1.489070
B        2000-01-02  99.000000  99.000000  99.000000  99.000000
         2000-01-04  99.000000  99.000000  99.000000  99.000000
         2000-01-06  -0.117322  -1.664436   1.582124   0.486796
         2000-01-08  -0.225379   0.794846  -0.021214  -0.510768
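If only column 'C' should change, as the question asked, the same apply pattern can be narrowed to a single column. What follows is a sketch rather than the answerer's code: zero_first_n, n and column are hypothetical names, the group is copied so the original frame stays unchanged, and group_keys=False is passed on the assumption that a recent pandas would otherwise prepend the group key as an extra index level.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values are random).
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
df = df.set_index(['CATEGORY', 'DATE']).sort_index()


def zero_first_n(group, n=2, column='C'):
    """Return a copy of the group with the first n rows of one column set to 0."""
    group = group.copy()  # work on a copy so the caller's frame is not mutated
    col_pos = group.columns.get_loc(column)  # integer position of the column
    group.iloc[:n, col_pos] = 0
    return group


# group_keys=False keeps the original two-level index on the result.
result = df.groupby(level=0, group_keys=False).apply(zero_first_n)

print(result)
```

Because the function returns a modified copy, df keeps its original values and result holds the updated frame.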
Source: https://stackoverflow.com/questions/24804832/how-to-update-value-in-first-n-rows-by-group-in-a-multi-index-pandas-dataframe