How to Update Value in First N Rows by Group in a Multi-Index Pandas Dataframe?

拜拜、爱过 submitted on 2019-12-02 06:47:44

Question


I am trying to update the first N rows of each group in a multi-index dataframe, but I had trouble finding a solution, so I thought I'd create a post for it.

The example code is as follows:

# Imports
import numpy as np
import pandas as pd

# Set Up Data Frame
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A','B','A','B','A','B','A','B']

# Set Index
df.set_index(['CATEGORY','DATE'],inplace=True)
df.sort_index(inplace=True)  # DataFrame.sort() was removed from pandas; sort_index() sorts by the MultiIndex

# Get First Two Rows of Each Category
df.groupby(level=0).apply(lambda x: x.iloc[0:2])

# Set The Value of Column 'C' Equal to Zero
# ???

I was able to get as far as selecting the rows using "iloc", but I'm not sure how to then set column "C" to zero for those rows. It feels like I may be going about this the wrong way. Any help would be greatly appreciated. Thanks!


Answer 1:


How about this - first define a function that takes a dataframe, and replaces the first x records with a specified value.

def replace_first_x(group_df, x, value):
    group_df.iloc[:x, :] = value
    return group_df

Then, pass that into the groupby object with apply.

In [97]: df.groupby(level=0).apply(lambda df: replace_first_x(df, 2, 9999))
Out[97]: 
                               A            B            C            D
CATEGORY DATE                                                          
A        2000-01-01  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-03  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-05     1.590503     0.948911    -0.268071     0.622280
         2000-01-07    -0.493866     1.222231     0.125037     0.071064
B        2000-01-02  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-04  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-06     1.663430    -1.170716     2.044815    -2.081035
         2000-01-08     1.593104     0.108531    -1.381218    -0.517312
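Note that the function above overwrites every column, while the question only asks for column 'C' to be set to zero. One possible way to do that without apply is sketched below. It is not from the original answer: it assumes that groupby(...).cumcount() is acceptable for numbering the rows inside each group, and the names rng, n, position_in_group and first_n are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded generator for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=["A", "B", "C", "D"])
df["DATE"] = pd.date_range("1/1/2000", periods=8)
df["CATEGORY"] = ["A", "B", "A", "B", "A", "B", "A", "B"]
df = df.set_index(["CATEGORY", "DATE"]).sort_index()

n = 2  # hypothetical name: number of leading rows to update in each group

# cumcount() numbers the rows within each group as 0, 1, 2, ...
position_in_group = df.groupby(level="CATEGORY").cumcount()

# Boolean mask that is True for the first n rows of every group.
first_n = position_in_group < n

# Assign in place; only column 'C' is touched.
df.loc[first_n, "C"] = 0.0
```

Because the mask shares the dataframe's index, the assignment modifies df directly and leaves columns A, B and D as they were.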



Answer 2:


Typically, whenever you have to change values rather than just pick them, a lambda on its own is not enough: a lambda body must be a single expression, so it cannot contain an assignment statement such as group.iloc[0:2] = 99.

A very boiled down way to proceed is

def replace_first(group):
    group.iloc[0:2] = 99
    return group

and then just do

In[144]: df.groupby(level=0).apply(replace_first)
Out[144]: 
                             A          B          C          D
CATEGORY DATE                                                  
A        2000-01-01  99.000000  99.000000  99.000000  99.000000
         2000-01-03  99.000000  99.000000  99.000000  99.000000
         2000-01-05   0.458031   1.959409   0.622295   0.959019
         2000-01-07   0.934521  -2.016685   1.046456   1.489070
B        2000-01-02  99.000000  99.000000  99.000000  99.000000
         2000-01-04  99.000000  99.000000  99.000000  99.000000
         2000-01-06  -0.117322  -1.664436   1.582124   0.486796
         2000-01-08  -0.225379   0.794846  -0.021214  -0.510768
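A variant of the same idea that only touches column 'C' might look like the sketch below. It is an illustration, not the answerer's code: zero_first_n and its parameters are hypothetical names, the group is copied so the original dataframe is left unmodified, and group_keys=False is passed on the assumption that a recent pandas version would otherwise prepend the group key to the index as an extra level.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded generator for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=["A", "B", "C", "D"])
df["DATE"] = pd.date_range("1/1/2000", periods=8)
df["CATEGORY"] = ["A", "B", "A", "B", "A", "B", "A", "B"]
df = df.set_index(["CATEGORY", "DATE"]).sort_index()


def zero_first_n(group, n, column):
    """Return a copy of `group` with the first n rows of `column` set to 0."""
    group = group.copy()  # avoid mutating the object that groupby hands us
    col_pos = group.columns.get_loc(column)  # integer position for iloc
    group.iloc[:n, col_pos] = 0
    return group


# Extra keyword arguments to apply() are forwarded to the function.
result = df.groupby(level="CATEGORY", group_keys=False).apply(
    zero_first_n, n=2, column="C"
)
```

Here result is a new dataframe, and df keeps its original values in column 'C'.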


Source: https://stackoverflow.com/questions/24804832/how-to-update-value-in-first-n-rows-by-group-in-a-multi-index-pandas-dataframe
