pandas: Filling missing values within a group


I have some data from an experiment, and within each trial there are some single values, surrounded by NAs, that I want to fill out to the entire trial.
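For illustration, a frame of the kind described might look like the sketch below. The values and their positions are assumed here, not taken from the original post; only the column names `trial` and `cs_name` appear in the answers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: each trial holds a single label surrounded by NaN.
df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan,
                np.nan, np.nan, 'B2', np.nan,
                'A1', np.nan, np.nan, np.nan],
})

# Count the non-null labels in each trial (count() ignores NaN).
labels_per_trial = df.groupby('trial')['cs_name'].count()
```

The goal is for every row of a trial to carry that trial's single label.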


2 Answers

遇见更好的自我

2020-12-15 09:48

If you want to avoid the error that appears when some groups contain only NaN, you could do the following. (Note that I changed df so that the group with trial=1 contains only NaN.)

import numpy as np
import pandas as pd

# Trial 1 contains only NaN, so first_valid_index() returns None for that group.
df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
                   'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                               'A3', np.nan, np.nan, np.nan, np.nan, np.nan]})

g = df.groupby('trial')

# Preview the result: use a placeholder when a group has no valid value.
g['cs_name'].transform(lambda s: 'No values to aggregate' if
    s.isnull().all() else s.loc[s.first_valid_index()])

# Write the filled values back to the column.
df['cs_name'] = g['cs_name'].transform(lambda s: 'No values to aggregate' if
    s.isnull().all() else s.loc[s.first_valid_index()])

This way you insert 'No values to aggregate' (or whatever you prefer) when a group contains only NaN, instead of getting an error.
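If a placeholder string is not wanted, one variation is to leave all-NaN groups as NaN. This is only a sketch: the helper name `first_valid_or_nan` and the small sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: trial 1 has no valid value at all.
df = pd.DataFrame({
    'trial': [1, 1, 2, 2, 2, 3, 3],
    'cs_name': [np.nan, np.nan, np.nan, 'B2', np.nan, 'A3', np.nan],
})

def first_valid_or_nan(s):
    """Return the first non-null value of s, or NaN if s is entirely null."""
    idx = s.first_valid_index()  # None when every value is null
    return np.nan if idx is None else s.loc[idx]

# The scalar returned for each group is broadcast to all rows of that group.
filled = df.groupby('trial')['cs_name'].transform(first_valid_or_nan)
```

Keeping NaN means later calls such as `isna()` or `dropna()` still treat those rows as missing, which a placeholder string would hide.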

Hope this helps :)

Federico


盖世英雄少女心

2020-12-15 09:50
An alternative approach is to use first_valid_index and a transform:
```
In [11]: g = df.groupby('trial')

In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]: 
0     A1
1     A1
2     A1
3     A1
4     B2
5     B2
6     B2
7     B2
8     A1
9     A1
10    A1
11    A1
Name: cs_name, dtype: object
```
This ought to be more efficient than using ffill followed by a bfill...
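For comparison, the ffill-then-bfill approach might look like the sketch below, next to the first_valid_index version. The sample frame is assumed for illustration, and no timings are claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with exactly one label per trial.
df = pd.DataFrame({
    'trial': [1, 1, 1, 2, 2, 2],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, 'B2'],
})

g = df.groupby('trial')['cs_name']

# Approach 1: fill forward, then backward, within each group.
via_fill = g.transform(lambda s: s.ffill().bfill())

# Approach 2: broadcast the first non-null value of each group.
via_first_valid = g.transform(lambda s: s.loc[s.first_valid_index()])
```

The two agree when each group holds a single distinct value. If a group held several different values, ffill/bfill would keep each one around its own position, while the first_valid_index version would spread only the first.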

And use this to change the cs_name column:
```
df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
```
Note: I think it would be a nice enhancement to have a method that grabs the first non-null object in pandas. In numpy it's an open request, and I don't think such a method currently exists (I could be wrong!).
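For what it's worth, as far as I know `GroupBy.first` skips nulls by default, so passing `'first'` to `transform` may cover this case. A minimal sketch, assuming a reasonably recent pandas and an invented sample frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: trial 1 has no valid value at all.
df = pd.DataFrame({
    'trial': [1, 1, 2, 2, 2, 3, 3],
    'cs_name': [np.nan, np.nan, np.nan, 'B2', np.nan, 'A3', np.nan],
})

g = df.groupby('trial')['cs_name']

# One row per trial: the first non-null value, or a null if there is none.
first_per_trial = g.first()

# Same values, broadcast back to the shape of the original column.
filled = g.transform('first')
```

Here a group with no valid value stays null rather than raising, so no special case is needed for it.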