I have some data from an experiment, and within each trial there are some single values, surrounded by NA's, that I want to fill out to the entire trial:
If you want to avoid the error that appears when some groups contain only NaN, you could do the following (note that I changed the df so that the group with trial=1 contains only NaN):
import numpy as np
import pandas as pd

df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
                   'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                               'A3', np.nan, np.nan, np.nan, np.nan, np.nan]})
g = df.groupby('trial')
# Preview: a placeholder for all-NaN groups, otherwise the group's first valid value
g['cs_name'].transform(lambda s: 'No values to aggregate' if
                       s.isnull().all() else s.loc[s.first_valid_index()])
# Assign the result back to the column
df['cs_name'] = g['cs_name'].transform(lambda s: 'No values to aggregate' if
                                       s.isnull().all() else s.loc[s.first_valid_index()])
This way, 'No values to aggregate' (or whatever placeholder you want) is filled in whenever a group contains only NaN, instead of an error being raised.
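The same guard can also be written as a small named function, which may be easier to read and reuse than the lambda. This is only a sketch: the helper name first_valid_or_default and the shortened example frame are made up for illustration. It relies on first_valid_index() returning None when every value in the group is null.

```python
import numpy as np
import pandas as pd

def first_valid_or_default(s, default='No values to aggregate'):
    """Hypothetical helper: first non-null value of s, or default if s is all-NaN."""
    idx = s.first_valid_index()  # None when every value in s is null
    return default if idx is None else s.loc[idx]

# Illustrative data: trial 1 is all-NaN, trials 2 and 3 each hold one value
df = pd.DataFrame({'trial': [1, 1, 2, 2, 2, 3, 3],
                   'cs_name': [np.nan, np.nan, np.nan, 'B2', np.nan, 'A3', np.nan]})

# The scalar returned for each group is broadcast to every row of that group
filled = df.groupby('trial')['cs_name'].transform(first_valid_or_default)
```

Passing a different default (for example np.nan) through a lambda or functools.partial would leave all-NaN groups untouched rather than inserting a string.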
Hope this helps :)
Federico
An alternative approach is to use first_valid_index and a transform:
In [11]: g = df.groupby('trial')
In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]:
0     A1
1     A1
2     A1
3     A1
4     B2
5     B2
6     B2
7     B2
8     A1
9     A1
10    A1
11    A1
Name: cs_name, dtype: object
This ought to be more efficient than using ffill followed by a bfill.
And use this to change the cs_name column:
df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
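For comparison, here is a minimal sketch of the ffill-followed-by-bfill approach mentioned above, next to the first_valid_index version. The small example frame is invented for illustration. The two should only agree when each group holds a single distinct non-null value, as in the question.

```python
import numpy as np
import pandas as pd

# Illustrative data: one non-null value per trial, surrounded by NaN
df = pd.DataFrame({'trial': [1, 1, 1, 2, 2, 2],
                   'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, 'B2']})
g = df.groupby('trial')['cs_name']

# Fill forward, then backward, within each group
via_fill = g.transform(lambda s: s.ffill().bfill())

# Broadcast the first valid value of each group to the whole group
via_first = g.transform(lambda s: s.loc[s.first_valid_index()])

print(via_fill.tolist() == via_first.tolist())
```

If a group contained two different non-null values, the fill version would keep both while the first_valid_index version would overwrite the whole group with the first one.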
Note: I think it would be a nice enhancement to have a method in pandas to grab the first non-null object. In numpy it's an open request, and I don't think there is currently a method (I could be wrong!).
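One thing that may be worth trying at the groupby level: as far as I know, GroupBy.first skips null entries by default, so passing the string 'first' to transform should broadcast the first non-null value of each group. This is a sketch under that assumption, with invented example data; check it against your pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative data: trial 3 contains only NaN
df = pd.DataFrame({'trial': [1, 1, 2, 2, 3, 3],
                   'cs_name': [np.nan, 'A1', 'B2', np.nan, np.nan, np.nan]})

# 'first' is assumed to pick the first non-null entry per group
filled = df.groupby('trial')['cs_name'].transform('first')
```

With this form, a group that contains only NaN should simply stay null instead of raising an error.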