Access index in pandas.Series.apply

Happy的楠姐 2020-11-30 02:21

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a functi

6 Answers
  • 2020-11-30 02:41

    You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

    def f1(row):
        if row['I'] < 0.5:
            return 0
        else:
            return 1
    
    def f2(row):
        if row['N1']==1:
            return 0
        else:
            return 1
    
    import pandas as pd
    import numpy as np
    df4 = pd.DataFrame(np.random.rand(6,1), columns=list('I'))
    df4['N1']=df4.apply(f1, axis=1)
    df4['N2']=df4.apply(f2, axis=1)
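
    The functions above only read column values. A minimal sketch of reading the index label as well, assuming a small hypothetical frame (df, label_and_flag, and the labels x, y, z are made up for illustration): with axis=1, each row arrives as a Series whose name attribute is that row's index label.

```python
import pandas as pd

# Hypothetical data: one float column with string index labels.
df = pd.DataFrame({"I": [0.2, 0.9, 0.4]}, index=["x", "y", "z"])

def label_and_flag(row):
    # row["I"] is the column value; row.name is the index label of this row.
    flag = 0 if row["I"] < 0.5 else 1
    return f"{row.name}:{flag}"

out = df.apply(label_and_flag, axis=1)
print(out)
```

    The result keeps the original index, so no re-indexing is needed afterwards.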
    
  • 2020-11-30 02:50

    Convert the Series to a DataFrame; if your function returns scalars, the result is a Series again.

    Setup

    In [11]: s = pd.Series([1,2,3],dtype='float64',index=['a','b','c'])
    
    In [12]: s
    Out[12]: 
    a    1
    b    2
    c    3
    dtype: float64
    

    Printing function

    In [13]: def f(x):
       ....:     print(type(x), x)
       ....:     return x
       ....: 
    
    In [14]: pd.DataFrame(s).apply(f)
    <class 'pandas.core.series.Series'> a    1
    b    2
    c    3
    Name: 0, dtype: float64
    <class 'pandas.core.series.Series'> a    1
    b    2
    c    3
    Name: 0, dtype: float64
    Out[14]: 
       0
    a  1
    b  2
    c  3
    

    Since you can return anything here, just return the scalars (access the index via the name attribute):

    In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
    Out[15]: 
    a    5
    b    2
    c    3
    dtype: float64
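
    The same idea carries over to the MultiIndex case from the question. A minimal sketch, assuming the question's values and a made-up rule (add both index levels to the value): for a MultiIndex, x.name is a tuple with one entry per level.

```python
import pandas as pd

# Series shaped like the one in the question (index levels a and b).
idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# x.name is the (a, b) tuple for the row; x.iloc[0] is the single value.
out = pd.DataFrame(s).apply(
    lambda x: x.name[0] + x.name[1] + x.iloc[0], axis=1
)
print(out)
```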
    
  • 2020-11-30 02:53

    I don't believe apply has access to the index; it passes each value to the function as a numpy object, not a Series, as you can see:

    In [27]: s.apply(lambda x: type(x))
    Out[27]: 
    a  b
    1  2    <type 'numpy.float64'>
    3  6    <type 'numpy.float64'>
    4  4    <type 'numpy.float64'>
    

    To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

    pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
    

    Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or s.items(), which is likely to be slower, perhaps depending on exactly what f does.
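
    A rough sketch of those two alternatives, assuming the question's data and a made-up rule (value * a + b); the helper name combine is hypothetical. The first variant pulls each index level out as an array and works on whole columns; the second loops over (index, value) pairs.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# Variant 1: extract each index level and compute on whole arrays.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
vectorized = s * a + b

# Variant 2: iterate over (index tuple, value) pairs.
def combine(key, value):
    # Hypothetical rule, only for illustration.
    return value * key[0] + key[1]

looped = pd.Series(
    [combine(key, value) for key, value in s.items()], index=s.index
)
print(vectorized)
print(looped)
```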

  • 2020-11-30 02:54

    Convert the Series to a DataFrame and apply along the rows (axis=1). You can then access the index as x.name; x is itself a Series holding a single value.

    s.to_frame(0).apply(f, axis=1)[0]
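
    A small sketch of what f might look like here, assuming the question's data and a made-up rule (scale the value by 10 when both index levels are equal); scale_if_equal is a hypothetical name. In this sketch the function returns a scalar, so apply already yields a Series and the trailing [0] is left off; it would only be needed if the function returned the row itself.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

def scale_if_equal(x):
    # x is a one-element Series (column label 0); x.name is the (a, b) tuple.
    a, b = x.name
    return x.loc[0] * 10 if a == b else x.loc[0]

out = s.to_frame(0).apply(scale_if_equal, axis=1)
print(out)
```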
    
  • 2020-11-30 02:54

    Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

    The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

    With a Singly Indexed Series

    import pandas as pd
    
    s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})
    
    def use_index_and_value(row):
        return 'I made this with index {} and value {}'.format(row['index'], row[0])
    
    s2 = s.reset_index().apply(use_index_and_value, axis=1)
    
    # The new Series has an auto-index;
    # You'll want to replace that with the index from the original Series
    s2.index = s.index
    s2
    

    Output:

    idx1    I made this with index idx1 and value val1
    idx2    I made this with index idx2 and value val2
    dtype: object
    

    With a Multi-Indexed Series

    Same concept here, but you'll need to access the index values as row['level_*'] because that's where they're placed by Series.reset_index().

    s=pd.Series({
        ('idx(0,0)', 'idx(0,1)'): 'val1',
        ('idx(1,0)', 'idx(1,1)'): 'val2'
    })
    
    def use_index_and_value(row):
        return 'made with index: {},{} & value: {}'.format(
            row['level_0'],
            row['level_1'],
            row[0]
        )
    
    s2 = s.reset_index().apply(use_index_and_value, axis=1)
    
    # Replace auto index with the index from the original Series
    s2.index = s.index
    s2
    

    Output:

    idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
    idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
    dtype: object
    

    If your series or indexes have names, you will need to adjust accordingly.
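
    As a sketch of that adjustment, assume the Series from the question with index levels named a and b and the Series itself named values (the function use_names and its formula are made up for illustration). reset_index() then uses those names as column labels instead of level_0, level_1 and 0.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx, name="values")

frame = s.reset_index()
print(list(frame.columns))  # the level names and the Series name

def use_names(row):
    # Columns are addressed by the index level names and the Series name.
    return row["values"] * row["a"] + row["b"]

s2 = frame.apply(use_names, axis=1)

# Replace the auto index with the index from the original Series.
s2.index = s.index
print(s2)
```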

  • 2020-11-30 02:55

    You may find it faster to use where rather than apply here:

    In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])
    
    In [12]: s.where(s.index != 'a', 5)
    Out[12]: 
    a    5
    b    2
    c    3
    dtype: float64
    

    You can also apply numpy-style logic/functions to any of the parts:

    In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
    Out[13]: 
    a   -1
    b    5
    c    7
    dtype: float64
    
    In [14]: (2 * s + 1).where(s.index != 'a', -s)
    Out[14]: 
    a   -1
    b    5
    c    7
    dtype: float64
    

    I recommend testing for speed, as the efficiency compared with apply will depend on the function. That said, I find apply more readable.
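
    For the MultiIndex Series in the question, the condition can be built from the index levels. A minimal sketch, assuming the question's data and a made-up rule (replace the value with 5 wherever both levels are equal):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# where keeps values where the condition is True and substitutes 5 elsewhere.
out = s.where(a != b, 5)
print(out)
```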
