Pandas: how to apply multiple functions to a DataFrame

粉色の甜心 2020-12-03 13:39

Is there a way to apply a list of functions to each column in a DataFrame like the DataFrameGroupBy.agg function does? I found an ugly way to do it like this:



        
4 answers
  • 2020-12-03 14:15

    I applied three functions to a single column by chaining them, and it works:

    import re

    # replace newline characters with spaces, then trim
    rem_newline = lambda x: re.sub(r'\n', ' ', x).strip()

    # lowercase and trim surrounding whitespace
    lower_strip = lambda x: x.lower().strip()

    # chain both applies, then split on the first '(' into two columns
    df = df['users_name'].apply(lower_strip).apply(rem_newline).str.split('(', n=1, expand=True)
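
    As a rough sketch (not the answerer's code), the same cleanup can probably be written with the vectorized .str accessor instead of chained apply calls. The `users` frame and its sample names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'users_name' comes from the answer.
users = pd.DataFrame(
    {"users_name": ["  Alice Smith (admin)\n", "BOB\nJones (guest) "]}
)

# Same steps as the chained applies: lowercase, trim, replace newlines, trim again.
cleaned = (
    users["users_name"]
    .str.lower()
    .str.strip()
    .str.replace("\n", " ", regex=False)
    .str.strip()
)

# Split on the first '(' only, expanding the pieces into two columns.
parts = cleaned.str.split("(", n=1, expand=True)
print(parts)
```

    The .str methods skip missing values instead of raising, which the plain lambdas above would not.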
    
  • 2020-12-03 14:20

    For Pandas 0.20.0 or newer, use df.agg (thanks to ayhan for pointing this out):

    In [11]: df.agg(['mean', 'std'])
    Out[11]: 
               one       two
    mean  5.147471  4.964100
    std   2.971106  2.753578
    

    For older versions, you could use

    In [61]: df.groupby(lambda idx: 0).agg(['mean','std'])
    Out[61]: 
            one               two          
           mean       std    mean       std
    0  5.147471  2.971106  4.9641  2.753578
    

    Another way would be:

    In [68]: pd.DataFrame({col: [getattr(df[col], func)() for func in ('mean', 'std')] for col in df}, index=('mean', 'std'))
    Out[68]: 
               one       two
    mean  5.147471  4.964100
    std   2.971106  2.753578
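
    For anyone who wants to try this without the original data, here is a small self-contained sketch. The numbers are made up, so the output will not match the tables above; it only checks that df.agg and the getattr comprehension agree.

```python
import pandas as pd

# Made-up sample data with the same column names as in the answer.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 40.0]}
)

funcs = ("mean", "std")

# pandas 0.20.0+: one row per function, one column per DataFrame column.
summary = df.agg(list(funcs))

# The manual alternative: look up each method by name on each column.
manual = pd.DataFrame(
    {col: [getattr(df[col], func)() for func in funcs] for col in df},
    index=funcs,
)

print(summary)
print(manual)
```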
    
  • 2020-12-03 14:30

    I am using pandas to analyze Chilean legislation drafts. In my dataframe, the list of authors is stored as a string. The answer above did not work for me (using pandas 0.20.3), so I used my own logic and came up with this:

    df.authors.apply(eval).apply(len).sum()
    

    Chained applies! A pipeline!! The first apply transforms

    "['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']"
    

    into the corresponding list; the second apply counts the lawmakers involved in each project. Summing those counts gives the total number of (lawmaker, project number) pairs, which I need so I can presize an array where I will study which parties work on what.

    Interestingly, this works! Even more interestingly, that last call fails if one gets too ambitious and does this instead:

    df.authors.apply(eval).apply(len).apply(sum)
    

    with an error:

    TypeError: 'int' object is not iterable
    

    coming from deep within /site-packages/pandas/core/series.py in apply
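
    A minimal sketch of what seems to be going on, under a few assumptions: the `drafts` frame is made up (the second row is invented), and ast.literal_eval is swapped in for eval as a safer way to parse list literals. Series.apply calls the function once per element, so apply(sum) ends up calling sum on a single integer, while .sum() reduces the whole Series.

```python
import ast

import pandas as pd

# Made-up frame; the first string is the example from the answer, the second is invented.
drafts = pd.DataFrame(
    {
        "authors": [
            "['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']",
            "['Bellolio Avaria: Jaime']",
        ]
    }
)

# literal_eval only parses Python literals, unlike eval, which runs arbitrary code.
author_counts = drafts["authors"].apply(ast.literal_eval).apply(len)

# Reducing the Series works: this is the total number of (lawmaker, project) pairs.
total_pairs = author_counts.sum()

# apply(sum) calls sum() on each count individually, which is not iterable.
error_message = None
try:
    author_counts.apply(sum)
except TypeError as exc:
    error_message = str(exc)

print(total_pairs, error_message)
```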

  • 2020-12-03 14:36

    In the general case where you have arbitrary functions and column names, you could do this:

    df.apply(lambda r: pd.Series({'mean': r.mean(), 'std': r.std()})).transpose()
    
             mean       std
    one  5.366303  2.612738
    two  4.858691  2.986567
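
    One way to generalize this, as a sketch only: pass a dict that maps output names to arbitrary functions. The `summarize` helper, the `value_range` function, and the sample data are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd


def value_range(col):
    """Spread between the largest and smallest value in a column."""
    return col.max() - col.min()


def summarize(frame, funcs):
    """Apply every function in `funcs` (name -> callable) to every column.

    Returns one row per original column and one column per function name.
    """
    per_column = frame.apply(
        lambda col: pd.Series({name: func(col) for name, func in funcs.items()})
    )
    return per_column.transpose()


# Made-up sample data.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 6.0, 8.0]})

result = summarize(df, {"mean": lambda col: col.mean(), "range": value_range})
print(result)
```

    Because each function receives the whole column as a Series, anything that reduces a Series to a single value should fit this pattern.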
    