Converting a Pandas GroupBy output from Series to DataFrame

广开言路 2020-11-22 09:58

I'm starting with input data like this:

df1 = pandas.DataFrame( { 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
           


        
9 Answers
  •  悲&欢浪女
    2020-11-22 10:38

    I want to slightly change the answer given by Wes, because pandas version 0.16.2 requires as_index=False. If you don't set it, you get an empty DataFrame.

    Source:

    Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

    Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

    Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do for example DataFrame.sum() and get back a Series.

    nth can act as a reducer or a filter, see here.
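    To make the quoted behaviour concrete, here is a minimal sketch of the two as_index settings on a recent pandas. The "Sales" column and its values are hypothetical, added only so that there is a non-key column to aggregate; they are not part of the original data.

```python
import pandas as pd

# Hypothetical data: "Sales" is invented for illustration only.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob"],
    "City": ["Seattle", "Seattle", "Seattle"],
    "Sales": [10, 20, 30],
})

# Default (as_index=True): the group keys become the index of the result,
# so only the aggregated column remains as a regular column.
indexed = df.groupby(["Name", "City"]).sum()

# as_index=False: the group keys come back as ordinary columns.
flat = df.groupby(["Name", "City"], as_index=False).sum()

print(indexed)
print(flat)
```

    With as_index=False the result is already a flat DataFrame, so no reset_index call is needed afterwards.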

    import pandas as pd
    
    df1 = pd.DataFrame({"Name":["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"],
                        "City":["Seattle","Seattle","Portland","Seattle","Seattle","Portland"]})
    print(df1)
    #
    #       City     Name
    #0   Seattle    Alice
    #1   Seattle      Bob
    #2  Portland  Mallory
    #3   Seattle  Mallory
    #4   Seattle      Bob
    #5  Portland  Mallory
    #
    g1 = df1.groupby(["Name", "City"], as_index=False).count()
    print(g1)
    #
    #                  City  Name
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    #
    

    EDIT:

    In version 0.17.1 and later, you can select a subset of columns before calling count, or call size and then reset_index with the name parameter:

    print(df1.groupby(["Name", "City"], as_index=False).count())
    #IndexError: list index out of range
    
    print(df1.groupby(["Name", "City"]).count())
    #Empty DataFrame
    #Columns: []
    #Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]
    
    print(df1.groupby(["Name", "City"])[['Name','City']].count())
    #                  Name  City
    #Name    City                
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    
    print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
    #      Name      City  count
    #0    Alice   Seattle      1
    #1      Bob   Seattle      2
    #2  Mallory  Portland      2
    #3  Mallory   Seattle      1
    

    The difference between count and size is that size counts NaN values while count does not.
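    The difference can be seen in a small sketch. The "Score" column and its NaN values are hypothetical, introduced only to show how missing values are treated; they are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Score" is invented for illustration only.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "Score": [1.0, np.nan, 3.0, np.nan],
})

grouped = df.groupby("Name")["Score"]

# size() counts every row in each group, including rows where Score is NaN.
sizes = grouped.size()

# count() counts only the non-NaN Score values in each group.
counts = grouped.count()

# Turn the size Series into a DataFrame and attach the counts alongside it.
# Both results are sorted by group key, so the rows line up.
summary = sizes.reset_index(name="size")
summary["count"] = counts.values
print(summary)
```

    Here Bob has two rows but only one non-missing Score, and Mallory has one row with no non-missing Score, so size and count disagree for both of them.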
