How to create multiple dataframes using multiple functions

佛祖请我去吃肉 2021-01-02 16:54

I quite often write a function to return different dataframes based on the parameters I enter. Here's an example dataframe:

np.random.seed(1111)
df = pd.Dat         


        
2 Answers
  • 2021-01-02 17:27

    A dictionary would be my first choice:

    variations = [('Units Sold', list_one), ('Dollars_Sold', list_two),
                  ..., ('Title', some_list)]

    df_variations = {}

    # Unpack each (name, data) pair and store the result under its position.
    for i, (name, data) in enumerate(variations):
        df_variations[i] = some_fun(df, name, data)
    

    You might also consider keying the dictionary by a unique, descriptive title for each variation instead of an integer. The title needs to go beyond something like 'Units Sold', which isn't unique in your case.
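To make the descriptive-key suggestion concrete, here is a minimal sketch. The toy dataframe, the `summarize` helper (a hypothetical stand-in for `some_fun`), and the label names are all assumptions made for illustration, not the question's actual data or function.

```python
import pandas as pd

# Toy data; the column names are assumptions, not the question's real frame.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Units_Sold": [1, 2, 3, 4],
    "Dollars_Sold": [10.0, 20.0, 30.0, 40.0],
})


def summarize(frame, value_col, group_cols):
    """Hypothetical stand-in for some_fun: sum value_col per group."""
    return frame.groupby(group_cols)[value_col].sum().reset_index()


# Each key is a unique, descriptive label; each value holds the arguments.
variations = {
    "units_by_category": ("Units_Sold", ["Category"]),
    "dollars_by_category": ("Dollars_Sold", ["Category"]),
}

# Build one dataframe per label.
df_variations = {
    label: summarize(df, value_col, group_cols)
    for label, (value_col, group_cols) in variations.items()
}
```

Each result can then be looked up by name, e.g. `df_variations["units_by_category"]`, which stays readable even when the same value column appears in several variations.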

  • 2021-01-02 17:37

    IIUC,

    As Thomas suggested, we can use a dictionary to organize your data. With some minor modifications to your function, the dictionary can hold all the required settings, which we then pass through to the function.

    The idea is to store two kinds of entries: the lists of columns to group by, and the keyword arguments for your pd.Grouper call.

    data_dict = {
        "Units_Sold": {"key": "Date", "freq": "A"},
        "Dollars_Sold": {"key": "Date", "freq": "A"},
        "col_list_1": ["Category", "Product"],
        "col_list_2": ["Category", "Sub-Category", "Sub-Category-2"],
        "col_list_3": ["Sub-Category", "Product"],
    }
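As a rough illustration of how a settings entry like the ones above can feed `pd.Grouper`, here is a small sketch. The toy dataframe and the `grouper_kwargs` name are assumptions, and it uses the `"MS"` (month start) frequency only to keep the example tiny.

```python
import pandas as pd

# Toy data; not the question's real frame.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03"]),
    "Product": ["P1", "P1", "P2"],
    "Units_Sold": [5, 7, 11],
})

# Hypothetical settings dict, shaped like the Grouper entries in data_dict.
grouper_kwargs = {"key": "Date", "freq": "MS"}

# ** unpacks the dict into keyword arguments:
# equivalent to pd.Grouper(key="Date", freq="MS").
grouper = pd.Grouper(**grouper_kwargs)

# Group by the monthly date bucket, then by product.
monthly = df.groupby([grouper, "Product"])["Units_Sold"].sum()
```

The answer's function reads the two values out by name rather than unpacking, which works just as well. As far as I know, pandas 2.2 and later warn about the `"A"` alias and suggest `"YE"` instead, so the frequency string may need adjusting for your version.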
    

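    def some_fun(dataframe, agg_col, dictionary, column_list, *args):
        # Look up the pd.Grouper settings for the column being aggregated.
        key = dictionary[agg_col]["key"]
        frequency = dictionary[agg_col]["freq"]

        # Group by the date bucket first, then by the chosen column list.
        myList = [pd.Grouper(key=key, freq=frequency), *dictionary[column_list]]

        # For each level i, overwrite the columns from position i onward with
        # "[Total]" so the groupby yields subtotal rows, then stack all levels.
        y = (
            pd.concat(
                [
                    dataframe.assign(**{x: "[Total]" for x in myList[i:]})
                    .groupby(myList)
                    .agg(sumz=(agg_col, "sum"))
                    for i in range(1, len(myList) + 1)
                ]
            )
            .sort_index()
            .unstack(0)  # move the date level into the columns
        )
        return y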
    

    Test.

    df1 = some_fun(df,'Units_Sold',data_dict,'col_list_3')
    print(df1)
                                     sumz                      
    Date                   2016-12-31 2017-12-31 2018-12-31
    Sub-Category Product                                   
    X            Product 1      18308      17839      18776
                 Product 2      18067      19309      18077
                 Product 3      17943      19121      17675
                 [Total]        54318      56269      54528
    Y            Product 1      20699      18593      18103
                 Product 2      18642      19712      17122
                 Product 3      17701      19263      20123
                 [Total]        57042      57568      55348
    Z            Product 1      19077      17401      19138
                 Product 2      17207      21434      18817
                 Product 3      18405      17300      17462
                 [Total]        54689      56135      55417
    [Total]      [Total]       166049     169972     165293
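The `[Total]` rows in the output above come from overwriting grouping columns with a placeholder before each groupby. Here is a stripped-down sketch of that trick on a toy frame, without the date grouper or the final unstack; the data and the `group_cols` / `pieces` names are assumptions for illustration.

```python
import pandas as pd

# Toy data; not the question's real frame.
df = pd.DataFrame({
    "Category": ["X", "X", "Y"],
    "Product": ["P1", "P2", "P1"],
    "Units_Sold": [1, 2, 4],
})

group_cols = ["Category", "Product"]

pieces = []
# i = 0 overwrites every grouping column (grand total);
# i = len(group_cols) overwrites none (full detail rows).
for i in range(len(group_cols) + 1):
    overwritten = {col: "[Total]" for col in group_cols[i:]}
    pieces.append(
        df.assign(**overwritten)
        .groupby(group_cols)
        .agg(sumz=("Units_Sold", "sum"))
    )

# Stack detail rows, subtotals, and the grand total into one sorted report.
report = pd.concat(pieces).sort_index()
print(report)
```

The subtotal rows land after the detail rows here only because `"["` sorts after uppercase letters; labels starting with lowercase letters would sort after `[Total]`, so a different placeholder might be needed in that case.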
    

    Since you want to automate writing the 10 worksheets, we can again drive your function from a dictionary that maps each aggregation column to the column lists it should be grouped by:

    matches = {'Units_Sold': ['col_list_1','col_list_3'],
              'Dollars_Sold' : ['col_list_2']}
    

    Then a simple for loop writes each dataframe to its own sheet in a single Excel workbook. Change this to match your required behavior.

    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter('finished_excel_file.xlsx') as writer:
        for key, value in matches.items():
            for items in value:
                dataframe = some_fun(df, key, data_dict, items)
                dataframe.to_excel(writer, sheet_name=f'{key}_{items}')
    