Pandas - Split dataframe into multiple dataframes based on dates?

北荒 2020-12-10 08:26

I have a dataframe with multiple columns along with a date column. The date format is 12/31/15 and I have set it as a datetime object.

I set the datetime column as t
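A minimal sketch of the split described above, assuming the goal is one DataFrame per calendar month stored in a dict. The column names `date` and `value` and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: 'date' uses the 12/31/15 style, 'value' is made up
df = pd.DataFrame({
    "date": ["12/30/15", "12/31/15", "01/05/16", "02/10/16", "02/11/16"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Parse two-digit-year dates explicitly (%m/%d/%y) and index by them
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
df = df.set_index("date")

# Label each row with its calendar month, then build one DataFrame per month
months = df.index.to_period("M")
monthly = {str(period): group for period, group in df.groupby(months)}

for name, frame in monthly.items():
    print(name, len(frame))
```

Grouping on `to_period("M")` labels only produces months that actually contain rows, and the dict keys are strings such as `'2016-01'`.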

2 Answers
  • 2020-12-10 09:02

    This splits the rows by calendar year and writes each year to its own CSV file.

    import pandas as pd
    import dateutil.parser

    dfile = 'rg_unificado.csv'
    df = pd.read_csv(dfile, sep='|', quotechar='"', encoding='latin-1')
    # Parse each date string; pd.to_datetime(df['FECHA']) is the vectorised alternative
    df['FECHA'] = df['FECHA'].apply(dateutil.parser.parse)
    # Offset aliases: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
    # to_period("Y") maps every date to its calendar year
    per = df['FECHA'].dt.to_period("Y")
    # Group by the Series itself: grouping by a one-element list ([per]) makes the
    # keys 1-tuples in pandas >= 2.0, which would break the file names below
    for year, group in df.groupby(per):
        # str(year) is e.g. '2015'; save that year's rows to rg_unificado_2015.csv
        datep = str(year).replace('-', '')
        filename = '%s_%s.csv' % (dfile.replace('.csv', ''), datep)
        group.to_csv(filename, sep='|', quotechar='"', encoding='latin-1', index=False, header=True)
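A self-contained sketch of the same per-year split on a few hypothetical rows (the `VALOR` column is made up), writing into a temporary directory instead of next to `rg_unificado.csv`. It groups by the Period Series itself, so each key is a single `Period` whose `str()` is the year:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for rg_unificado.csv
df = pd.DataFrame({
    "FECHA": pd.to_datetime(["2015-03-01", "2015-11-20", "2016-07-04"]),
    "VALOR": [10, 20, 30],
})
per = df["FECHA"].dt.to_period("Y")  # one annual Period per row

written, row_counts = [], []
with tempfile.TemporaryDirectory() as out_dir:
    for year, group in df.groupby(per):
        # str(year) is '2015', '2016', ... for annual periods
        filename = os.path.join(out_dir, "rg_unificado_%s.csv" % str(year))
        group.to_csv(filename, sep="|", index=False)
        # Read the file back to confirm what was written
        row_counts.append(len(pd.read_csv(filename, sep="|")))
        written.append(os.path.basename(filename))

print(written, row_counts)
```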
    
  • 2020-12-10 09:11

    If you must loop, you need to unpack the key and the dataframe when you iterate over a groupby object:

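    import pandas as pd
    from patsy import dmatrices

    df = pd.read_csv('data.csv')
    # '%m/%d/%y' matches the 12/31/15 date format described in the question
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')
    # pd.Grouper(freq=...) without a key bins on the index, so make it a DatetimeIndex
    df = df.set_index('date')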
    

    Note the use of group_name here:

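    for group_name, df_group in df.groupby(pd.Grouper(freq='ME')):  # 'ME' = month end; use 'M' before pandas 2.2
        if df_group.empty:  # months without rows can still be yielded as empty bins
            continue
        # group_name is the month-end Timestamp; df_group holds that month's rows
        y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                         return_type='dataframe')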
    

    If you want to avoid an explicit loop, have a look at the notebook in Paul H's gist (see his comment); a simple example using apply would be:

    def do_regression(df_group, ret='outcome'):
        """Build the patsy matrices for one group; return y if ret == 'outcome', else X."""
        y, X = dmatrices('value1 ~ value2 + value3',
                         data=df_group,
                         return_type='dataframe')
        if ret == 'outcome':
            return y
        return X

    # Extra keyword arguments to apply() (here ret=) are forwarded to do_regression
    outcome = df.groupby(pd.Grouper(freq='ME')).apply(do_regression, ret='outcome')
    