How to perform time series analysis on data that contains multiple groups in Python using fbProphet or other models?

盖世英雄少女心 2020-12-16 00:19

All,

My dataset looks like the following. I am trying to predict the 'amount' for the next 6 months using either fbProphet or another model. But my issue is

3 Answers
  • 2020-12-16 00:33

    I know this is old, but I was trying to predict outcomes for different clients. I tried Aditya Santoso's solution but ran into some errors, so I made a couple of modifications, and this finally worked for me:

    import pandas as pd
    from fbprophet import Prophet

    df = pd.read_csv('file.csv')
    df = df.rename(columns={'date': 'ds', 'amount': 'y'})
    # Drop clients with fewer than 3 records to avoid fitting errors
    # (Prophet needs at least 2 non-NaN rows per group)
    df = df.groupby('client_id').filter(lambda x: len(x) > 2)

    df['client_id'] = df['client_id'].astype(str)

    forecasts = []

    for client_id, group in df.groupby('client_id'):
        m = Prophet()
        m.fit(group)
        future = m.make_future_dataframe(periods=365)
        forecast = m.predict(future)
        # Tag each forecast with its client id
        forecast['client'] = client_id
        forecasts.append(forecast)

    # Stack the per-client forecasts with concat instead of merge
    final = pd.concat(forecasts, ignore_index=True)

    final.head(10)
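
The question asks for a 6-month horizon, whereas `periods=365` above extends the forecast by 365 daily steps. If the data is monthly, `make_future_dataframe` also accepts a `freq` argument, for example `m.make_future_dataframe(periods=6, freq='MS')`. As a rough sketch of what such future dates look like, the pandas-only snippet below builds a comparable `ds` column by hand. The sample dates and the helper name `next_month_starts` are made up for illustration.

```python
import pandas as pd

def next_month_starts(history_dates, periods=6):
    """Return the `periods` month-start dates that follow the last observed date."""
    last = pd.to_datetime(pd.Series(history_dates)).max()
    # Start one day after the last observation so that it is never repeated;
    # freq='MS' then snaps forward to the following month starts
    dates = pd.date_range(start=last + pd.Timedelta(days=1), periods=periods, freq='MS')
    return pd.DataFrame({'ds': dates})

# Hypothetical monthly history ending in March 2020
history = ['2020-01-01', '2020-02-01', '2020-03-01']
future = next_month_starts(history)
print(future)
```

A frame like this, with a single `ds` column, has the shape that `m.predict()` expects as input.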
    
  • 2020-12-16 00:35

    fbprophet requires two columns, ds and y, so you first need to rename the two columns:

    df = df.rename(columns={'Date': 'ds', 'Amount':'y'})
    

    Assuming that your groups are independent of each other and you want one prediction per group, you can group the dataframe by the "Group" column and run a forecast for each group:

    from fbprophet import Prophet
    grouped = df.groupby('Group')
    for g in grouped.groups:
        group = grouped.get_group(g)
        m = Prophet()
        m.fit(group)
        future = m.make_future_dataframe(periods=365)
        forecast = m.predict(future)
        print(forecast.tail())
    

    Note that the input dataframe you supplied in the question is not sufficient for the model, because group D has only a single data point. fbprophet needs at least 2 non-NaN rows to fit a forecast.
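
One way to guard against this, sketched under the assumption that the renamed dataframe has the columns `Group`, `ds`, and `y`, is to drop groups with fewer than two non-NaN `y` values before the loop. The sample data below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample data: group D has a single row, group C has only one non-NaN value
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
    'ds': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-01-01', '2020-02-01',
                          '2020-03-01', '2020-01-01', '2020-02-01', '2020-01-01']),
    'y': [10.0, 12.0, 5.0, 6.0, 7.0, 3.0, np.nan, 8.0],
})

# count() ignores NaN, so this keeps only groups with at least 2 usable rows
usable = df.groupby('Group').filter(lambda g: g['y'].count() >= 2)

print(sorted(usable['Group'].unique()))
```

Counting non-NaN values rather than rows matters here: a group with two rows but one missing `y` would still fail to fit.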

    EDIT: if you want to merge all predictions into one dataframe, the idea is to name the yhat column differently for each group, do pd.merge() in the loop, and then cherry-pick the columns that you need at the end:

    final = pd.DataFrame()
    for g in grouped.groups:
        group = grouped.get_group(g)
        m = Prophet()
        m.fit(group)
        future = m.make_future_dataframe(periods=365)
        forecast = m.predict(future)
        forecast = forecast.rename(columns={'yhat': 'yhat_' + str(g)})
        # Merge only the renamed yhat column; merging every forecast column
        # would produce clashing names (trend, yhat_lower, ...) across groups
        yhat_col = forecast.set_index('ds')[['yhat_' + str(g)]]
        final = pd.merge(final, yhat_col, how='outer', left_index=True, right_index=True)

    final = final[['yhat_' + str(g) for g in grouped.groups.keys()]]
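
An alternative to merging inside the loop, sketched here without Prophet, is to stack the per-group forecasts in long format and pivot once at the end. The `forecasts` dict below holds hand-made stand-ins for the `forecast` dataframes that `m.predict(future)` would return; only the `ds` and `yhat` columns are assumed.

```python
import pandas as pd

dates = pd.to_datetime(['2020-01-01', '2020-02-01'])

# Stand-ins for per-group Prophet output (values are made up)
forecasts = {
    'A': pd.DataFrame({'ds': dates, 'yhat': [1.0, 2.0]}),
    'B': pd.DataFrame({'ds': dates, 'yhat': [10.0, 20.0]}),
}

# Stack the groups in long format, tagging each row with its group label
stacked = pd.concat(
    [f[['ds', 'yhat']].assign(Group=g) for g, f in forecasts.items()],
    ignore_index=True,
)

# One row per date, one yhat_<group> column per group
wide = stacked.pivot(index='ds', columns='Group', values='yhat').add_prefix('yhat_')
print(wide)
```

Keeping the long-format frame around can also be convenient for plotting or filtering by group later.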
    
  • 2020-12-16 00:45
    import pandas as pd
    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    # statsmodels.tsa.arima_model was removed in statsmodels 0.13; use the newer ARIMA class
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.stattools import adfuller
    from matplotlib import pyplot as plt
    from sklearn.metrics import mean_squared_error
    from sklearn.metrics import mean_squared_log_error

    # Before doing any modeling with ARIMA, SARIMAX, etc., confirm that
    # your time series is stationary using the Augmented Dickey-Fuller test
    # or other tests.

    # Create a list of all groups, or get it from the data using np.unique or other methods
    groups_iter = ['A', 'B', 'C', 'D']

    dict_org = {}
    dict_pred = {}
    group_accuracy = {}

    # Iterate over all groups and get the data
    # from the dataframe by filtering for each specific group
    # (`data` is assumed to be the question's dataframe with 'Group' and 'Amount' columns)
    for group_name in groups_iter:
        X = data[data['Group'] == group_name]['Amount'].values
        size = int(len(X) * 0.70)
        train, test = X[:size], X[size:]
        history = [x for x in train]
        predictions = []

        # Using an ARIMA model here; you can also grid search for the best parameters
        for t in range(len(test)):
            model = ARIMA(history, order=(5, 1, 0))
            model_fit = model.fit()
            output = model_fit.forecast()
            yhat = output[0]
            predictions.append(yhat)
            obs = test[t]
            history.append(obs)
            print("Predicted:%f, expected:%f" % (yhat, obs))
        # Mean squared log error requires non-negative values
        error = mean_squared_log_error(test, predictions)
        dict_org.update({group_name: test})
        dict_pred.update({group_name: predictions})

        print("Group: ", group_name, "Test MSLE:%f" % error)
        group_accuracy.update({group_name: error})
        plt.plot(test)
        plt.plot(predictions, color='red')
        plt.show()
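
The inner loop above is a walk-forward (rolling-origin) evaluation: forecast one step, record it, then append the true observation to the history before forecasting again. The sketch below isolates that pattern with a naive last-value forecaster standing in for ARIMA, so it runs without statsmodels. The function names and the sample series are illustrative and not part of the original answer.

```python
def naive_forecast(history):
    """Stand-in model: predict that the next value equals the last observed one."""
    return history[-1]

def walk_forward(series, train_frac=0.70, forecast_fn=naive_forecast):
    """Return (test, predictions) from a one-step-ahead walk-forward evaluation."""
    size = int(len(series) * train_frac)
    train, test = series[:size], series[size:]
    history = list(train)
    predictions = []
    for obs in test:
        predictions.append(forecast_fn(history))  # forecast before seeing obs
        history.append(obs)                       # then reveal the true value
    return list(test), predictions

# Made-up series of 10 points: 7 for training, 3 for testing
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
test, predictions = walk_forward(series)
mse = sum((t - p) ** 2 for t, p in zip(test, predictions)) / len(test)
print(test, predictions, mse)
```

Any model can be swapped in through `forecast_fn`, as long as it takes the history so far and returns a single next-step prediction. A naive baseline like this is also a useful yardstick for judging whether ARIMA adds anything for a given group.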
    