Regressing periodic data with sklearn

Submitted by 流过昼夜 on 2019-12-13 05:35:46

Question


I have a dataset for a regression problem. At first I treated it as a linear regression problem, but when I plotted "date_time" against "traffic_volume", the result looked like a sine curve, so I decided to try curve fitting. Here's the code:

import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit, leastsq
from sklearn import linear_model, metrics
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

df = pd.read_csv("Metro_Interstate_Traffic_Volume.csv")
# Encode holiday as 0 (the string 'None') or 1 (any named holiday)
df['holiday'] = (df['holiday'] != 'None').astype(int)
print(df.shape)

df['date_time'] =  pd.to_datetime(df['date_time'], format='%m/%d/%Y %H:%M')
df['date_time'] = (df['date_time']- dt.datetime(1970,1,1)).dt.total_seconds()

#print(df['date_time'].head())

non_dummy_cols = ['holiday','temp','rain_1h', 'snow_1h', 'clouds_all','date_time', 'traffic_volume'] 

dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
print(df.shape)

x = df.drop(columns=['traffic_volume', 'clouds_all'])
y = df['traffic_volume']
print(x.shape)
print(y.shape)

#plt.figure(figsize=(6,4))
#plt.scatter(df.date_time[0:100], df.traffic_volume[0:100], color = 'blue')
#plt.xlabel("Date Time")
#plt.ylabel("Traffic volume")
#plt.show()

x = StandardScaler().fit_transform(x)

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state= 4)

def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset

#x_train = np.array(x_train)
#y_train = np.array(y_train)

print(x_train)

popt, pcov = curve_fit(my_sin, x_train, y_train)
y_hat = my_sin(x_test, *popt)

This approach fails with the following error:

ValueError: operands could not be broadcast together with shapes (38563,54) (38563,) 

I know the error comes from the shape of x_train: it is (m, n), while my_sin as written only works with curve_fit when the input has shape (m,). When I trained with a single feature in x_train instead of all 54 columns, curve_fit ran, but the resulting fit was very poor. Here is the dataset link:

Dataset: Download
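
The shape mismatch can be reproduced on synthetic data. The sketch below is an illustration, not a fix for the full dataset: it assumes a single time feature measured in hours, a 24-hour period, and made-up amplitude, phase, and offset values. It also passes an initial guess p0 to curve_fit, since the default starting point of all ones may be far from the true frequency.

```python
import numpy as np
from scipy.optimize import curve_fit


def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset


# Synthetic hourly signal (assumed values, not taken from the real dataset)
rng = np.random.default_rng(0)
hours = np.arange(24 * 14, dtype=float)          # two weeks of hourly samples
true_freq = 2 * np.pi / 24                       # one cycle per 24 hours
y = my_sin(hours, true_freq, 1500.0, 0.5, 3000.0) + rng.normal(0.0, 50.0, hours.size)

# With a 2-D input of shape (m, n), my_sin returns an (m, n) array.
# curve_fit subtracts y of shape (m,) from it, which cannot broadcast.
X2d = np.column_stack([hours, np.ones_like(hours)])
out_2d = my_sin(X2d, 1.0, 1.0, 0.0, 0.0)
print(out_2d.shape, y.shape)

# With a 1-D input the fit runs; p0 starts the optimizer near a daily period.
p0 = [2 * np.pi / 24, y.std() * np.sqrt(2), 0.0, y.mean()]
popt, pcov = curve_fit(my_sin, hours, y, p0=p0)
print(popt)
```

On the real data the frequency would have to be expressed in the units of the (scaled) date_time column, so this p0 would not carry over unchanged.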

For a quick look, here is an image of the first few rows of the dataset:

Please help me train this model. Can you suggest an algorithm suited to it? Which features should I drop, or should I use all of them? I also tried polynomial regression of degree 2; with degree 3 my PC crashed several times.
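
One commonly used option for periodic predictors, sketched here only as a possibility, is to encode the hour of day as a sine/cosine pair and fit an ordinary linear model. The example assumes a purely daily cycle and uses synthetic data; the variable names and signal parameters are hypothetical and do not come from the dataset above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic hourly timestamps and a noisy daily cycle (assumed values)
rng = np.random.default_rng(0)
n = 24 * 60
ts = pd.Timestamp("2019-01-01") + pd.to_timedelta(np.arange(n), unit="h")
hour = np.asarray(ts.hour, dtype=float)
y = 3000.0 + 1500.0 * np.sin(2 * np.pi * hour / 24 + 0.5) + rng.normal(0.0, 100.0, n)

# Cyclical encoding: a*sin + b*cos can represent a sinusoid of any phase
angle = 2 * np.pi * hour / 24
X_cyc = np.column_stack([np.sin(angle), np.cos(angle)])
X_raw = hour.reshape(-1, 1)                      # baseline: raw hour as a number

Xc_tr, Xc_te, Xr_tr, Xr_te, y_tr, y_te = train_test_split(
    X_cyc, X_raw, y, test_size=0.2, random_state=4
)

r2_cyc = r2_score(y_te, LinearRegression().fit(Xc_tr, y_tr).predict(Xc_te))
r2_raw = r2_score(y_te, LinearRegression().fit(Xr_tr, y_tr).predict(Xr_te))
print("R^2 with sin/cos features:", r2_cyc)
print("R^2 with raw hour:", r2_raw)
```

The same two columns could stand in for the raw date_time feature alongside the other predictors, and weekly or yearly cycles would need their own sine/cosine pairs. Whether a linear model is adequate for the real traffic data is not established here.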

Note: I have re-asked this question at the suggestion of a community member, because the previous one covered only the curve_fit error and its title was not clear.

Source: https://stackoverflow.com/questions/57027040/regressing-periodic-data-with-sklearn
