Working with date types in Python Linear regression

Submitted by 喜你入骨 on 2019-12-13 20:28:27

Question


Data Set:

I have collected tablespace growth data for my database and am trying to use it to predict future growth.

The dataset covers the years 2009 to 2017. I have tried many approaches but cannot get the date column into a form that can be processed. Every error I get is related to datetime types. Can you please suggest how I can use this dataset to predict the growth?

One of the errors:

TypeError: Cannot cast array data from dtype('M8[ns]') to dtype('float64') according to the rule 'safe'

TS_SIZE     FETCH_DATE
34911.99    01-05-2009
34672.5     02-05-2009
34683.39    03-05-2009
34904.7     04-05-2009
35063.87    05-05-2009
35298.46    06-05-2009
35161.88    07-05-2009
34872.53    08-05-2009
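
The error above indicates that a datetime64 array reached code expecting floats, so the date column needs a numeric representation first. A minimal sketch of one way to do that follows. It inlines the first rows of the sample instead of reading the Excel file, and it assumes the dates are day-month-year, which the sample alone does not confirm (they could equally be month-day-year). The DAYS column name is made up for illustration.

```python
import io

import pandas as pd

# First rows of the sample above, inlined so the sketch runs without the Excel file.
sample = io.StringIO(
    "TS_SIZE FETCH_DATE\n"
    "34911.99 01-05-2009\n"
    "34672.5 02-05-2009\n"
    "34683.39 03-05-2009\n"
)
df = pd.read_csv(sample, sep=r"\s+")

# ASSUMPTION: dates are day-month-year (01-05-2009 = 1 May 2009).
# Use format="%m-%d-%Y" instead if the data is actually month-day-year.
df["FETCH_DATE"] = pd.to_datetime(df["FETCH_DATE"], format="%d-%m-%Y")

# Days elapsed since the first observation: a plain integer column
# (hypothetical name DAYS) that a regression model can consume.
df["DAYS"] = (df["FETCH_DATE"] - df["FETCH_DATE"].min()).dt.days

print(df)
```

Passing an explicit format avoids silent guessing by the parser when day and month are both 12 or less.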

Code

%matplotlib notebook
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
import datetime

data = pd.read_excel('D:/database/2.xlsx')
X_R1 = data['FETCH_DATE'].to_frame() #DataFrame
X_R1 = np.array(X_R1).reshape((-1,1))
y_R1 = data['TS_SIZE']

X_train, X_test, y_train, y_test = train_test_split(X_R1, y_R1, test_size=0.3,random_state = 0)
linreg = LinearRegression().fit(X_train, y_train)
ytest_predict_linear = linreg.predict(X_test)

###########POLY TEST PREDICTION#################
#lr = LinearRegression()
pr = LinearRegression()
poly = PolynomialFeatures(degree = 2)
X_R1_Poly = poly.fit_transform(X_R1)

pr.fit(X_R1_Poly,y_R1)
#X_train, X_test, y_train, y_test =    train_test_split(X_R1_Poly,y_R1,random_state=0)
ytest_predict_quadratic = pr.predict(poly.fit_transform(X_test))
#linreg = Ridge().fit(X_train,y_train)
#print("Predicted Quadratic: {}" .format(ytest_predict_quadratic))
#plt.figure(figsize=(5,4))
#plt.scatter(X_R1,y_R1,marker= 'o', s=50, alpha=0.8,label='training points')
plt.scatter(X_R1,y_R1,marker= 'o',label='training points')
#plt.plot(X_R1, linreg.coef_ * X_R1_Poly + linreg.intercept_, 'r-')
plt.plot(X_test,ytest_predict_linear,label='linear fit',linestyle='--',color='r')
plt.plot(X_test,ytest_predict_quadratic,label='quadratic fit',color='g')
#plt.xlabel('Feature value x')
#plt.ylabel('Feature value y')
plt.legend(loc='upper left')
#plt.show()

#print('Training MSE linear: %.3f, quadratic: %.3f' % (mean_squared_error(y_R1, ytest_predict_linear),mean_squared_error(y_R1,    ytest_predict_quadratic)))
#print('Training R^2 linear: %.3f, quadratic: %.3f' % (r2_score(y_R1, ytest_predict_linear),r2_score(y_R1, ytest_predict_quadratic)))
###########POLY NEW PREDICTIONS#################

data1 = pd.read_excel('D:/database/2.xlsx')
print('Printing new dates')
print(data1['FETCH_DATE'])
X_R2_quad = pd.DataFrame(data1['FETCH_DATE'])
X_R2_quad = np.array(X_R2_quad,dtype='int64')
print(X_R2_quad)
#print("New values shape: %s" %(X_R2_quad.shape))
#print("New values: %s" %(X_R2_quad))
X_R2_quad_poly = poly.fit_transform(X_R2_quad)
#X_R2_quad_poly = linreg.fit(X_R2_quad)

ynew_predict_quadratic = pr.predict(X_R2_quad_poly)
#ynew_predict_quadratic = linreg.predict(X_R2_quad_poly)

print("Predicted Values beyond test: {}" .format(ynew_predict_quadratic))

plt.scatter(X_R2_quad, ynew_predict_quadratic,marker= '*',label='predicted values')
plt.plot(X_R2_quad,ynew_predict_quadratic,label='predicted quadratic fit')
plt.legend(loc='upper left')
plt.xlabel("Year")
plt.ylabel("Predicted Growth")
#x=plt.gca().xaxis
#for item in x.get_ticklabels():
#   item.set_rotation(45)
plt.show()
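
One possible way to get past the datetime error, sketched under stated assumptions rather than offered as the definitive fix: express each date as the number of days since the first observation, fit on that float column, and apply the same conversion to any future dates. The data here is synthetic (a made-up straight-line trend standing in for the spreadsheet), and the helper name to_days is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the spreadsheet (hypothetical values, not the real data):
# 60 daily observations growing by 5 units per day.
dates = pd.date_range("2009-05-01", periods=60, freq="D")
sizes = 34900.0 + 5.0 * np.arange(60)
data = pd.DataFrame({"FETCH_DATE": dates, "TS_SIZE": sizes})

# Reference date used for both training and future dates.
origin = data["FETCH_DATE"].min()


def to_days(series):
    """Convert a datetime Series to a float column of days since `origin`."""
    return (series - origin).dt.days.to_numpy(dtype="float64").reshape(-1, 1)


X = to_days(data["FETCH_DATE"])
y = data["TS_SIZE"].to_numpy()

# Linear fit and degree-2 polynomial fit on the numeric feature.
linear = LinearRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# Dates beyond the observed range, converted with the same origin.
future = pd.Series(pd.date_range("2009-07-01", periods=3, freq="D"))
X_future = to_days(future)

linear_forecast = linear.predict(X_future)
quadratic_forecast = quadratic.predict(X_future)

print("Linear forecast:", linear_forecast)
print("Quadratic forecast:", quadratic_forecast)
```

Days since an origin are used here instead of casting the dates to int64 nanoseconds, since nanosecond timestamps are on the order of 1e18 and squaring them for a polynomial fit tends to be numerically poor. For plotting, the original datetime column can still be used on the x-axis while the model only ever sees the numeric column.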

Source: https://stackoverflow.com/questions/48518471/working-with-date-types-in-python-linear-regression
