Question
Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average (EWMA) and the number of lagged variables that I use in the regression.
However, the calculated MSE and the final "best" parameters keep coming out different every time I run it. I have no idea why; can anyone help?
Process inside the loop:
1. Create a new DataFrame with the three variables.
2. Remove rows containing zero values.
3. Create an EWMA for each variable.
4. Create lags of each EWMA.
5. Drop NaNs.
6. Create X and y.
7. Regress, and save the EMA span and lag number if the MSE is better.
8. Start the loop again with the next values.
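Steps 2 to 5 above can be sketched on a toy DataFrame as follows. This is a minimal sketch assuming a recent pandas, where the old pd.ewma function has been replaced by Series.ewm(span=...).mean(); the helper make_lagged_features, its column names and the toy data are hypothetical. As in the question, zero rows are removed before the EWMA is computed.

```python
import pandas as pd

def make_lagged_features(df, span, lags):
    """Hypothetical helper: drop rows containing zeros, compute an EWMA per
    column, add lags 1..lags of each EWMA, then drop rows with NaNs."""
    out = df[(df != 0).all(axis=1)].copy()  # copy avoids chained-assignment warnings
    for col in df.columns:
        ema = out[col].ewm(span=span).mean()
        for i in range(1, lags + 1):
            # shift(i) moves values down i positions, leaving NaN at the top
            out['%s_ema_lag%d' % (col, i)] = ema.shift(i)
    return out.dropna()

toy = pd.DataFrame({'a': [1.0, 0.0, 2.0, 3.0, 4.0, 5.0],
                    'b': [2.0, 1.0, 1.0, 0.0, 3.0, 2.0]})
features = make_lagged_features(toy, span=3, lags=2)
print(features)
```

Rows 1 and 3 contain a zero and are removed; the first two remaining rows are then lost to the NaNs created by shifting, which is why the number of usable rows shrinks as the lag count grows.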
I know this could be a question for Cross Validated, but since it could be a programming issue I have posted it here:
```python
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# df holds the columns 'diffbn', 'diffbl' and 'diffbz' (loaded elsewhere)
bestema = 0
bestlag = 0
bestcoef = None
mse = float('inf')
for e in range(2, 30):  # EWMA span
    for lags in range(1, 20):  # number of lagged EWMA terms
        # Keep only rows where none of the three variables is zero
        df2 = df[['diffbn', 'diffbl', 'diffbz']]
        df2 = df2[(df2 != 0).all(axis=1)].copy()
        # pd.ewma() was removed from pandas; Series.ewm(...).mean() replaces it
        df2['emabn'] = df2['diffbn'].ewm(span=e).mean()
        df2['emabl'] = df2['diffbl'].ewm(span=e).mean()
        df2['emabz'] = df2['diffbz'].ewm(span=e).mean()
        # Lags 1..lags of each EWMA series
        for i in range(1, lags + 1):
            df2['lagbn%d' % i] = df2['emabn'].shift(i)
            df2['lagbz%d' % i] = df2['emabz'].shift(i)
            df2['lagbl%d' % i] = df2['emabl'].shift(i)
        df2 = df2.dropna()
        # Features are the lag columns only; the target is diffbl
        b = [c for c in df2.columns if c.startswith('lag')]
        X = df2[b]
        y = df2['diffbl']
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        # Compute the test MSE once and reuse it for both the comparison and the stored value
        test_mse = mean_squared_error(y_test, regr.predict(X_test))
        if test_mse < mse:
            mse = test_mse
            bestema = e
            bestlag = lags
            bestcoef = regr.coef_
print(bestcoef)
print(bestema)
print(bestlag)
print(mse)
```
Answer 1:
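The train_test_split function from sklearn (see docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) shuffles the data randomly before splitting, so it is expected that you get a different split, and therefore a different MSE, each time you run it. (In current scikit-learn versions the function lives in sklearn.model_selection.)
You can pass an integer to the random_state keyword argument to get the same split each time.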
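A minimal sketch of the answer's point, assuming synthetic data in place of the asker's DataFrame (which is not shown) and the current import location sklearn.model_selection; the helper split_mse is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the lagged EWMA features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=200)

def split_mse(random_state=None):
    """Hypothetical helper: fit on a 70/30 split and return the test MSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    model = LinearRegression().fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

# Unseeded: each call draws a new shuffle, so the MSE usually changes
print(split_mse(), split_mse())
# Seeded: the same integer gives the same split, hence the same MSE
print(split_mse(random_state=42), split_mse(random_state=42))
```

In the question's loop, passing the same fixed integer as random_state to every train_test_split call would make rerunning the whole search return the same span, lag count and MSE.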
Source: https://stackoverflow.com/questions/28673442/getting-different-result-each-time-i-run-a-linear-regression-using-scikit