linear-regression

Linear Regression with Python numpy

Submitted by ☆樱花仙子☆ on 2019-12-21 12:07:36
Question: I'm trying to make a simple linear regression function but continue to encounter a numpy.linalg.linalg.LinAlgError: Singular matrix error. Existing function (with debug prints):

    def makeLLS(inputData, targetData):
        print "In makeLLS:"
        print "  Shape inputData:", inputData.shape
        print "  Shape targetData:", targetData.shape
        term1 = np.dot(inputData.T, inputData)
        term2 = np.dot(inputData.T, targetData)
        print "  Shape term1:", term1.shape
        print "  Shape term2:", term2.shape
        #print term1
        #print term2
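The makeLLS above is building the normal equations (term1 = XᵀX, term2 = Xᵀy), and the singular-matrix error means XᵀX cannot be inverted, typically because of duplicate/collinear columns or fewer rows than columns. A minimal sketch of one workaround, assuming the goal is ordinary least squares: let np.linalg.lstsq factor the design matrix directly instead of inverting the normal matrix.

```python
import numpy as np

def make_lls(input_data, target_data):
    """Least-squares fit that tolerates a rank-deficient design matrix.

    np.linalg.lstsq works on the design matrix directly (via SVD), so it
    never needs to invert input_data.T @ input_data the way solving the
    normal equations does.
    """
    coeffs, residuals, rank, sv = np.linalg.lstsq(input_data, target_data, rcond=None)
    return coeffs

# Toy data: y = 2*x + 1, with a column of ones for the intercept.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
w = make_lls(X, y)  # w[0] = slope, w[1] = intercept
```

The function and data names here are illustrative, not from the question; the point is only that lstsq sidesteps the explicit inversion.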

How to get the slope of a linear regression line using C++?

Submitted by 萝らか妹 on 2019-12-21 11:33:50
Question: I need to obtain the slope of a linear regression, similar to the way the Excel SLOPE function in the link below is implemented: http://office.microsoft.com/en-gb/excel-help/slope-function-HP010342903.aspx Is there a library in C++, or a simple coded solution someone has created, which can do this? I have implemented code according to this formula (taken from here: http://easycalculation.com/statistics/learn-regression.php), however it does not always give me the correct results:

    Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)
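The question asks for C++, but the SLOPE formula is language-agnostic; here is the same computation sketched in Python for brevity. A common source of "incorrect results" when porting this formula to C++ is integer arithmetic: the sums must be accumulated in a floating-point type (double), or NΣX² - (ΣX)² can overflow or truncate.

```python
def slope(xs, ys):
    """Slope of the least-squares line, matching Excel's SLOPE formula:
    b = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - sum(x)^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

b = slope([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear y = 2x, so b == 2.0
```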

Interpreting the alias table when testing multicollinearity of a model in R

Submitted by 巧了我就是萌 on 2019-12-21 06:21:24
Question: Could someone help me interpret the alias function output for testing for multicollinearity in a multiple regression model? I know some predictor variables in my model are highly correlated, and I want to identify them using the alias table.

    Model : Score ~ Comments + Pros + Cons + Advice + Response + Value +
        Recommendation + 6Months + 12Months + 2Years + 3Years + Daily +
        Weekly + Monthly

    Complete :
                (Intercept) Comments Pros Cons Advice Response Value1
    UseMonthly1 0           0        0    0    0      0        0
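R's alias() reports terms that are *exact* linear combinations of other terms: each row of the "Complete" block is an aliased predictor, and the entries are the coefficients of the combination (a row of zeros across the columns shown just means those particular columns don't participate). The underlying check can be sketched numerically (Python here, hypothetical data) by comparing the rank of the design matrix with its number of columns:

```python
import numpy as np

# Hypothetical design matrix: the third column equals the sum of the first
# two, i.e. it is an exactly aliased (perfectly collinear) predictor.
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [2.0, 0.0, 2.0],
])

rank = np.linalg.matrix_rank(X)
n_aliased = X.shape[1] - rank  # number of redundant (aliased) columns
```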

Equations for 2 variable Linear Regression

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-21 06:07:50
Question: We are using a programming language that does not have a linear regression function in it. We have already implemented a single-variable linear equation, y = Ax + B, and have simply calculated the A and B coefficients from the data using a solution similar to this Stack Overflow answer. I know this problem gets geometrically harder as variables are added, but for our purposes we only need to add one more: z = Ax + By + C. Does anyone have the closed-form equations, or code in any language, that
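For two predictors the closed form is a 3×3 normal-equation system built from raw sums, which ports to any language that can solve a 3×3 linear system (or apply Cramer's rule by hand). A sketch, shown in Python only for concreteness:

```python
import numpy as np

def fit_plane(xs, ys, zs):
    """Closed-form least squares for z = A*x + B*y + C via the normal equations:

        [ Σx²  Σxy  Σx ] [A]   [ Σxz ]
        [ Σxy  Σy²  Σy ] [B] = [ Σyz ]
        [ Σx   Σy   N  ] [C]   [ Σz  ]
    """
    xs, ys, zs = map(np.asarray, (xs, ys, zs))
    n = len(xs)
    M = np.array([
        [np.sum(xs * xs), np.sum(xs * ys), np.sum(xs)],
        [np.sum(xs * ys), np.sum(ys * ys), np.sum(ys)],
        [np.sum(xs),      np.sum(ys),      n],
    ])
    v = np.array([np.sum(xs * zs), np.sum(ys * zs), np.sum(zs)])
    A, B, C = np.linalg.solve(M, v)
    return A, B, C

# Exact data from z = 2x + 3y + 1 is recovered exactly.
A, B, C = fit_plane([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8])
```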

Pandas/Statsmodels OLS: predicting future values

Submitted by 不打扰是莪最后的温柔 on 2019-12-21 05:47:17
Question: I've been trying to get a prediction for future values in a model I've created. I have tried both OLS in pandas and statsmodels. Here is what I have in statsmodels:

    import statsmodels.api as sm
    endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
    smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
    sm_pred = smresults.predict(endog)
    sm_pred

The length of the array returned is equal to the number of records in my original dataframe, but the values are not the same.

Python linear regression: predict by date

Submitted by 独自空忆成欢 on 2019-12-21 04:45:08
Question: I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format. This is the dataframe I have:

    data_df =
    date        value
    2016-01-15  1555
    2016-01-16  1678
    2016-01-17  1789
    ...

    y = np.asarray(data_df['value'])
    X = data_df[['date']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.7, random_state=42)
    model = LinearRegression()   # create linear regression object
    model.fit(X_train, y_train)  # train model on train data
    model.score(X_train, y_train)

How to check for correlation among continuous and categorical variables in Python?

Submitted by 南笙酒味 on 2019-12-21 04:32:39
Question: I have a dataset including categorical (binary) variables and continuous variables. I'm trying to apply a linear regression model for predicting a continuous variable. Can someone please let me know how to check for correlation between the categorical variables and the continuous target variable? Current code:

    import pandas as pd
    df_hosp = pd.read_csv('C:\Users\LAPPY-2\Desktop\LengthOfStay.csv')
    data = df_hosp[['lengthofstay', 'male', 'female', 'dialysisrenalendstage', 'asthma', \
        'irondef',
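For a binary (0/1) variable against a continuous one, the point-biserial correlation is simply Pearson's r computed on the 0/1 coding, so pandas' Series.corr() already covers this case. A sketch on hypothetical stand-in data (the real LengthOfStay.csv is not available here):

```python
import pandas as pd

# Hypothetical stand-in for the question's data: one binary predictor,
# one continuous target.
df_hosp = pd.DataFrame({
    'male':         [1, 0, 1, 0, 1, 0, 1, 0],
    'lengthofstay': [6, 3, 7, 2, 5, 4, 8, 3],
})

# Point-biserial correlation == Pearson r on the 0/1 coding.
r = df_hosp['male'].corr(df_hosp['lengthofstay'])
```

For many predictors at once, df_hosp.corr()['lengthofstay'] gives the whole column of correlations in one call.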

What is the most accurate method in python for computing the minimum norm solution or the solution obtained from the pseudo-inverse?

Submitted by 筅森魡賤 on 2019-12-20 17:42:33
Question: My goal is to solve Kc = y with the pseudo-inverse (i.e. the minimum-norm solution), c = K^{+} y, such that the model is a (hopefully) high-degree polynomial, f(x) = sum_i c_i x^i. I am especially interested in the underdetermined case, where we have more polynomial features than data points (fewer equations than variables/unknowns): columns = deg+1 > N = rows. Note K is the Vandermonde matrix of polynomial features. I was initially using the Python function np.linalg.pinv, but then I noticed something
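Both np.linalg.pinv and np.linalg.lstsq return the minimum-norm least-squares solution; lstsq works on K directly via SVD and is generally the numerically safer route for ill-conditioned Vandermonde matrices. A sketch of the underdetermined case the question describes (toy numbers, np.vander builds K):

```python
import numpy as np

# Underdetermined setup: N = 3 data points, deg+1 = 6 polynomial features.
x = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, 2.0, 3.0])
K = np.vander(x, 6, increasing=True)  # Vandermonde matrix, columns x^0 .. x^5

c_pinv = np.linalg.pinv(K) @ y                   # explicit pseudo-inverse
c_lstsq, *_ = np.linalg.lstsq(K, y, rcond=None)  # SVD-based, same minimum-norm c

# The system is consistent and underdetermined, so both coefficient vectors
# should reproduce the data exactly.
fit_err = np.max(np.abs(K @ c_pinv - y))
```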

Running multiple, simple linear regressions from dataframe in R

Submitted by 馋奶兔 on 2019-12-20 14:15:38
Question: I have a dataset (data frame) with 5 columns, all containing numeric values. I'm looking to run a simple linear regression for each pair of columns in the dataset. For example, if the columns were named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), ..., and then I want to plot the data for each pair along with the regression line. I'm pretty new to R, so I'm sort of spinning my wheels on how to actually accomplish this. Should I use ddply? Or lapply? I'm not really sure how to
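In R the idiomatic answer is lapply (or Map) over the ordered pairs of column names, calling lm on each formula. To make the pairwise-loop structure concrete, here is the same idea sketched in Python on hypothetical random data (the column names A–E match the question):

```python
from itertools import permutations

import numpy as np
import pandas as pd

# Hypothetical 5-column numeric frame standing in for the question's data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 5)), columns=list('ABCDE'))

# One simple regression per ordered pair, mirroring lm(A~B), lm(A~C), ...
fits = {}
for resp, pred in permutations(df.columns, 2):
    slope, intercept = np.polyfit(df[pred], df[resp], 1)
    fits[(resp, pred)] = (slope, intercept)
```

With 5 columns there are 5×4 = 20 ordered pairs, so fits holds 20 (slope, intercept) results; each pair's scatter plot plus its fitted line can then be drawn in a loop over fits.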
