linear-regression

ValueError: negative dimensions are not allowed in scikit linear regression CV model with sparse matrices

我的梦境 submitted on 2019-12-08 08:39:13

Question: I recently competed in a Kaggle competition and ran into problems trying to run linear CV models from scikit-learn. I am aware of a similar question on Stack Overflow, but I can't see how the accepted reply relates to my issue. Any assistance would be greatly appreciated. My code is given below:

```python
train = pd.read_csv(".../train.csv")
test = pd.read_csv(".../test.csv")
data = pd.read_csv(".../sampleSubmission.csv")
from sklearn.feature_extraction.text import TfidfVectorizer
transformer =
```
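The question is cut off before the vectorizer is configured, but a minimal sketch of the kind of pipeline it describes follows. The column name `text` and the choice of `RidgeCV` are assumptions for illustration; the point is that `TfidfVectorizer` returns a SciPy sparse matrix and the scikit-learn linear CV estimators accept sparse input directly, so no densification is needed.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeCV

# Hypothetical data: a text column and a numeric target.
train = pd.DataFrame({
    "text": ["red apple", "green pear", "red pear", "green apple"],
    "target": [1.0, 0.0, 0.5, 0.5],
})

# fit_transform returns a scipy.sparse matrix, not a dense array.
transformer = TfidfVectorizer()
X = transformer.fit_transform(train["text"])

# RidgeCV handles sparse design matrices without converting to dense.
model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, train["target"])
print(model.alpha_, model.coef_.shape)
```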

R: variable lengths differ when building a linear model for residuals

╄→尐↘猪︶ㄣ submitted on 2019-12-08 08:15:12

Question: I am working on a problem where I want to build a linear model using the residuals of two other linear models. I use the UN3 data set here because it is easier to present the problem with it than with my actual data. Here is my R code:

```r
head(UN3)
m1.lgFert.purban <- lm(log(Fertility) ~ Purban, data=UN3)
m2.lgPPgdp.purban <- lm(log(PPgdp) ~ Purban, data=UN3)
m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
```

Here is the error I am getting:

```
> m3 <- lm(residuals(m1.lgFert
```
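The "variable lengths differ" error in the title typically arises because lm() silently drops rows with missing values, and it does so independently for each model, so the two residual vectors end up with different lengths. A sketch of the same residuals-on-residuals idea in Python, aligning the residual series on a shared row index before the final fit (the data below are synthetic stand-ins for UN3):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"Purban": rng.uniform(10, 90, n)})
df["Fertility"] = np.exp(1 - 0.01 * df["Purban"] + rng.normal(0, 0.2, n))
df["PPgdp"] = np.exp(5 + 0.03 * df["Purban"] + rng.normal(0, 0.5, n))
# Missing values in one column only: the two fits use different rows.
df.loc[rng.choice(n, 20, replace=False), "PPgdp"] = np.nan

m1 = smf.ols("np.log(Fertility) ~ Purban", data=df).fit()
m2 = smf.ols("np.log(PPgdp) ~ Purban", data=df).fit()  # NaN rows dropped

# Align the two residual series on their shared index before regressing.
resid = pd.concat([m1.resid.rename("r1"), m2.resid.rename("r2")], axis=1).dropna()
m3 = smf.ols("r1 ~ r2", data=resid).fit()
print(m3.params)
```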

Statsmodels.formula.api OLS does not show statistical values of the intercept

断了今生、忘了曾经 submitted on 2019-12-08 07:10:43

Question: I am running the following source code:

```python
import statsmodels.formula.api as sm

# Add one column of ones for the intercept term
X = np.append(arr=np.ones((50, 1)).astype(int), values=X, axis=1)
regressor_OLS = sm.OLS(endog=y, exog=X).fit()
print(regressor_OLS.summary())
```

where X is a 50x5 numpy array (before adding the intercept term) that looks like this:

```
[[0 1 165349.20 136897.80 471784.10]
 [0 0 162597.70 151377.59 443898.53]
 ...]
```

and y is a 50x1 numpy array with float values for the
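The question is cut off before the summary output, so the exact cause is unclear, but the conventional route is to let statsmodels add the intercept itself: sm.add_constant prepends a column named const, and the summary then reports coefficient, standard error, t and p values for the intercept like any other term. A minimal sketch with synthetic data (note that OLS lives in statsmodels.api; statsmodels.formula.api is for formula strings):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm  # OLS is here, not in statsmodels.formula.api

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)

# add_constant prepends a "const" column, so the intercept gets its own
# labelled row of statistics in the summary table.
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.summary())
```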

Fitting Markov Switching Models to data in R

[亡魂溺海] submitted on 2019-12-08 06:43:08

Question: I'm trying to fit two kinds of Markov switching models to a time series of log-returns using the MSwM package in R. The models I'm considering are a regression model with only an intercept, and an AR(1) model. Here is the code I'm using:

```r
library(tseries)
# Prices
ftse <- get.hist.quote(instrument="^FTSE", start="1984-01-03", end="2014-01-01",
                       quote="AdjClose", compression="m")
# Log-returns
ftse.ret <- diff(log(ftse))
library(MSwM)
# Model with only intercept
mod <- lm(ftse.ret ~ 1)
# Fit regime
```
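The R snippet is cut off at the regime fit. For comparison, the same two specifications (a switching intercept, and a switching AR(1)) can be sketched in Python with statsmodels' regime-switching models; the simulated series below is a stand-in for the FTSE log-returns:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Synthetic monthly log-returns with two volatility regimes.
returns = np.concatenate([rng.normal(0.005, 0.02, 200),
                          rng.normal(-0.002, 0.06, 160)])

# Intercept-only regime-switching regression (with switching variance).
res_const = sm.tsa.MarkovRegression(returns, k_regimes=2, trend="c",
                                    switching_variance=True).fit()
print(res_const.summary())

# Regime-switching AR(1) model.
res_ar1 = sm.tsa.MarkovAutoregression(returns, k_regimes=2, order=1,
                                      switching_ar=True).fit()
print(res_ar1.summary())
```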

Out of memory when using `outer` to solve a big normal equation for least squares estimation

主宰稳场 submitted on 2019-12-08 06:11:16

Question: Consider the following example in R:

```r
x1 <- rnorm(100000)
x2 <- rnorm(100000)
g <- cbind(x1, x2, x1^2, x2^2)
gg <- t(g) %*% g
gginv <- solve(gg)
bigmatrix <- outer(x1, x2, "<=")
Gw <- t(g) %*% bigmatrix
beta <- gginv %*% Gw
w1 <- bigmatrix - g %*% beta
```

If I try to run this on my computer, it throws a memory error (because bigmatrix is too big). Do you know how I can achieve the same result without running into this problem?

Answer 1: This is a least squares problem with 100,000 responses.
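The answer is cut off, but the structure of the computation suggests the standard fix: never materialize the 100,000 x 100,000 indicator matrix, and instead walk over its columns in chunks, since column j of bigmatrix needs only x1 and x2[j]. A numpy sketch of that idea (the chunk size is an arbitrary trade-off between memory and speed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
g = np.column_stack([x1, x2, x1**2, x2**2])   # n x 4
gg_inv = np.linalg.inv(g.T @ g)               # 4 x 4

beta = np.empty((4, n))
chunk = 500                                   # ~n*chunk*8 bytes per block
for start in range(0, n, chunk):
    cols = slice(start, start + chunk)
    # These columns of the indicator matrix only: n x chunk.
    b = (x1[:, None] <= x2[None, cols]).astype(np.float64)
    beta[:, cols] = gg_inv @ (g.T @ b)
    # The matching residual columns are b - g @ beta[:, cols]; summarize
    # or write them out here rather than storing all of w1 at once.
```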

Estimating linear regression with Gradient Descent (Steepest Descent)

人走茶凉 submitted on 2019-12-08 04:02:19

Question: Example data:

```r
X <- matrix(c(rep(1,97), runif(97)), nrow=97, ncol=2)
y <- matrix(runif(97), nrow=97, ncol=1)
```

I have succeeded in creating the cost function:

```r
COST <- function(theta, X, y){
  ### Calculate half MSE
  sum((X %*% theta - y)^2) / (2*length(y))
}
```

However, when I run it, it seems to fail to converge over 100 iterations.

```r
theta <- matrix(0, nrow=2, ncol=1)
num.iters <- 1500
delta = 0
GD <- function(X, y, theta, alpha, num.iters){
  for (i in num.iters){
    while (max(abs(delta)) < tolerance){
      error <
```
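The loop is cut off mid-line, but two problems are already visible: `for (i in num.iters)` iterates exactly once (it should be `1:num.iters`), and the `while` condition tests `delta` against a `tolerance` that is never defined in the excerpt. A working batch gradient descent for the same half-MSE cost, sketched in Python/numpy (the step size and iteration count are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """Half mean squared error, matching the R COST function."""
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1500):
    m = len(y)
    for _ in range(num_iters):
        error = X @ theta - y         # m x 1 residuals
        grad = X.T @ error / m        # gradient of the half-MSE cost
        theta = theta - alpha * grad  # steepest-descent step
    return theta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(97), rng.uniform(size=97)])
y = rng.uniform(size=(97, 1))
theta = gradient_descent(X, y, np.zeros((2, 1)))
print(cost(theta, X, y), theta.ravel())
```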

Loop multiple 'multiple linear regressions' in R

主宰稳场 submitted on 2019-12-08 03:22:27

I have a database where I want to run several multiple regressions. They all look like this:

```r
fit <- lm(Variable1 ~ Age + Speed + Gender + Mass, data=Data)
```

The only thing that changes is Variable1. Now I want to loop, or use something from the apply family, to substitute several other variables in place of Variable1. These variables are columns in my data file. Can someone help me solve this problem? Many thanks! What I tried so far: when I extract one of the column names with the names() function, I do get the name of the column:

```r
varname = as.name(names(Data[14]))
```

But when I fill this in (and I used
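The question is cut off, but the looping idea translates directly to building the model formula as a string per outcome column and fitting in a loop. A sketch in Python with statsmodels' formula interface (the predictor names mirror the question; the data are synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "Age": rng.uniform(20, 60, n),
    "Speed": rng.uniform(1, 10, n),
    "Gender": rng.integers(0, 2, n),
    "Mass": rng.uniform(50, 100, n),
    "Variable1": rng.normal(size=n),
    "Variable2": rng.normal(size=n),
    "Variable3": rng.normal(size=n),
})

fits = {}
for outcome in ["Variable1", "Variable2", "Variable3"]:
    # Build the formula string per outcome, keeping the same predictors.
    formula = f"{outcome} ~ Age + Speed + Gender + Mass"
    fits[outcome] = smf.ols(formula, data=data).fit()

for outcome, fit in fits.items():
    print(outcome, round(fit.params["Age"], 3))
```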

Doing linear prediction with R: How to access the predicted parameter(s)?

无人久伴 submitted on 2019-12-08 02:17:11

Question: I am new to R and I am trying to do linear prediction. Here is some simple data:

```r
test.frame <- data.frame(year=8:11, value=c(12050, 15292, 23907, 33991))
```

Say I want to predict the value for year=12. This is what I am doing (experimenting with different commands):

```r
lma <- lm(test.frame$value ~ test.frame$year)  # let's get a linear fit
summary(lma)        # let's see some parameters
attributes(lma)     # let's see what parameters we can call
lma$coefficients    # I get the intercept and gradient
predict(lm(test
```
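The snippet is cut off inside the predict() call. The same fit-then-extrapolate step, sketched in Python with numpy (the data are copied from the question):

```python
import numpy as np

year = np.array([8, 9, 10, 11])
value = np.array([12050, 15292, 23907, 33991])

# Degree-1 polynomial fit: returns (gradient, intercept).
gradient, intercept = np.polyfit(year, value, 1)
print(gradient, intercept)

# Predicted value for year 12.
print(intercept + gradient * 12)
```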

How to choose Gaussian basis function hyperparameters for linear regression?

↘锁芯ラ submitted on 2019-12-08 01:26:28

Question: I'm quite new to machine learning, and I'm trying to understand some basic concepts properly. My problem is the following: I have a set of data observations and the corresponding target values {x, t}. I'm trying to train a function with this data in order to predict the value of unobserved data, and I'm trying to achieve this using the maximum a posteriori (MAP) technique (and so a Bayesian approach) with Gaussian basis functions of the form:

φ_j(x) = exp(−(x − μ_j)² / (2s²))
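Under the usual zero-mean Gaussian prior on the weights, the MAP estimate for this model reduces to ridge regression on the basis-expanded design matrix, with the centres μ_j and the width s as the hyperparameters to choose. A numpy sketch (the grid of centres, the width, and the ridge strength are illustrative choices, not a tuning recipe):

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """Design matrix: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias column."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), phi])

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 30)

centers = np.linspace(0, 1, 9)  # mu_j: evenly spaced over the inputs
s = 0.1                         # shared width hyperparameter
lam = 1e-3                      # ridge strength (alpha/beta in the MAP view)

Phi = gaussian_basis(x, centers, s)
# MAP / ridge solution: w = (lam*I + Phi^T Phi)^{-1} Phi^T t
w = np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t)

x_new = np.linspace(0, 1, 5)
print(gaussian_basis(x_new, centers, s) @ w)
```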

Weighted Non-negative Least Squares Linear Regression in Python [closed]

老子叫甜甜 submitted on 2019-12-07 23:54:17

Question: Closed. This question is off-topic and is not currently accepting answers. Closed 3 years ago.

I know there is a weighted OLS solver and a constrained OLS solver. Is there a routine that combines the two?

Answer 1: You can simulate OLS weighting by modifying the X and y inputs. In OLS, you solve for β in X^T X β = X^T y. In weighted OLS, you solve X^T W X β = X^T W y, where W is a diagonal matrix with
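The answer is cut off at the definition of W (the diagonal matrix of observation weights), but the transformation it describes comes down to scaling each row of X and y by the square root of its weight and then calling an ordinary non-negative least squares solver. A sketch with scipy (the data are synthetic):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.uniform(size=(n, p))
beta_true = np.array([1.0, 0.0, 2.0])  # non-negative ground truth
y = X @ beta_true + rng.normal(0, 0.05, n)
w = rng.uniform(0.5, 2.0, n)           # per-observation weights

# Scaling rows by sqrt(w) makes ||sqrt(W)(X b - y)||^2 the plain NNLS
# objective, so ordinary NNLS on the scaled inputs is weighted NNLS.
sw = np.sqrt(w)
beta, rnorm = nnls(X * sw[:, None], y * sw)
print(beta)
```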