regression

Dummy Coding of Nominal Attributes - Effect of Using K Dummies, Effect of Attribute Selection

Submitted by 荒凉一梦 on 2020-01-02 18:04:43

Question: Summing up my understanding of the topic: 'Dummy Coding' is usually understood as coding a nominal attribute with K possible values as K-1 binary dummies. Using all K values would introduce redundancy and, as far as I have learned, would have a negative impact on e.g. logistic regression. So far, everything is clear to me. Yet, two issues remain unclear to me: 1) Bearing in mind the issue stated above, I am confused that the 'Logistic' classifier in WEKA actually uses K dummies (see picture). Why
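For illustration only (this is pandas, not WEKA, and the attribute name and its levels are made up): a minimal sketch of the difference between K and K-1 dummies. With an intercept in the model, the K indicator columns always sum to one, which is exactly the redundancy described above.

```python
import pandas as pd

# Hypothetical nominal attribute with K = 3 levels (made-up example data).
color = pd.Series(["red", "green", "blue", "green", "red"], name="color")

# Full dummy coding: K indicator columns, one per level.
full = pd.get_dummies(color)                      # columns: blue, green, red

# Reference coding: K-1 columns, the first level dropped as the reference.
reduced = pd.get_dummies(color, drop_first=True)  # columns: green, red

# Each row of the full coding sums to 1, so the K columns are linearly
# dependent with the intercept -- the redundancy mentioned in the question.
print(full.sum(axis=1).unique())
```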

Error in scale.default: length of 'center' must equal the number of columns of 'x'

Submitted by 爱⌒轻易说出口 on 2020-01-02 06:48:12

Question: I am using the mboost package to do some classification. Here is the code:
library('mboost')
load('so-data.rdata')
model <- glmboost(is_exciting~., data=training, family=Binomial())
pred <- predict(model, newdata=test, type="response")
But R complains when doing the prediction:
Error in scale.default(X, center = cm, scale = FALSE) : length of 'center' must equal the number of columns of 'x'
The data (training and test) can be downloaded here (7z, zip). What is the reason for the error and how to

Repeated-measures ANOVA using regression models (LM, LMER)

Submitted by 被刻印的时光 ゝ on 2020-01-01 10:14:13

Question: I would like to run a repeated-measures ANOVA in R using regression models instead of the 'Analysis of Variance' (aov) function. Here is an example of my aov code for 3 within-subject factors:
m.aov <- aov(measure ~ (task*region*actiontype) + Error(subject/(task*region*actiontype)), data)
Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with aov. In a previous post I read

Scikit-learn Ridge Regression with unregularized intercept term

Submitted by 一笑奈何 on 2020-01-01 09:19:00

Question: Does scikit-learn's Ridge regression include the intercept coefficient in the regularization term, and if so, is there a way to run ridge regression without regularizing the intercept? Suppose I fit a ridge regression:
from sklearn import linear_model
mymodel = linear_model.Ridge(alpha=0.1, fit_intercept=True).fit(X, y)
print mymodel.coef_
print mymodel.intercept_
for some data X, y where X does not include a column of 1's. fit_intercept=True will automatically add an intercept column, and
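As a numerical check (the data below are made up; the reasoning is the standard centering argument, consistent with scikit-learn's documented behaviour): with fit_intercept=True the slope coefficients come out the same as fitting on mean-centered X and y with no intercept at all, which is what you would expect if the intercept is left out of the penalty.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # made-up predictors
y = 100.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# Fit with fit_intercept=True (intercept handled separately by sklearn).
m1 = Ridge(alpha=0.1, fit_intercept=True).fit(X, y)

# Fit on centered data with no intercept; an unpenalized intercept is
# equivalent to centering X and y before the penalized fit.
Xc, yc = X - X.mean(axis=0), y - y.mean()
m2 = Ridge(alpha=0.1, fit_intercept=False).fit(Xc, yc)

print(np.allclose(m1.coef_, m2.coef_))                        # same slopes
print(np.isclose(m1.intercept_,
                 y.mean() - X.mean(axis=0) @ m2.coef_))       # same intercept
```

If the intercept were included in the penalty, the large offset of 100 in this toy data would pull the slope estimates away from those of the centered fit.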

mgcv gam() error: model has more coefficients than data

Submitted by 末鹿安然 on 2020-01-01 07:13:43

Question: I am using GAMs (generalized additive models) for my dataset. This dataset has 32 observations, with 6 predictor variables and a response variable (namely power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model I get the error message:
Error in gam(formula.hh, data = data, na.action = na.exclude, : Model has more coefficients than data
From this error message, I infer that I have more predictor variables as compared to the number of

Calculate cross validation for Generalized Linear Model in Matlab

Submitted by 妖精的绣舞 on 2020-01-01 06:48:25

Question: I am doing a regression using a Generalized Linear Model. I am caught off guard by the crossval function. My implementation so far:
x = 'Some dataset, containing the input and the output'
X = x(:,1:7);
Y = x(:,8);
cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);
mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');
Ypred = predict(mdl,Xtest);
res = (Ypred - Ytest);
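Not MATLAB, but an analogous sketch in Python with scikit-learn (assuming scikit-learn >= 0.23 for PoissonRegressor; the data below are made up) showing the same idea: a 30% holdout fit of a Poisson GLM, plus a one-call k-fold cross-validation score.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 7))                     # made-up predictors
y = rng.poisson(np.exp(0.3 * X[:, 0] + 0.1))      # made-up count response

# Hold out 30% for testing (mirrors cvpartition(...,'holdout',0.3)).
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
mdl = PoissonRegressor().fit(Xtr, ytr)
resid = mdl.predict(Xte) - yte                    # holdout residuals

# Or score the same kind of model with 5-fold cross-validation in one call.
scores = cross_val_score(PoissonRegressor(), X, y, cv=5)
print(resid[:5], scores)
```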

B Spline confusion

Submitted by 做~自己de王妃 on 2020-01-01 05:28:08

Question: I realise that there are posts on the topic of B-splines on this board, but those have actually made me more confused, so I thought someone might be able to help me. I have simulated data for x-values ranging from 0 to 1. I'd like to fit a cubic spline (degree = 3) to my data with knots at 0, 0.1, 0.2, ..., 0.9, 1. I'd also like to use the B-spline basis and OLS for parameter estimation (I'm not looking for penalised splines). I think I need the bs function from the splines package but I'm
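Not R's bs(), but an analogous sketch in Python with scipy (the data are simulated here purely for illustration): build the cubic B-spline basis for knots at 0, 0.1, ..., 0.9, 1 explicitly and estimate the basis coefficients by ordinary least squares.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))                       # simulated x in [0, 1]
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)    # made-up response

k = 3                                                     # cubic spline
interior = np.arange(0.1, 1.0, 0.1)                       # knots 0.1 ... 0.9
t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]     # clamped knot vector
n_basis = len(t) - k - 1

# Evaluate each B-spline basis function at x to build the design matrix.
B = np.column_stack([
    BSpline(t, np.eye(n_basis)[i], k)(x) for i in range(n_basis)
])

# Ordinary least squares for the basis coefficients (no penalty).
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
```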

scipy linregress function erroneous standard error return?

Submitted by 懵懂的女人 on 2020-01-01 05:02:50

Question: I have a weird situation where scipy.stats.linregress seems to be returning an incorrect standard error:
from scipy import stats
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186
>>> intercept
-16.281127993087829
>>> r_value
0.72443514211849758
>>> r_value**2
0.52480627513624778
>>> std_err
3.6290901222878866
Whereas Excel returns the following: slope: 5.394 intercept: -16.281 rsq: 0
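A quick check of what linregress's std_err actually measures, using the same four points from the question: computed by hand it is the standard error of the slope, and it reproduces the 3.629 value above. If a spreadsheet reports a different "standard error", it may be a different quantity, e.g. the residual standard error of the y-estimate, which is also computed below for comparison.

```python
import numpy as np
from scipy import stats

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

slope, intercept, r, p, std_err = stats.linregress(x, y)

# Standard error of the slope, by hand:
# SE = sqrt( SSE / (n - 2) / sum((x - mean(x))^2) )
resid = y - (intercept + slope * x)
n = x.size
se_slope = np.sqrt(resid @ resid / (n - 2) / np.sum((x - x.mean())**2))

# Residual standard error ("standard error of the y-estimate"): a different
# quantity that some spreadsheet outputs also label "standard error".
se_resid = np.sqrt(resid @ resid / (n - 2))

print(std_err, se_slope, se_resid)   # std_err matches se_slope (~3.629)
```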

Orthogonal regression fitting in scipy least squares method

Submitted by ⅰ亾dé卋堺 on 2020-01-01 04:18:32

Question: The leastsq method in the scipy lib fits a curve to some data. This method assumes that the Y values in the data depend on some X argument, and it minimises the distance between the curve and the data points along the Y axis only (dy). But what if I need to minimise the distance along both axes (dy and dx)? Is there some way to implement this calculation? Here is a sample of the code for the one-axis calculation:
import numpy as np
from scipy.optimize import leastsq
xData = [some data...]
yData =
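One standard way to minimise the distance in both axes is orthogonal distance regression via scipy.odr; below is a minimal sketch with a made-up straight-line model and made-up data standing in for xData and yData from the question.

```python
import numpy as np
from scipy import odr

def model_func(beta, x):
    # Example model: a straight line (replace with the curve being fitted).
    return beta[0] * x + beta[1]

# Made-up data standing in for xData / yData.
xData = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yData = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

model = odr.Model(model_func)
data = odr.RealData(xData, yData)          # sx=, sy= can supply uncertainties
fit = odr.ODR(data, model, beta0=[1.0, 0.0]).run()

print(fit.beta)       # fitted parameters
print(fit.sd_beta)    # their standard errors
```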
