regression

Dummy Coding of Nominal Attributes - Effect of Using K Dummies, Effect of Attribute Selection

Submitted by 荒凉一梦 on 2020-01-02 18:04:43

Question: Summing up my understanding of the topic: 'Dummy Coding' is usually understood as coding a nominal attribute with K possible values as K-1 binary dummies. Using all K values would introduce redundancy and, as far as I have learned, would have a negative impact on e.g. logistic regression. So far, everything is clear to me. Yet, two issues remain unclear to me: 1) Bearing in mind the issue stated above, I am confused that the 'Logistic' classifier in WEKA actually uses K dummies (see picture). Why
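For illustration only (this is pandas, not WEKA, and the attribute name and its levels are made up): a minimal sketch of the difference between K and K-1 dummies. With an intercept in the model, the K indicator columns always sum to one, which is exactly the redundancy described above.

```python
import pandas as pd

# Hypothetical nominal attribute with K = 3 levels (made-up example data).
color = pd.Series(["red", "green", "blue", "green", "red"], name="color")

# Full dummy coding: K indicator columns, one per level.
full = pd.get_dummies(color)                      # columns: blue, green, red

# Reference coding: K-1 columns, the first level dropped as the reference.
reduced = pd.get_dummies(color, drop_first=True)  # columns: green, red

# Each row of the full coding sums to 1, so the K columns are linearly
# dependent with the intercept -- the redundancy mentioned in the question.
print(full.sum(axis=1).unique())
```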

Error in scale.default: length of 'center' must equal the number of columns of 'x'

Submitted by 爱⌒轻易说出口 on 2020-01-02 06:48:12

Question: I am using the mboost package to do some classification. Here is the code:
library('mboost')
load('so-data.rdata')
model <- glmboost(is_exciting~., data=training, family=Binomial())
pred <- predict(model, newdata=test, type="response")
But R complains when doing the prediction:
Error in scale.default(X, center = cm, scale = FALSE) : length of 'center' must equal the number of columns of 'x'
The data (training and test) can be downloaded here (7z, zip). What is the reason for the error and how to

Repeated-measures ANOVA using regression models (LM, LMER)

Submitted by 被刻印的时光 ゝ on 2020-01-01 10:14:13

Question: I would like to run a repeated-measures ANOVA in R using regression models instead of the 'Analysis of Variance' (aov) function. Here is an example of my aov code for 3 within-subject factors:
m.aov <- aov(measure ~ (task*region*actiontype) + Error(subject/(task*region*actiontype)), data)
Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with aov. In a previous post I read

Scikit-learn Ridge Regression with unregularized intercept term

Submitted by 一笑奈何 on 2020-01-01 09:19:00

Question: Does scikit-learn's Ridge regression include the intercept coefficient in the regularization term, and if so, is there a way to run ridge regression without regularizing the intercept? Suppose I fit a ridge regression:
from sklearn import linear_model
mymodel = linear_model.Ridge(alpha=0.1, fit_intercept=True).fit(X, y)
print mymodel.coef_
print mymodel.intercept_
for some data X, y where X does not include a column of 1's. fit_intercept=True will automatically add an intercept column, and
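As a numerical check (the data below are made up; the reasoning is the standard centering argument, consistent with scikit-learn's documented behaviour): with fit_intercept=True the slope coefficients come out the same as fitting on mean-centered X and y with no intercept at all, which is what you would expect if the intercept is left out of the penalty.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # made-up predictors
y = 100.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# Fit with fit_intercept=True (intercept handled separately by sklearn).
m1 = Ridge(alpha=0.1, fit_intercept=True).fit(X, y)

# Fit on centered data with no intercept; an unpenalized intercept is
# equivalent to centering X and y before the penalized fit.
Xc, yc = X - X.mean(axis=0), y - y.mean()
m2 = Ridge(alpha=0.1, fit_intercept=False).fit(Xc, yc)

print(np.allclose(m1.coef_, m2.coef_))                        # same slopes
print(np.isclose(m1.intercept_,
                 y.mean() - X.mean(axis=0) @ m2.coef_))       # same intercept
```

If the intercept were included in the penalty, the large offset of 100 in this toy data would pull the slope estimates away from those of the centered fit.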

mgcv gam() error: model has more coefficients than data

Submitted by 末鹿安然 on 2020-01-01 07:13:43

Question: I am using GAMs (generalized additive models) for my dataset. This dataset has 32 observations, with 6 predictor variables and a response variable (namely power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model I get the error message:
Error in gam(formula.hh, data = data, na.action = na.exclude, : Model has more coefficients than data
From this error message, I infer that I have more predictor variables as compared to the number of

Calculate cross validation for Generalized Linear Model in Matlab

Submitted by 妖精的绣舞 on 2020-01-01 06:48:25

Question: I am doing a regression using a Generalized Linear Model. I am caught off guard by the crossval function. My implementation so far:
x = 'Some dataset, containing the input and the output'
X = x(:,1:7);
Y = x(:,8);
cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);
mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');
Ypred = predict(mdl,Xtest);
res = (Ypred - Ytest);
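Not MATLAB, but an analogous sketch in Python with scikit-learn (assuming scikit-learn >= 0.23 for PoissonRegressor; the data below are made up) showing the same idea: a 30% holdout fit of a Poisson GLM, plus a one-call k-fold cross-validation score.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 7))                     # made-up predictors
y = rng.poisson(np.exp(0.3 * X[:, 0] + 0.1))      # made-up count response

# Hold out 30% for testing (mirrors cvpartition(...,'holdout',0.3)).
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
mdl = PoissonRegressor().fit(Xtr, ytr)
resid = mdl.predict(Xte) - yte                    # holdout residuals

# Or score the same kind of model with 5-fold cross-validation in one call.
scores = cross_val_score(PoissonRegressor(), X, y, cv=5)
print(resid[:5], scores)
```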

B Spline confusion

Submitted by 做~自己de王妃 on 2020-01-01 05:28:08

Question: I realise that there are posts on the topic of B-splines on this board, but those have actually made me more confused, so I thought someone might be able to help me. I have simulated data for x-values ranging from 0 to 1. I'd like to fit a cubic spline (degree = 3) to my data with knots at 0, 0.1, 0.2, ..., 0.9, 1. I'd also like to use the B-spline basis and OLS for parameter estimation (I'm not looking for penalised splines). I think I need the bs function from the splines package but I'm
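Not R's bs(), but an analogous sketch in Python with scipy (the data are simulated here purely for illustration): build the cubic B-spline basis for knots at 0, 0.1, ..., 0.9, 1 explicitly and estimate the basis coefficients by ordinary least squares.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))                       # simulated x in [0, 1]
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)    # made-up response

k = 3                                                     # cubic spline
interior = np.arange(0.1, 1.0, 0.1)                       # knots 0.1 ... 0.9
t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]     # clamped knot vector
n_basis = len(t) - k - 1

# Evaluate each B-spline basis function at x to build the design matrix.
B = np.column_stack([
    BSpline(t, np.eye(n_basis)[i], k)(x) for i in range(n_basis)
])

# Ordinary least squares for the basis coefficients (no penalty).
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
```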

scipy linregress function erroneous standard error return?

Submitted by 懵懂的女人 on 2020-01-01 05:02:50

Question: I have a weird situation where scipy.stats.linregress seems to be returning an incorrect standard error:
from scipy import stats
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186
>>> intercept
-16.281127993087829
>>> r_value
0.72443514211849758
>>> r_value**2
0.52480627513624778
>>> std_err
3.6290901222878866
Whereas Excel returns the following: slope: 5.394 intercept: -16.281 rsq: 0
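A quick check of what linregress's std_err actually measures, using the same four points from the question: computed by hand it is the standard error of the slope, and it reproduces the 3.629 value above. If a spreadsheet reports a different "standard error", it may be a different quantity, e.g. the residual standard error of the y-estimate, which is also computed below for comparison.

```python
import numpy as np
from scipy import stats

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

slope, intercept, r, p, std_err = stats.linregress(x, y)

# Standard error of the slope, by hand:
# SE = sqrt( SSE / (n - 2) / sum((x - mean(x))^2) )
resid = y - (intercept + slope * x)
n = x.size
se_slope = np.sqrt(resid @ resid / (n - 2) / np.sum((x - x.mean())**2))

# Residual standard error ("standard error of the y-estimate"): a different
# quantity that some spreadsheet outputs also label "standard error".
se_resid = np.sqrt(resid @ resid / (n - 2))

print(std_err, se_slope, se_resid)   # std_err matches se_slope (~3.629)
```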

Orthogonal regression fitting in scipy least squares method

Submitted by ⅰ亾dé卋堺 on 2020-01-01 04:18:32

Question: The leastsq method in the scipy lib fits a curve to some data. This method assumes that the Y values in the data depend on some X argument, and it minimises the distance between the curve and the data points along the Y axis only (dy). But what if I need to minimise the distance along both axes (dy and dx)? Is there some way to implement this calculation? Here is a sample of the code for the one-axis calculation:
import numpy as np
from scipy.optimize import leastsq
xData = [some data...]
yData =
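One standard way to minimise the distance in both axes is orthogonal distance regression via scipy.odr; below is a minimal sketch with a made-up straight-line model and made-up data standing in for xData and yData from the question.

```python
import numpy as np
from scipy import odr

def model_func(beta, x):
    # Example model: a straight line (replace with the curve being fitted).
    return beta[0] * x + beta[1]

# Made-up data standing in for xData / yData.
xData = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yData = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

model = odr.Model(model_func)
data = odr.RealData(xData, yData)          # sx=, sy= can supply uncertainties
fit = odr.ODR(data, model, beta0=[1.0, 0.0]).run()

print(fit.beta)       # fitted parameters
print(fit.sd_beta)    # their standard errors
```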
