regression

How to estimate goodness-of-fit using scipy.odr?

烈酒焚心 submitted on 2019-12-04 07:58:59
I am fitting data with weights using scipy.odr, but I don't know how to obtain a measure of goodness-of-fit or an R-squared. Does anyone have suggestions for how to obtain this measure using the output stored by the function?

The res_var attribute of the Output is the so-called reduced chi-square value for the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for non-linear fitting, though. You can look at the residuals directly (out.delta for the X residuals and out.eps for the Y residuals). Implementing a cross-validation or bootstrap method for determining
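For illustration, here is a minimal sketch (invented data, an assumed linear model) showing where these quantities live on the scipy.odr output:

import numpy as np
from scipy import odr

def linear(beta, x):
    return beta[0] * x + beta[1]

# invented example data with known uncertainties
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)

model = odr.Model(linear)
data = odr.RealData(x, y, sx=0.1, sy=0.5)   # sx/sy act as the weights
out = odr.ODR(data, model, beta0=[1.0, 0.0]).run()

print(out.beta)       # fitted parameters
print(out.res_var)    # reduced chi-square of the fit
print(out.delta[:5])  # X residuals
print(out.eps[:5])    # Y residuals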

Fast group-by simple linear regression

蹲街弑〆低调 submitted on 2019-12-04 07:54:21
This Q & A arises from How to make group_by and lm fast? where the OP was trying to run a simple linear regression per group for a large data frame. In theory, a series of group-by regressions y ~ x | g is equivalent to a single pooled regression y ~ x * g. The latter is very appealing because statistical testing between different groups is straightforward. But in practice fitting this larger regression is not computationally easy. My answer on the linked Q & A reviews the packages speedlm and glm4, but points out that they cannot address this problem well. A large regression problem is difficult,
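To illustrate the kind of computation a fast group-by simple regression boils down to, here is a sketch (not taken from the linked answer; the data frame and column names are invented) that recovers every per-group slope and intercept from group-wise sums, with no per-group model-fitting call:

import numpy as np
import pandas as pd

# invented data: 1000 groups of 100 observations each
df = pd.DataFrame({
    "g": np.repeat(np.arange(1000), 100),
    "x": np.random.normal(size=100_000),
})
df["y"] = 2.0 * df["x"] + np.random.normal(size=len(df))

# center x and y within each group, then form group-wise sums
grp = df.groupby("g")
dx = df["x"] - grp["x"].transform("mean")
dy = df["y"] - grp["y"].transform("mean")
sxx = (dx * dx).groupby(df["g"]).sum()
sxy = (dx * dy).groupby(df["g"]).sum()

slope = sxy / sxx                                   # per-group slopes
intercept = grp["y"].mean() - slope * grp["x"].mean()
print(slope.head())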

Piecewise regression with a straight line and a horizontal line joining at a break point

99封情书 submitted on 2019-12-04 07:14:52
I want to do a piecewise linear regression with one break point, where the second half of the regression line has slope = 0. There are examples of how to do a piecewise linear regression, such as here. The problem I'm having is that it's not clear to me how to fix the slope of half of the model to be 0. I tried lhs <- function(x) ifelse(x < k, k-x, 0) rhs <- function(x) ifelse(x < k, 0, x-k) fit <- lm(y ~ lhs(x) + rhs(x)) where k is the break point, but the segment on the right is not flat / horizontal. I want to constrain the slope of the second segment to be 0. I tried: fit <- lm(y ~ x * (x < k) + x *
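For comparison, here is a sketch in Python (not the poster's R approach; data and starting values are invented) of fitting a piecewise model whose right-hand segment is forced to be flat: for x >= k the prediction is the constant plateau value, so only the left segment has a free slope.

import numpy as np
from scipy.optimize import curve_fit

def hinge(x, k, a, c):
    # left of the break: line a*(x - k) + c; right of the break: the constant c
    return np.where(x < k, a * (x - k) + c, c)

# invented data with a break near x = 6 and a flat right-hand segment
x = np.linspace(0.0, 10.0, 200)
y = hinge(x, 6.0, 1.5, 4.0) + np.random.normal(scale=0.3, size=x.size)

params, _ = curve_fit(hinge, x, y, p0=[5.0, 1.0, float(np.mean(y))])
print(params)   # estimated break point, left-hand slope, plateau level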

Working with and preparing bag-of-words data for regression

瘦欲@ submitted on 2019-12-04 06:23:52
Question: I'm trying to create a regression model that predicts an author's age. I'm using (Nguyen et al., 2011) as my basis. Using a bag-of-words model, I count the occurrences of words per document (which are posts from boards) and create the vector for every post. I limit the size of each vector by using as features the top-k (k = number) most frequently used words (stop words are not used). Vectorexample_with_k_8 = [0,0,0,1,0,3,0,0] My data is generally sparse, as in the example. When I test the model on my
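A sketch of this setup in Python (the posts, ages, and value of k below are invented for illustration): count only the k most frequent words, drop stop words, and feed the sparse count vectors to a regression model.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# invented posts and author ages
posts = ["I love playing video games after school",
         "my grandchildren visited me this weekend",
         "studying for my final exams at university"]
ages = [16, 67, 21]

k = 8   # keep only the k most frequent words as features
vectorizer = CountVectorizer(max_features=k, stop_words="english")
X = vectorizer.fit_transform(posts)     # sparse document-term count matrix

model = Ridge().fit(X, ages)
print(vectorizer.get_feature_names_out())
print(model.predict(X))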

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

爷,独闯天下 submitted on 2019-12-04 06:23:19
I have the following code for minimizing the sum of deviations using optim() to find beta0 and beta1, but I am receiving the following errors and I am not sure what I am doing wrong: sum.abs.dev<-function(beta=c(beta0,beta1),a,b) { total<-0 n<-length(b) for (i in 1:n) { total <- total + (b[i]-beta[1]-beta[2]*a[i]) } return(total) } tlad <- function(y = "farm", x = "land", data="FarmLandArea.csv") { dat <- read.csv(data) #fit<-lm(dat$farm~dat$land) fit<-lm(y~x,data=dat) beta.out=optim(fit$coefficients,sum.abs.dev) return(beta.out) } Here are the error and warnings I receive: Error in `contrasts<-`(`
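For reference, here is a sketch in Python (invented data) of the least-absolute-deviations fit the code above appears to be aiming for; note that the objective has to sum the absolute deviations, which the posted sum.abs.dev function does not do.

import numpy as np
from scipy.optimize import minimize

# invented data standing in for the farm/land columns
a = np.random.normal(size=50)
b = 3.0 + 2.0 * a + np.random.normal(size=50)

def sum_abs_dev(beta, a, b):
    # sum of *absolute* deviations from the line beta[0] + beta[1]*a
    return np.sum(np.abs(b - beta[0] - beta[1] * a))

start = np.polyfit(a, b, 1)[::-1]        # OLS start values: (intercept, slope)
res = minimize(sum_abs_dev, start, args=(a, b), method="Nelder-Mead")
print(res.x)                             # LAD estimates of beta0, beta1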

Testing a regression network in caffe

对着背影说爱祢 submitted on 2019-12-04 05:37:31
Question: I am trying to count objects in an image using AlexNet. I currently have images containing 1, 2, 3 or 4 objects per image. For an initial check, I have 10 images per class. For example, the training set looks like:

image    label
image1   1
image2   1
image3   1
...
image39  4
image40  4

I used the imagenet create script to create an lmdb file for this dataset, which successfully converted my set of images to lmdb. AlexNet, as an example, is converted to a regression model for learning the number of objects in the

Write Regression summary to the csv file in R

爱⌒轻易说出口 submitted on 2019-12-04 05:24:10
I have data on the revenue of a company from sales of various products (csv files), one of which looks like the following:

> abc
  Order.Week..BV. Product.Number Quantity Net.ASP Net.Price
1        2013-W44         ABCDEF       92  823.66       749
2        2013-W44         ABCDEF       24  898.89       749
3        2013-W44         ABCDEF      243  892.00       749
4        2013-W45         ABCDEF       88  796.84       699
5        2013-W45         ABCDEF       18  744.80       699

Now I'm fitting a multiple regression model with Net.Price as Y and Quantity and Net.ASP as x1 and x2. There are more than 100 such files, and I'm trying to process them with the following code: fileNames <- Sys.glob("*.csv") for (fileName in fileNames) { abc <-
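The same loop, sketched in Python for illustration (file layout and column names assumed to match the example above): read each CSV, regress Net.Price on Quantity and Net.ASP, and write the coefficient table next to the input file.

import glob
import pandas as pd
import statsmodels.formula.api as smf

for path in glob.glob("*.csv"):
    abc = pd.read_csv(path)
    # Q() quotes the column names that contain dots
    fit = smf.ols("Q('Net.Price') ~ Quantity + Q('Net.ASP')", data=abc).fit()
    coefs = fit.summary2().tables[1]          # coefficient table as a DataFrame
    coefs.to_csv(path.replace(".csv", "_summary.csv"))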

repeated measure anova using regression models (LM, LMER)

懵懂的女人 submitted on 2019-12-04 05:18:59
I would like to run a repeated-measures ANOVA in R using regression models instead of the 'Analysis of Variance' (aov) function. Here is an example of my aov code for 3 within-subject factors: m.aov<-aov(measure~(task*region*actiontype) + Error(subject/(task*region*actiontype)),data) Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with aov. In a previous post I read an answer of the type: lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3|Subject), dataset)) I am really not

How to get the prediction of test from 2D parameters of WLS regression in statsmodels

拜拜、爱过 submitted on 2019-12-04 05:16:06
Question: I'm incrementally building up the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this: X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]]) This is my dataset, and I have a 10x2 endog vector that looks like this:

z = [[ 3.90311860e-322  2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0
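A minimal sketch of a statsmodels WLS fit and prediction under these assumptions (the response column and the weights below are invented; note that WLS expects a one-dimensional endog, so a 10x2 endog would be fitted one column at a time):

import numpy as np
import statsmodels.api as sm

X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]])
y = np.array([2, 2, -2, 2, -2, 2, 2, -2, -2, 2], dtype=float)  # invented 1-D endog
w = np.ones(len(y))                        # invented weights, all equal here

exog = sm.add_constant(X)
res = sm.WLS(y, exog, weights=w).fit()

X_test = np.array([[1, 2, 3], [4, 5, 6]])  # invented test rows
print(res.predict(sm.add_constant(X_test, has_constant="add")))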

Solving normal equation gives different coefficients from using `lm`?

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 05:01:44
Question: I wanted to compute a simple regression using lm and plain matrix algebra. However, the regression coefficients I obtain from matrix algebra are only half of those obtained from lm, and I have no clue why. Here's the code: boot_example <- data.frame( x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), x6 = c(0L, 1L,
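As a point of reference, here is a sketch of the normal-equation computation being compared against lm (invented toy data, Python rather than R): with an explicit intercept column and a full-column-rank design matrix, beta = (X'X)^(-1) X'y reproduces the least-squares coefficients exactly, so a mismatch usually means the design matrix differs from the one lm actually builds (missing intercept, different dummy coding, or rank deficiency).

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=20)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # QR/SVD-based least squares
print(beta_normal)
print(beta_lstsq)   # the two agree when X has full column rank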