regression

How to estimate goodness-of-fit using scipy.odr?

烈酒焚心 submitted on 2019-12-04 07:58:59
I am fitting data with weights using scipy.odr, but I don't know how to obtain a measure of goodness-of-fit or an R-squared. Does anyone have suggestions for how to obtain this measure using the output stored by the function?

The res_var attribute of the Output is the so-called reduced chi-square value for the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for non-linear fitting, though. You can look at the residuals directly (out.delta for the X residuals and out.eps for the Y residuals). Implementing a cross-validation or bootstrap method for determining
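For illustration, here is a minimal sketch (invented data, an assumed linear model) showing where these quantities live on the scipy.odr output:

import numpy as np
from scipy import odr

def linear(beta, x):
    return beta[0] * x + beta[1]

# invented example data with known uncertainties
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)

model = odr.Model(linear)
data = odr.RealData(x, y, sx=0.1, sy=0.5)   # sx/sy act as the weights
out = odr.ODR(data, model, beta0=[1.0, 0.0]).run()

print(out.beta)       # fitted parameters
print(out.res_var)    # reduced chi-square of the fit
print(out.delta[:5])  # X residuals
print(out.eps[:5])    # Y residuals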

Fast group-by simple linear regression

蹲街弑〆低调 submitted on 2019-12-04 07:54:21
This Q & A arises from How to make group_by and lm fast? where the OP was trying to run a simple linear regression per group for a large data frame. In theory, a series of group-by regressions y ~ x | g is equivalent to a single pooled regression y ~ x * g. The latter is very appealing because statistical testing between different groups is straightforward. But in practice fitting this larger regression is not computationally easy. My answer on the linked Q & A reviews the packages speedlm and glm4, but points out that they cannot address this problem well. A large regression problem is difficult,
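To illustrate the kind of computation a fast group-by simple regression boils down to, here is a sketch (not taken from the linked answer; the data frame and column names are invented) that recovers every per-group slope and intercept from group-wise sums, with no per-group model-fitting call:

import numpy as np
import pandas as pd

# invented data: 1000 groups of 100 observations each
df = pd.DataFrame({
    "g": np.repeat(np.arange(1000), 100),
    "x": np.random.normal(size=100_000),
})
df["y"] = 2.0 * df["x"] + np.random.normal(size=len(df))

# center x and y within each group, then form group-wise sums
grp = df.groupby("g")
dx = df["x"] - grp["x"].transform("mean")
dy = df["y"] - grp["y"].transform("mean")
sxx = (dx * dx).groupby(df["g"]).sum()
sxy = (dx * dy).groupby(df["g"]).sum()

slope = sxy / sxx                                   # per-group slopes
intercept = grp["y"].mean() - slope * grp["x"].mean()
print(slope.head())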

Piecewise regression with a straight line and a horizontal line joining at a break point

99封情书 submitted on 2019-12-04 07:14:52
I want to do a piecewise linear regression with one break point, where the second half of the regression line has slope = 0. There are examples of how to do a piecewise linear regression, such as here. The problem I'm having is that it's not clear to me how to fix the slope of half of the model to be 0. I tried lhs <- function(x) ifelse(x < k, k-x, 0) rhs <- function(x) ifelse(x < k, 0, x-k) fit <- lm(y ~ lhs(x) + rhs(x)) where k is the break point, but the segment on the right is not flat / horizontal. I want to constrain the slope of the second segment to be 0. I tried: fit <- lm(y ~ x * (x < k) + x *
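For comparison, here is a sketch in Python (not the poster's R approach; data and starting values are invented) of fitting a piecewise model whose right-hand segment is forced to be flat: for x >= k the prediction is the constant plateau value, so only the left segment has a free slope.

import numpy as np
from scipy.optimize import curve_fit

def hinge(x, k, a, c):
    # left of the break: line a*(x - k) + c; right of the break: the constant c
    return np.where(x < k, a * (x - k) + c, c)

# invented data with a break near x = 6 and a flat right-hand segment
x = np.linspace(0.0, 10.0, 200)
y = hinge(x, 6.0, 1.5, 4.0) + np.random.normal(scale=0.3, size=x.size)

params, _ = curve_fit(hinge, x, y, p0=[5.0, 1.0, float(np.mean(y))])
print(params)   # estimated break point, left-hand slope, plateau level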

Working with and preparing bag-of-words data for regression

瘦欲@ submitted on 2019-12-04 06:23:52
Question: I'm trying to create a regression model that predicts an author's age. I'm using (Nguyen et al., 2011) as my basis. Using a bag-of-words model, I count the occurrences of words per document (which are posts from boards) and create the vector for every post. I limit the size of each vector by using as features the top-k (k = number) most frequently used words (stop words are not used). Vectorexample_with_k_8 = [0,0,0,1,0,3,0,0] My data is generally sparse, as in the example. When I test the model on my
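A sketch of this setup in Python (the posts, ages, and value of k below are invented for illustration): count only the k most frequent words, drop stop words, and feed the sparse count vectors to a regression model.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# invented posts and author ages
posts = ["I love playing video games after school",
         "my grandchildren visited me this weekend",
         "studying for my final exams at university"]
ages = [16, 67, 21]

k = 8   # keep only the k most frequent words as features
vectorizer = CountVectorizer(max_features=k, stop_words="english")
X = vectorizer.fit_transform(posts)     # sparse document-term count matrix

model = Ridge().fit(X, ages)
print(vectorizer.get_feature_names_out())
print(model.predict(X))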

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

爷,独闯天下 submitted on 2019-12-04 06:23:19
I have the following code for minimizing the sum of deviations using optim() to find beta0 and beta1, but I am receiving the following errors and I am not sure what I am doing wrong: sum.abs.dev<-function(beta=c(beta0,beta1),a,b) { total<-0 n<-length(b) for (i in 1:n) { total <- total + (b[i]-beta[1]-beta[2]*a[i]) } return(total) } tlad <- function(y = "farm", x = "land", data="FarmLandArea.csv") { dat <- read.csv(data) #fit<-lm(dat$farm~dat$land) fit<-lm(y~x,data=dat) beta.out=optim(fit$coefficients,sum.abs.dev) return(beta.out) } Here are the error and warnings I receive: Error in `contrasts<-`(`
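For reference, here is a sketch in Python (invented data) of the least-absolute-deviations fit the code above appears to be aiming for; note that the objective has to sum the absolute deviations, which the posted sum.abs.dev function does not do.

import numpy as np
from scipy.optimize import minimize

# invented data standing in for the farm/land columns
a = np.random.normal(size=50)
b = 3.0 + 2.0 * a + np.random.normal(size=50)

def sum_abs_dev(beta, a, b):
    # sum of *absolute* deviations from the line beta[0] + beta[1]*a
    return np.sum(np.abs(b - beta[0] - beta[1] * a))

start = np.polyfit(a, b, 1)[::-1]        # OLS start values: (intercept, slope)
res = minimize(sum_abs_dev, start, args=(a, b), method="Nelder-Mead")
print(res.x)                             # LAD estimates of beta0, beta1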

Testing a regression network in caffe

对着背影说爱祢 submitted on 2019-12-04 05:37:31
Question: I am trying to count objects in an image using AlexNet. I currently have images containing 1, 2, 3 or 4 objects per image. For an initial check, I have 10 images per class. For example, the training set looks like:

image    label
image1   1
image2   1
image3   1
...
image39  4
image40  4

I used the imagenet create script to create an lmdb file for this dataset, which successfully converted my set of images to lmdb. AlexNet, as an example, is converted to a regression model for learning the number of objects in the

Write Regression summary to the csv file in R

爱⌒轻易说出口 submitted on 2019-12-04 05:24:10
I have data on the revenue of a company from sales of various products (csv files), one of which looks like the following:

> abc
  Order.Week..BV. Product.Number Quantity Net.ASP Net.Price
1        2013-W44         ABCDEF       92  823.66       749
2        2013-W44         ABCDEF       24  898.89       749
3        2013-W44         ABCDEF      243  892.00       749
4        2013-W45         ABCDEF       88  796.84       699
5        2013-W45         ABCDEF       18  744.80       699

Now I'm fitting a multiple regression model with Net.Price as Y and Quantity and Net.ASP as x1 and x2. There are more than 100 such files, and I'm trying to process them with the following code: fileNames <- Sys.glob("*.csv") for (fileName in fileNames) { abc <-
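The same loop, sketched in Python for illustration (file layout and column names assumed to match the example above): read each CSV, regress Net.Price on Quantity and Net.ASP, and write the coefficient table next to the input file.

import glob
import pandas as pd
import statsmodels.formula.api as smf

for path in glob.glob("*.csv"):
    abc = pd.read_csv(path)
    # Q() quotes the column names that contain dots
    fit = smf.ols("Q('Net.Price') ~ Quantity + Q('Net.ASP')", data=abc).fit()
    coefs = fit.summary2().tables[1]          # coefficient table as a DataFrame
    coefs.to_csv(path.replace(".csv", "_summary.csv"))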

repeated measure anova using regression models (LM, LMER)

懵懂的女人 submitted on 2019-12-04 05:18:59
I would like to run a repeated-measures ANOVA in R using regression models instead of the 'Analysis of Variance' (aov) function. Here is an example of my aov code for 3 within-subject factors: m.aov<-aov(measure~(task*region*actiontype) + Error(subject/(task*region*actiontype)),data) Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with aov. In a previous post I read an answer of the type: lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3|Subject), dataset)) I am really not

How to get the prediction of test from 2D parameters of WLS regression in statsmodels

拜拜、爱过 submitted on 2019-12-04 05:16:06
Question: I'm incrementally building up the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this: X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]]) This is my dataset, and I have a 10x2 endog vector that looks like this:

z = [[ 3.90311860e-322  2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0
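A minimal sketch of a statsmodels WLS fit and prediction under these assumptions (the response column and the weights below are invented; note that WLS expects a one-dimensional endog, so a 10x2 endog would be fitted one column at a time):

import numpy as np
import statsmodels.api as sm

X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]])
y = np.array([2, 2, -2, 2, -2, 2, 2, -2, -2, 2], dtype=float)  # invented 1-D endog
w = np.ones(len(y))                        # invented weights, all equal here

exog = sm.add_constant(X)
res = sm.WLS(y, exog, weights=w).fit()

X_test = np.array([[1, 2, 3], [4, 5, 6]])  # invented test rows
print(res.predict(sm.add_constant(X_test, has_constant="add")))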

Solving normal equation gives different coefficients from using `lm`?

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 05:01:44
Question: I wanted to compute a simple regression using lm and plain matrix algebra. However, the regression coefficients I obtain from matrix algebra are only half of those obtained from lm, and I have no clue why. Here's the code: boot_example <- data.frame( x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), x6 = c(0L, 1L,
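As a point of reference, here is a sketch of the normal-equation computation being compared against lm (invented toy data, Python rather than R): with an explicit intercept column and a full-column-rank design matrix, beta = (X'X)^(-1) X'y reproduces the least-squares coefficients exactly, so a mismatch usually means the design matrix differs from the one lm actually builds (missing intercept, different dummy coding, or rank deficiency).

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=20)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # QR/SVD-based least squares
print(beta_normal)
print(beta_lstsq)   # the two agree when X has full column rank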