regression

mgcv: how to specify interaction between smooth and factor?

Submitted by 天涯浪子 on 2019-11-30 15:15:26
In R, I would like to fit a GAM with categorical variables. I thought I could do it as with lm(), where cat is the categorical variable: lm(data = df, formula = y ~ x1*cat + x2 + x3). But I can't do things like gam(data = df, formula = y ~ s(x1)*cat + s(x2) + x3), although the following works: gam(data = df, formula = y ~ cat + s(x1) + s(x2) + x3). How do I add a categorical variable to just one of the splines? One of the comments has more or less told you how: use a by variable, s(x1, by = cat). This creates a separate smooth function of x1 for each level of cat (mgcv also offers the related "factor smooth" basis via s(x1, cat, bs = "fs"))

Looping through covariates in regression using R

Submitted by 六月ゝ 毕业季﹏ on 2019-11-30 14:42:44
I'm trying to run 96 regressions and save the results as 96 different objects. To complicate things, I want the subscript on one of the covariates in the model to change 96 times as well. I've almost solved the problem, but I've unfortunately hit a wall. The code so far is:

for(i in 1:96){
  assign(paste("z.out", i, sep=""),
         lm(rMonExp_EGM ~ TE_i + Month2+Month3+Month4+Month5+Month6+Month7+Month8+Month9+Month10+Month11+Month12 + Yrs_minus_2004 + as.factor(LGA), data=Pokies))
}

This works on the object-creation side (e.g. I have z.out1 through z.out96), but I can't seem to get the subscript on the covariate to change
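The question is about R, but the underlying pattern is language-agnostic: instead of 96 separately named objects, keep the fits in a container keyed by the changing covariate's name. A minimal Python sketch of that pattern (the column names te_1, te_2 and the tiny dataset are invented for illustration; closed-form simple regression stands in for the full model):

```python
def fit_simple_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data: columns whose "subscript" changes each iteration
data = {
    "te_1": [1.0, 2.0, 3.0, 4.0],
    "te_2": [2.0, 4.0, 6.0, 8.0],
    "y":    [3.0, 5.0, 7.0, 9.0],
}

fits = {}
for i in (1, 2):
    name = f"te_{i}"           # build the covariate name from the loop index
    fits[name] = fit_simple_ols(data[name], data["y"])

# fits["te_1"] is (1.0, 2.0): y = 1 + 2 * te_1 exactly on this data
```

The R equivalent of the dict-of-fits idea is a list of model objects indexed by i, with the formula built as a string and passed through as.formula().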

R smooth.spline(): smoothing spline is not smooth but overfitting my data

Submitted by 我们两清 on 2019-11-30 14:08:47
I have several data points which seem suitable for fitting a spline through them. When I do this, I get a rather bumpy fit, as if it were overfitting, which is not what I understand smoothing to be. Is there a special option/parameter for getting back a really smooth spline function like here? Using the penalty parameter of smooth.spline didn't have any visible effect; maybe I did it wrong. Here are data and code: results <- structure( list( beta = c( 0.983790622281964, 0.645152464354322, 0.924104713597375, 0.657703886566088, 0.788138034115623, 0.801080207252363, 1, 0.858337365965949, 0

Manually build logistic regression model for prediction in R

Submitted by 爷，独闯天下 on 2019-11-30 14:07:35
I'm attempting to test a logistic regression model (e.g. 3 coefficients for 3 predictor variables X1, X2, X3) on a dataset. I'm aware of how to test a model after I have created the model object using, for example, mymodel <- glm(Outcome ~ X1 + X2 + X3, family = binomial, data = trainDat) and then testing on the data with prob <- predict(mymodel, type = "response", newdata = test). But I now want to create a logistic model from coefficients and an intercept that I already have, and then test this model on data. Basically, I'm not clear on how to create "mymodel" without running glm. Context for the question: I've run a
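Manual logistic prediction needs no model object at all: apply the inverse-logit to the linear predictor. A minimal Python sketch (the coefficient values are invented; in R the same thing is plogis(eta) applied to the model matrix times your coefficients):

```python
import math

def predict_logistic(intercept, coefs, rows):
    """Manual logistic-regression prediction: p = 1 / (1 + exp(-(b0 + b . x)))."""
    probs = []
    for row in rows:
        eta = intercept + sum(b * x for b, x in zip(coefs, row))  # linear predictor
        probs.append(1.0 / (1.0 + math.exp(-eta)))                # inverse logit
    return probs

# Invented coefficients for X1, X2, X3; substitute your fitted values
intercept = -1.0
coefs = [0.5, 0.25, -0.75]
rows = [[2.0, 0.0, 0.0],   # eta = -1 + 1 = 0  -> p = 0.5
        [0.0, 0.0, 0.0]]   # eta = -1          -> p ~ 0.269

probs = predict_logistic(intercept, coefs, rows)
```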

Is there a function or package which will simulate predictions for an object returned from lm()?

Submitted by 大憨熊 on 2019-11-30 13:39:33
Is there a single function, similar to runif, rnorm and the like, which will produce simulated predictions for a linear model? I can code it on my own, but the code is ugly and I assume this is something someone has done before.

slope = 1.5
intercept = 0
x = as.numeric(1:10)
e = rnorm(10, mean = 0, sd = 1)
y = slope * x + intercept + e
df = data.frame(x, y)
fit = lm(y ~ x, data = df)
newX = data.frame(x = as.numeric(11:15))

What I'm interested in is a function that looks like the line below: sims = rlm(1000, fit, newX). That function would return 1000 simulations of y values, based on the new x values.
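(In base R, stats::simulate() does something close for the original fitting data, but not for new x values.) The hypothetical rlm(1000, fit, newX) can be sketched in a few lines of plain Python: draw Gaussian noise around the point predictions. Note this sketch uses the point estimates only and ignores uncertainty in the fitted coefficients:

```python
import random

def simulate_lm(n_sims, intercept, slope, sigma, new_x, seed=None):
    """Draw n_sims simulated response vectors y = b0 + b1*x + N(0, sigma)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        sims.append([intercept + slope * x + rng.gauss(0.0, sigma)
                     for x in new_x])
    return sims

new_x = [11, 12, 13, 14, 15]
# 1000 simulated y-vectors at the new x values (parameters invented)
sims = simulate_lm(1000, intercept=0.0, slope=1.5, sigma=1.0,
                   new_x=new_x, seed=1)
```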

Python Multiple Linear Regression using OLS code with specific data?

Submitted by 我是研究僧i on 2019-11-30 13:37:01
Question: I am using the ols.py code downloaded from the SciPy Cookbook (the download link is in the first paragraph, with the bold OLS), but I need to understand how to use my own data, rather than random data, in the ols function to do a multiple linear regression. I have a specific dependent variable y and three explanatory variables. Every time I try to put my variables in place of the random variables, it gives me the error: TypeError: this constructor takes no arguments. Can anyone help? Is this possible to do? Here is a
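Multiple linear regression does not require the Cookbook recipe at all; the coefficients solve the normal equations (X'X) b = X'y. A self-contained sketch in plain Python (no NumPy, data invented; for real work a library solver is both faster and numerically safer):

```python
def ols(X, y):
    """Multiple linear regression via the normal equations (X'X) b = X'y.
    X: list of rows of predictors; an intercept column is prepended.
    Returns coefficients [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Solve the k x k system by Gauss-Jordan elimination with partial pivoting
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        div = a[col][col]
        a[col] = [v / div for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]

# Exact synthetic data: y = 1 + 2*x1 - x2 + 0.5*x3 (values invented)
X = [[1, 2, 3], [2, 0, 1], [3, 1, 4], [4, 4, 0], [5, 2, 2], [0, 1, 1]]
y = [1 + 2 * x1 - x2 + 0.5 * x3 for x1, x2, x3 in X]
beta = ols(X, y)   # approximately [1.0, 2.0, -1.0, 0.5]
```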

Regression and summary statistics by group within a data.table

Submitted by 两盒软妹~` on 2019-11-30 12:44:09
I would like to calculate some summary statistics and run different regressions by group within a data.table, and have the results in "wide" format (i.e. one row per group with several columns). I can do it in multiple steps, but it seems like it should be possible to do it all at once. Consider this example data:

set.seed(46984)
dt <- data.table(ID = c(rep('Frank', 5), rep('Tony', 5), rep('Ed', 5)),
                 y = rnorm(15), x = rnorm(15), z = rnorm(15), key = "ID")
dt
# ID y x z
# 1: Ed 0.2129400 -0.3024061 0.845335632
# 2: Ed 0.4850342 -0.5159197 -0.087965415
# 3: Ed 1.8917489 1.7803220 0.760465271
# 4: Ed -0
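In data.table the idiomatic shape of the answer is a j-expression returning a list per group, e.g. dt[, .(mean_y = mean(y), slope = coef(lm(y ~ x))[2]), by = ID]. The same group-then-summarize pattern in a plain Python sketch (records and statistics invented; a closed-form slope stands in for lm):

```python
from collections import defaultdict

def groupwise(records):
    """One wide row per group: n, mean(y), and the slope of y ~ x, per ID."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["ID"]].append(rec)
    out = {}
    for gid, rows in groups.items():
        xs = [r["x"] for r in rows]
        ys = [r["y"] for r in rows]
        n = len(rows)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        out[gid] = {"n": n, "mean_y": my, "slope": sxy / sxx}
    return out

# Tiny made-up records in place of the data.table above
records = [{"ID": "Ed", "x": 0.0, "y": 1.0}, {"ID": "Ed", "x": 1.0, "y": 3.0},
           {"ID": "Ed", "x": 2.0, "y": 5.0},
           {"ID": "Tony", "x": 0.0, "y": 2.0}, {"ID": "Tony", "x": 2.0, "y": 2.0}]
wide = groupwise(records)   # one summary dict per ID
```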

sklearn LogisticRegression and changing the default threshold for classification

Submitted by ≡放荡痞女 on 2019-11-30 11:39:17
I am using LogisticRegression from the sklearn package and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25, whereas the default threshold when creating predictions is presumably 0.5. How can I change this default to find out what the accuracy of my model is under 10-fold cross-validation? Basically, I want my model to predict '1' for anyone with a probability greater than 0.25, not 0.5. I've been looking through all the documentation and can't seem to get anywhere. Thanks in
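scikit-learn's predict() hard-codes the 0.5 cutoff for binary problems, but predict_proba() exposes the raw probabilities, so a custom threshold is one comparison away. A minimal sketch, with made-up probabilities standing in for model.predict_proba(X)[:, 1]:

```python
def predict_with_threshold(probs, threshold=0.25):
    """Turn class-1 probabilities into hard labels using a custom cutoff."""
    return [1 if p >= threshold else 0 for p in probs]

# Invented probabilities; with sklearn you would pass predict_proba(X)[:, 1]
probs = [0.10, 0.25, 0.30, 0.60]
labels_025 = predict_with_threshold(probs, 0.25)   # [0, 1, 1, 1]
labels_050 = predict_with_threshold(probs, 0.50)   # [0, 0, 0, 1]
```

For cross-validated accuracy at the new cutoff, apply this to the held-out probabilities in each fold before scoring, rather than calling predict() directly.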

Simple multidimensional curve fitting

Submitted by ◇◆丶佛笑我妖孽 on 2019-11-30 11:05:10
Question: I have a bunch of data, generally in the form a, b, c, ..., y where y = f(a, b, c, ...). Most involve three or four variables and have 10k - 10M records. My general assumption is that they are algebraic in nature, something like: y = P1 a^E1 + P2 b^E2 + P3 c^E3. Unfortunately, my last statistical analysis class was 20 years ago. What is the easiest way to get a good approximation of f? Open-source tools with a very minimal learning curve (i.e. something where I could get a decent
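The full multi-term model above is nonlinear in the exponents and generally needs an iterative optimizer (e.g. scipy.optimize.curve_fit). The single-term case y = P a^E, however, log-linearizes: log y = log P + E log a, which is ordinary linear regression on the logs. A sketch of that special case in plain Python, on invented noise-free data:

```python
import math

def fit_power_law(a_vals, y_vals):
    """Fit y = P * a^E by log-linearization:
    log y = log P + E * log a, then simple linear regression on the logs."""
    lx = [math.log(a) for a in a_vals]
    ly = [math.log(y) for y in y_vals]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    sxx = sum((x - mx) ** 2 for x in lx)
    E = sxy / sxx
    P = math.exp(my - E * mx)
    return P, E

# Synthetic data with P = 2, E = 1.5 (values invented, no noise)
a_vals = [1.0, 2.0, 3.0, 4.0]
y_vals = [2.0 * a ** 1.5 for a in a_vals]
P, E = fit_power_law(a_vals, y_vals)   # recovers approximately (2.0, 1.5)
```

Note the caveat for real data: regressing on logs minimizes relative, not absolute, error, and requires strictly positive a and y.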

Prediction of 'mlm' linear model object from `lm()`

Submitted by 穿精又带淫゛_ on 2019-11-30 09:37:20
Question: I have three datasets:

response - matrix of 5 (samples) x 10 (dependent variables)
predictors - matrix of 5 (samples) x 2 (independent variables)
test_set - matrix of 10 (samples) x 10 (dependent variables defined in response)

response <- matrix(sample.int(15, size = 5*10, replace = TRUE), nrow = 5, ncol = 10)
colnames(response) <- c("1_DV","2_DV","3_DV","4_DV","5_DV","6_DV","7_DV","8_DV","9_DV","10_DV")
predictors <- matrix(sample.int(15, size = 5*2, replace = TRUE), nrow = 5, ncol = 2)
colnames
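For a multi-response ('mlm') linear model, prediction is just a matrix product: bind an intercept column onto the new predictors and multiply by the coefficient matrix (in R, cbind(1, newdata) %*% coef(fit)). A plain-Python sketch of that product, with an invented coefficient matrix:

```python
def predict_mlm(coef, new_rows):
    """Predict from a multi-response linear model: each prediction row is
    [1, x1, x2, ...] times the (p+1 x n_responses) coefficient matrix."""
    preds = []
    for row in new_rows:
        xrow = [1.0] + list(row)               # prepend the intercept term
        preds.append([sum(x * b for x, b in zip(xrow, col))
                      for col in zip(*coef)])  # one dot product per response
    return preds

# Invented coefficients: rows = (intercept, x1, x2), columns = 2 responses
coef = [[1.0, 0.0],
        [2.0, 1.0],
        [0.0, 3.0]]
new_rows = [[1.0, 1.0], [2.0, 0.0]]
preds = predict_mlm(coef, new_rows)   # [[3.0, 4.0], [5.0, 2.0]]
```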