linear-regression | 易学教程

Place results of predict() in a for loop inside a list

Question: Let us say I want to run a linear regression model on the mtcars dataset several times, each time on a different sample. The idea is to store the results of predict() for each iteration of a for loop, once per regression run on a new sample. A small example for one run follows:

    ## Perform model once on a Sample and use model on full dataset:
    Sample_Size <- 10
    Sample <- mtcars[sample(nrow(mtcars), Sample_Size), ]
    Model <- lm(formula = mpg ~ wt, data = Sample)
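The pattern the question asks about (fit on a fresh sample each iteration, predict on the full dataset, keep every prediction vector) can be sketched as follows. The question uses R; this is a plain-Python illustration of the same idea, not the original poster's solution. The data points, `fit_line`, the seed, and the sample size of 6 are all assumptions made for the sketch.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical (wt, mpg) pairs standing in for mtcars; not the real data.
data = [(1.6, 33.0), (1.9, 30.5), (2.2, 27.0), (2.5, 24.5), (2.8, 22.0),
        (3.1, 21.5), (3.4, 19.0), (3.7, 17.5), (4.0, 15.0), (4.3, 14.5),
        (4.6, 12.0), (5.0, 10.5)]

rng = random.Random(42)  # fixed seed so the runs are repeatable
sample_size = 6          # assumption; the question uses 10 rows of mtcars
n_runs = 5

all_predictions = []     # one list of predictions per run
for _ in range(n_runs):
    sample = rng.sample(data, sample_size)
    intercept, slope = fit_line([wt for wt, _ in sample],
                                [mpg for _, mpg in sample])
    # Predict on the full dataset, as in the question.
    all_predictions.append([intercept + slope * wt for wt, _ in data])
```

After the loop, `all_predictions[i]` holds the predictions from run `i`, so the results stay separated by iteration instead of being overwritten.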

Looping through many multiple regressions

Question: I am trying to run the code from the post "looping with iterations over two lists of variables for a multiple regression in R", with modified variable and data frame names, because it seems to do exactly what I want and uses a very similar dataset. However, it keeps giving me an error and I don't know why, so I would really appreciate it if someone could help me understand the error or the corresponding line of code so I can figure out what's wrong.

    for(i in 1:n) { vars = names

How to make group_by and lm fast?

Question: This is a sample.

    df <- tibble(
      subject = rep(letters[1:7], c(5, 6, 7, 5, 2, 5, 2)),
      day = c(3:7, 2:7, 1:7, 3:7, 6:7, 3:7, 6:7),
      x1 = runif(32),
      x2 = rpois(32, 3),
      x3 = rnorm(32),
      x4 = rnorm(32, 1, 5))

    df %>%
      group_by(subject) %>%
      summarise(
        coef_x1 = lm(x1 ~ day)$coefficients[2],
        coef_x2 = lm(x2 ~ day)$coefficients[2],
        coef_x3 = lm(x3 ~ day)$coefficients[2],
        coef_x4 = lm(x4 ~ day)$coefficients[2])

This data is small, so performance is not a problem. But my data is very large, approximately 1,000

How to drop insignificant categorical interaction terms Python StatsModel

Question: In statsmodels it's easy to add an interaction term. However, not all of the interactions are significant. My question is how to drop those that are insignificant, for example airport at Kootenay.

    # -*- coding: utf-8 -*-
    import pandas as pd
    import statsmodels.formula.api as sm

    if __name__ == "__main__":
        # Read data
        census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')
        # Fit all data
        fit = sm.ols

Broken stick (or piecewise) regression with 2 breakpoints

Question: I want to estimate two breakpoints of a function with the following data:

    df = data.frame (x = 1:180, y = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 2, 2, 4, 2, 2, 3, 2, 1, 2,0, 1, 0, 1, 4, 0, 1, 2, 3, 1, 1, 1, 0, 2, 0, 3, 2, 1, 1, 1, 1, 5, 4, 2, 1, 0, 2, 1, 1, 2, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 2, 3, 0, 3, 2, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

looping with iterations over two lists of variables for a multiple regression in R

Question: I want to write a loop in R to run multiple regressions with one dependent variable and two lists of independent variables (all continuous). The model is additive, and the loop should iterate through the two lists in parallel: it takes the first column from the first list plus the first column from the second list, then the second column from each list, and so on. The problem is that I can't get it to iterate through the lists properly; instead my loop runs more

Simple linear regression using pandas dataframe

Question: I'm looking to check trends for a number of entities (SysNr). I have data spanning 3 years (2014, 2015, 2016). I'm looking at a large number of variables, but will limit this question to one ('res_f_r'). My DataFrame looks something like this:

    d = [
        {'RegnskabsAar': 2014, 'SysNr': 1, 'res_f_r': 350000},
        {'RegnskabsAar': 2015, 'SysNr': 1, 'res_f_r': 400000},
        {'RegnskabsAar': 2016, 'SysNr': 1, 'res_f_r': 450000},
        {'RegnskabsAar': 2014, 'SysNr': 2, 'res_f_r': 350000},
        {'RegnskabsAar': 2015, 'SysNr':
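One way to read "trend per entity" is the least-squares slope of 'res_f_r' against 'RegnskabsAar', computed separately for each SysNr. The sketch below does that on the list-of-dicts form shown in the question, using only the standard library. It is an illustration under assumptions, not the accepted answer: `trend_per_entity` is a hypothetical helper, and the 2015 and 2016 values for SysNr 2 are invented because the excerpt is cut off.

```python
from collections import defaultdict

# SysNr 1 rows and the 2014 row for SysNr 2 come from the question.
# The 2015 and 2016 rows for SysNr 2 are made up for this sketch.
d = [
    {'RegnskabsAar': 2014, 'SysNr': 1, 'res_f_r': 350000},
    {'RegnskabsAar': 2015, 'SysNr': 1, 'res_f_r': 400000},
    {'RegnskabsAar': 2016, 'SysNr': 1, 'res_f_r': 450000},
    {'RegnskabsAar': 2014, 'SysNr': 2, 'res_f_r': 350000},
    {'RegnskabsAar': 2015, 'SysNr': 2, 'res_f_r': 300000},  # hypothetical
    {'RegnskabsAar': 2016, 'SysNr': 2, 'res_f_r': 250000},  # hypothetical
]

def trend_per_entity(rows, key='SysNr', x='RegnskabsAar', y='res_f_r'):
    """Return {entity: least-squares slope of y on x} for each entity."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append((row[x], row[y]))

    slopes = {}
    for entity, points in groups.items():
        n = len(points)
        mean_x = sum(px for px, _ in points) / n
        mean_y = sum(py for _, py in points) / n
        sxx = sum((px - mean_x) ** 2 for px, _ in points)
        sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
        slopes[entity] = sxy / sxx  # change in res_f_r per year
    return slopes

trends = trend_per_entity(d)
```

With a real DataFrame, the same per-group calculation could be applied inside a groupby on 'SysNr'; the closed-form slope avoids building a full model object per entity.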

iterating over formulas in purrr

Question: I have a bunch of formulas, as strings, that I'd like to use one at a time in a glm, preferably using tidyverse functions. Here's where I am now.

    library(tidyverse)
    library(broom)

    mtcars %>% dplyr::select(mpg:qsec) %>% colnames -> targcols
    paste('vs ~ ', targcols) -> formulas
    formulas
    #> 'vs ~ mpg' 'vs ~ cyl' 'vs ~ disp' 'vs ~ hp' 'vs ~ drat' 'vs ~ wt' 'vs ~ qsec'

I can run a generalized linear model with any one of these formulas as

    glm(as.formula(formulas[1]), family = 'binomial', data =

how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame

Question: I would like to build 2 linear regression models based on 2 subsets of the dataset, and then have one column that contains the predicted values for each subset. Here is my example data frame:

    dat <- read.table(text = " cats birds wolfs snakes
    0 3 8 7
    1 3 8 7
    1 1 2 3
    0 1 2 3
    0 1 2 3
    1 6 1 1
    0 6 1 1
    1 6 1 1
    ",header = TRUE)

First I built two models:

    # one is for wolfs ~ snakes where cats=0
    f0<-lm(wolfs~snakes,data=dat,subset=dat$cats==0)
    #the second model is for wolfs ~ snakes
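The goal described above (one model per subset, predictions combined into a single column) can be sketched as follows. This is a plain-Python illustration of the idea rather than the R solution the poster asked for: the rows are the eight from the question, while `fit_line`, `models`, and `predictions` are names chosen for this sketch. Each row is predicted by the model fitted on its own `cats` group.

```python
# Rows from the question, as (cats, birds, wolfs, snakes).
rows = [
    (0, 3, 8, 7), (1, 3, 8, 7), (1, 1, 2, 3), (0, 1, 2, 3),
    (0, 1, 2, 3), (1, 6, 1, 1), (0, 6, 1, 1), (1, 6, 1, 1),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Fit wolfs ~ snakes separately for cats == 0 and cats == 1.
models = {}
for group in (0, 1):
    subset = [r for r in rows if r[0] == group]
    models[group] = fit_line([r[3] for r in subset], [r[2] for r in subset])

# One prediction per row, in the original row order, using the
# model that matches the row's cats value.
predictions = []
for cats, _birds, _wolfs, snakes in rows:
    intercept, slope = models[cats]
    predictions.append(intercept + slope * snakes)
```

Because `predictions` keeps the original row order, it can be attached to the data as a single column without any reordering.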

Method to find “cleanest” subset of data i.e. subset with lowest variability

Question: I am trying to find a trend in several datasets. The trends involve finding the best-fit line, but I imagine the procedure would not be too different for any other model (just possibly more time-consuming). There are 3 conceivable scenarios:

All good data, where all the data fits a single trend with low variability.
All bad data, where all or most of the data exhibits tremendous variability and the entire dataset must be discarded.
Partial good data, where some of the data may be good while