linear-regression

Categorical and ordinal feature data difference in regression analysis?

走远了吗 · Submitted on 2021-02-19 05:18:09
Question: I am trying to completely understand the difference between categorical and ordinal data when doing regression analysis. For now, what is clear:

- Categorical feature and data example — Color: red, white, black. Why categorical: red < white < black is logically incorrect.
- Ordinal feature and data example — Condition: old, renovated, new. Why ordinal: old < renovated < new is logically correct.
- Categorical-to-numeric and ordinal-to-numeric encoding methods: One-Hot encoding for categorical data; Arbitrary …
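
To make the two encodings concrete, here is a minimal pandas sketch; the column names and the rank mapping are illustrative, not taken from the question:

import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'white', 'black'],        # categorical: no natural order
    'condition': ['old', 'renovated', 'new'],  # ordinal: old < renovated < new
})

# One-hot encoding: each color becomes its own 0/1 column,
# so the model never sees a spurious ordering like red < white.
one_hot = pd.get_dummies(df['color'], prefix='color')

# Ordinal encoding: map levels to integers that preserve the logical order.
condition_rank = {'old': 0, 'renovated': 1, 'new': 2}
df['condition_num'] = df['condition'].map(condition_rank)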

Linear regression on raster images - lm complains about NAs

喜你入骨 · Submitted on 2021-02-18 19:13:46
Question: I'm sure this can be fixed with a few bytes, but I've spent hours on this simple thing and can't get out of it. I don't use R often. I have 5 asciigrid files that represent 5 raster images. Some pixels have values, others have NAs. For example, the first image might be something like:

NA NA  NA  NA NA
NA NA  2   3  NA
NA 0.2 0.3 1  NA
NA NA  4   NA NA

and the second might be:

NA NA  NA NA NA
NA NA  5  1  NA
NA 0.1 12 12 NA
NA NA  6  NA NA

As you can see, the NA positions are always the same and I'm 100% sure …
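
The question is about R's lm, but the underlying fix is language-agnostic: mask out the pixels that are NA before fitting. As a rough illustration of that masking idea, here is a hypothetical numpy sketch (the stack shape, the NaN pattern, and the layer-index covariate are all assumptions, not from the question):

import numpy as np

# Five hypothetical rasters stacked into a (5, 4, 5) array; NA becomes np.nan.
stack = np.random.rand(5, 4, 5)
stack[:, 0, :] = np.nan                      # pretend the first row is all NA

t = np.arange(stack.shape[0], dtype=float)   # covariate: layer index (e.g. time)
valid = ~np.isnan(stack).any(axis=0)         # pixels non-NA in every layer
y = stack[:, valid]                          # (5, n_valid) response matrix

# One least-squares solve fits an intercept and slope for every valid pixel.
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # coef has shape (2, n_valid)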

Print OLS regression summary to text file

╄→гoц情女王★ · Submitted on 2021-02-18 05:56:41
Question: I am running an OLS regression with pandas.stats.api.ols inside a groupby, using the following code:

from pandas.stats.api import ols

df = pd.read_csv(r'F:\file.csv')
result = df.groupby(['FID']).apply(
    lambda d: ols(y=d.loc[:, 'MEAN'], x=d.loc[:, ['Accum_Prcp', 'Accum_HDD']]))
for i in result:
    x = pd.DataFrame({'FID': i.index, 'delete': i.values})
    frame = pd.concat([x, DataFrame(x['delete'].tolist())], axis=1, join='outer')
    del frame['delete']
    print frame

but this returns the error: AttributeError: 'OLS' …
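
Note that pandas.stats.api.ols was deprecated and later removed from pandas, so current code needs a different route; statsmodels is the usual replacement. A minimal sketch of one way to write a per-group OLS summary to a text file, reusing the column names from the question (the output filename is made up):

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv(r'F:\file.csv')

with open('ols_summaries.txt', 'w') as f:
    for fid, group in df.groupby('FID'):
        X = sm.add_constant(group[['Accum_Prcp', 'Accum_HDD']])
        res = sm.OLS(group['MEAN'], X).fit()
        f.write('FID: {}\n{}\n\n'.format(fid, res.summary()))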

What package in R is used to calculate non-zero null hypothesis p-values on linear models?

a 夏天 · Submitted on 2021-02-17 06:17:06
Question: The standard summary(lm(Height~Weight)) will output results for the hypothesis test H0: beta1 = 0, but if I am interested in testing the hypothesis H0: beta1 = 1, is there a package that will produce that p-value? I know I can calculate it by hand, and I know I can "flip the confidence interval" for a two-tailed test (test a 95% hypothesis by seeing if the 95% confint contains the point of interest), but I am looking for an easy way to generate the p-values for a simulation study.

Answer 1: You can use …
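
The by-hand calculation the question mentions is short: shift the estimated slope by the hypothesized value, divide by its standard error, and compare the t statistic against the residual degrees of freedom. A minimal Python sketch with statsmodels (the simulated Height/Weight data is illustrative):

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
weight = rng.normal(70, 10, size=200)
height = 100 + 1.05 * weight + rng.normal(0, 5, size=200)

fit = sm.OLS(height, sm.add_constant(weight)).fit()
b1, se1 = fit.params[1], fit.bse[1]

t_stat = (b1 - 1.0) / se1                            # test H0: beta1 = 1
p_value = 2 * stats.t.sf(abs(t_stat), fit.df_resid)  # two-sided p-value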

How to instantiate a Scikit-Learn linear model with known coefficients without fitting it

一笑奈何 · Submitted on 2021-02-11 14:48:04
Question: Background: I am testing various saved models as part of an experiment, but one of the models comes from an algorithm I wrote, not from fitting an sklearn model. However, my custom model is still a linear model, so I want to instantiate a LinearRegression instance and set the coef_ and intercept_ attributes to the values from my custom fitting algorithm so I can use it for predictions. What I have tried so far:

from sklearn.linear_model import LinearRegression

my_intercepts = np.ones(2)
my_coefficients = …
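
For context, this approach generally works because LinearRegression.predict only reads the coef_ and intercept_ attributes. A minimal single-output sketch (the coefficient values are made up; the question's two-element intercept suggests a multi-output variant where coef_ has shape (2, n_features)):

import numpy as np
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.coef_ = np.array([2.0, -0.5])  # hypothetical coefficients from a custom fit
model.intercept_ = 1.0

X_new = np.array([[1.0, 2.0], [3.0, 4.0]])
print(model.predict(X_new))          # computes X_new @ coef_ + intercept_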

How to manually calculate Cook's distance

这一生的挚爱 · Submitted on 2021-02-11 14:16:41
Question: I calculated Cook's distance manually and with the function cooks.distance and got two different results. Can someone please help me understand why? Below is how I manually calculate Cook's distance:

j = rnorm(100)
o = rexp(100)
p = runif(100)
model = lm(j ~ o + p)
O = model.matrix(model)
P = O %*% solve(t(O) %*% O) %*% t(O)   # hat matrix
lev = diag(P)                          # leverages h_ii
b <- solve(t(O) %*% O) %*% t(O) %*% j  # OLS coefficients
RSS <- sum((j - O %*% b)^2)
s2 <- RSS / 97  # three estimated parameters (intercept plus two slopes), so df = 100 - 3 = 97
residuals(model)^2 / (4 * s2) * (lev / (1 - lev)^2)

The above …
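
For reference, the textbook formula is D_i = e_i^2 / (k * s^2) * h_ii / (1 - h_ii)^2, where k is the number of estimated parameters, which is 3 here (the intercept plus two slopes). A minimal numpy sketch of the same computation on simulated data mirroring the question's setup:

import numpy as np

rng = np.random.default_rng(0)
n = 100
o = rng.exponential(size=n)
p = rng.uniform(size=n)
j = rng.normal(size=n)

X = np.column_stack([np.ones(n), o, p])  # design matrix with intercept
k = X.shape[1]                           # 3 estimated parameters
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ j                 # OLS coefficients
resid = j - X @ beta
s2 = resid @ resid / (n - k)             # residual variance, df = n - k
lev = np.diag(X @ XtX_inv @ X.T)         # leverages h_ii
cooks = resid**2 / (k * s2) * lev / (1 - lev)**2  # divide by k, not a hard-coded 4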