linear-regression

decreasing coefficients in R's coefplot?

和自甴很熟 submitted on 2019-12-11 06:39:14
Question: coefplot from library(coefplot) has an argument decreasing which, when set to TRUE, should plot the coefficients in descending order. But when I run a toy example:

data(tips, package = "reshape2")
mod1 <- lm(tip ~ day + sex + smoker, data = tips)
coefplot(mod1, decreasing = TRUE)  # the original snippet called coefplot.glm(mod2, ...), but only mod1 is defined

the coefficients aren't in descending order. What am I missing?

EDIT: I was missing sort = "magnitude". However, this doesn't work with multiplot:

data(tips, package = "reshape2")
mod1 <- lm(tip ~ day + sex
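A minimal sketch of the fix mentioned in the edit, using the coefplot package's sort argument (this covers the single-plot case only, not the multiplot issue the question ends on):

library(coefplot)
data(tips, package = "reshape2")
mod1 <- lm(tip ~ day + sex + smoker, data = tips)
# sort = "magnitude" orders the coefficients by value;
# decreasing = TRUE makes that order descending
coefplot(mod1, sort = "magnitude", decreasing = TRUE)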

Fitting regression multiple times and gather summary statistics

前提是你 submitted on 2019-12-11 06:35:43
Question: I have a dataframe that looks like this:

W01  0.750000  0.916667  0.642857  1.000000  0.619565
W02  0.880000  0.944444  0.500000  0.991228  0.675439
W03  0.729167  0.900000  0.444444  1.000000  0.611111
W04  0.809524  0.869565  0.500000  1.000000  0.709091
W05  0.625000  0.925926  0.653846  1.000000  0.589286

Variation  1_941119_A/G  1_942335_C/G  1_942451_T/C  1_942934_G/C
W01        0.967391      0.965909     1             0.130435
W02        0.929825      0.937500     1             0.184211
W03        0.925926      0.880000     1             0.138889
W04        0.918182      0.907407     1             0.200000
W05        0.901786      0
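The question is cut off, but the general pattern it asks about — fit one regression per column and gather the summary statistics — can be sketched in R as follows, assuming a data frame df with a response column y and one column per variant (all names hypothetical):

predictors <- setdiff(names(df), "y")
results <- do.call(rbind, lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "y"), data = df)
  s <- coef(summary(fit))  # estimates, std. errors, t- and p-values
  data.frame(predictor = p,
             estimate  = s[2, "Estimate"],
             std.error = s[2, "Std. Error"],
             p.value   = s[2, "Pr(>|t|)"])
}))
results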

How to implement Latent Dirichlet Allocation in regression analysis

╄→гoц情女王★ submitted on 2019-12-11 05:26:13
Question: I have a dataset consisting of hotel reviews, ratings, and other features such as traveler type and word count of the review. I want to perform topic modeling (LDA) and use the topics derived from the reviews, as well as the other features, to identify the features that most affect the ratings (with rating as the dependent variable). If I want to use linear regression to do this, does this mean I would have to label each review with the topics derived? Is there a way to do this in R or will I have to
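One common approach is to use each review's posterior topic proportions as regressors instead of hard topic labels. A sketch with the topicmodels package, assuming the reviews are already in a document-term matrix dtm and the ratings and other features sit in a data frame reviews (both names hypothetical; k = 5 topics is arbitrary):

library(topicmodels)
lda_fit <- LDA(dtm, k = 5)
topic_props <- posterior(lda_fit)$topics  # one row per review, one column per topic
reviews <- cbind(reviews, topic_props)
# regress the rating on topic proportions plus the other features
mod <- lm(rating ~ ., data = reviews)
summary(mod)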

R: How do I (or should I) drop an insignificant orthogonal polynomial basis in a linear model?

こ雲淡風輕ζ submitted on 2019-12-11 05:07:43
Question: I have soil moisture data with x-, y- and z-coordinates like this:

gue <- structure(list(x = c(311939.1507, 311935.4607, 311924.7316, 311959.553, 311973.5368, 311953.3743, 311957.9409, 311948.3151, 311946.7169, 311997.0803, 312017.5236, 312006.0245, 312001.5179, 311992.7044, 311977.3076, 311960.4159, 311970.6047, 311957.2564, 311866.4246, 311870.8714, 311861.4461, 311928.7096, 311929.6291, 311929.4233, 311891.2915, 311890.3429, 311900.8905, 311864.4995, 311870.8143, 311866.9257, 312002.571,
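The snippet is cut off, but the decision the title asks about (dropping an insignificant orthogonal polynomial basis term) can be framed as a nested-model comparison. A sketch, assuming gue also contains a response column, say moisture (a hypothetical name):

# full model: quadratic orthogonal polynomial bases in x and y
m_full <- lm(moisture ~ poly(x, 2) + poly(y, 2), data = gue)
# reduced model: drop the quadratic basis in y, keep the linear term
m_red  <- lm(moisture ~ poly(x, 2) + y, data = gue)
# F-test for whether the dropped basis function is needed
anova(m_red, m_full)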

What R function can be used instead of boxTidwell() when a lot of predictors need to be transformed?

◇◆丶佛笑我妖孽 submitted on 2019-12-11 04:57:18
Question: I am doing a multiple regression on a data set containing one dependent variable and 13 independent variables. The boxTidwell() method only works for the first 6 predictors, after which it reaches the maximum number of iterations. I tried changing max.iter in the arguments, but the following error is displayed: Error in lm.fit(cbind(1, x1.p, x2), y, ...) : NA
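One workaround sketch (an assumption on my part, not a quoted answer): run car::boxTidwell() on one predictor at a time, holding the remaining predictors untransformed via other.x, so the iterative search only has to converge in one dimension. The data frame dat, response y, and predictors x1–x13 are hypothetical names:

library(car)
preds <- paste0("x", 1:13)
results <- lapply(preds, function(p) {
  # transform only predictor p; the rest enter untransformed via other.x
  boxTidwell(reformulate(p, response = "y"),
             other.x = reformulate(setdiff(preds, p)),
             data = dat)
})
names(results) <- preds
results  # each element holds the estimated power transformation for one predictor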

Error of slope using numpy.polyfit and dependent variable

拈花ヽ惹草 submitted on 2019-12-11 04:57:14
Question: I am having trouble understanding the error estimates of the fitted parameters when performing linear regression. Referring to How to find error on slope and intercept using numpy.polyfit, say I am fitting a straight line with numpy.polyfit (code below). As mentioned in the linked question, the square roots of the diagonal entries of the covariance matrix are the estimated standard deviations of the fitted coefficients, so np.sqrt(V[0][0]) is the standard deviation of the slope. My
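For reference, the relationship the question relies on, written out. numpy orders polynomial coefficients from highest degree down, so for a degree-1 fit index 0 is the slope:

$$y = m x + c, \qquad \sigma_m = \sqrt{V_{00}}, \qquad \sigma_c = \sqrt{V_{11}},$$

where $V$ is the covariance matrix returned by np.polyfit(x, y, 1, cov=True).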

SPSS creating a loop for a multiple regression over several variables

China☆狼群 submitted on 2019-12-11 04:38:21
Question: For my master's thesis I have to use SPSS to analyse my data. I originally thought I wouldn't have to deal with very difficult statistical issues, and that is still true as far as the concepts of my analysis go. But the problem is that in order to create my dependent variable I need to use the syntax editor (programming in general), and I have no experience in this area at all. I hope you can help me in the process of creating my syntax. I have in total approximately 900 companies with 6 year

C# linear regression given 2 sets of data

☆樱花仙子☆ submitted on 2019-12-11 04:37:09
Question: I have 2 sets of data: one is an average position and the other a score, so for every position I have the predicted score of an item:

double[] positions = {0.1, 0.2, 0.3, 0.45, 0.46, ...};
double[] scores = {1, 1.2, 1.5, 2.2, 3.4, ...};

I need to create a function that predicts the score for an average position, so that given a new item with position 1.7 I can estimate its score. I understand the function should be something like y = a*x + b, but how do I get to it? Any help will be appreciated!

Answer 1: Yes, you have to build a linear
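For reference, the standard ordinary least-squares solution for the line $y = a x + b$ the question asks about (textbook formulas, not taken from the truncated answer):

$$a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\,\bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the means of the positions and scores.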

Run linear model in a powerset of variables

人盡茶涼 submitted on 2019-12-11 03:56:00
Question: I am trying to get a data frame with different variables and run a linear model for each combination of those variables. A simple example is:

names <- c("Var1", "Var2", "Var3")
vars <- ggm::powerset(names, sort = T, nonempty = T)

The powerset function gives me all the combinations of the 3 variables: a list with 7 elements, each element of type character. (The actual code I am trying to run has 16 variables, which is why I don't want to write each of the models manually.) What I would like
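One way to finish the job, sketched under the assumption of a data frame dat whose columns include a response y plus Var1–Var3 (hypothetical names): build a formula for each subset with reformulate() and fit lm() in a loop.

vars <- ggm::powerset(c("Var1", "Var2", "Var3"), sort = TRUE, nonempty = TRUE)
# one fitted model per element of the powerset
models <- lapply(vars, function(v) lm(reformulate(v, response = "y"), data = dat))
# e.g. rank the 7 fits by adjusted R-squared
sapply(models, function(m) summary(m)$adj.r.squared)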

rstudent() returns incorrect result for an “mlm” (linear models fitted with multiple LHS)

巧了我就是萌 submitted on 2019-12-11 03:19:30
Question: I know that the support for linear models with multiple LHS is limited. But when it is possible to run a function on an "mlm" object, I would expect the results to be trustworthy. When using rstudent, strange results are produced. Is this a bug or is there some other explanation? In the example below fittedA and fittedB are identical, but in the case of rstudent the 2nd column differs.

y <- matrix(rnorm(20), 10, 2)
x <- 1:10
fittedA <- fitted(lm(y ~ x))
fittedB <- cbind(fitted(lm(y[, 1] ~ x)),
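A workaround sketch (it sidesteps the suspect "mlm" code path rather than explaining it): fit each response column as a separate "lm" object, so rstudent() runs on its well-tested single-response branch.

set.seed(1)
y <- matrix(rnorm(20), 10, 2)
x <- 1:10
# studentized residuals computed column by column on ordinary "lm" fits
rstudent_by_col <- sapply(seq_len(ncol(y)), function(i) rstudent(lm(y[, i] ~ x)))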