linear-regression

How to examine the feature weights of a TensorFlow LinearClassifier?

旧巷老猫 submitted on 2019-12-22 13:52:59
Question: I am trying to understand the Large-scale Linear Models with TensorFlow documentation. The docs motivate these models as follows: "Linear models can be interpreted and debugged more easily than neural nets. You can examine the weights assigned to each feature to figure out what's having the biggest impact on a prediction." So I ran the extended code example from the accompanying TensorFlow Linear Model Tutorial. In particular, I ran the example code from GitHub with the model-type flag set to
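
A minimal sketch of one way to read the weights back out (an assumption, since the question is cut off: this uses the TF 1.x tf.estimator API the tutorial is built on, with a hypothetical one-feature setup):

    # Sketch: train a tiny LinearClassifier, then list its learned variables.
    import numpy as np
    import tensorflow as tf

    feature_columns = [tf.feature_column.numeric_column("x")]
    classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns)

    def input_fn():
        # Hypothetical toy data: label is 1 exactly when x > 0.
        x = np.random.normal(size=100).astype(np.float32)
        y = (x > 0).astype(np.int32)
        return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).batch(10)

    classifier.train(input_fn=input_fn, steps=100)

    # Per-feature weights and the bias live in named variables on the estimator.
    for name in classifier.get_variable_names():
        print(name, classifier.get_variable_value(name))

The largest-magnitude weights (after accounting for feature scale) point at the features with the biggest influence on the prediction.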

How to set up balanced one-way ANOVA for lm()

会有一股神秘感。 submitted on 2019-12-22 12:38:53
Question: I have data:

    dat <- data.frame(NS     = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                      EXSM   = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                      Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                      More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
    #     NS EXSM Less.5 More.5
    # 1 8.56 7.39   5.97   7.03
    # 2 8.47 8.64   6.77   5.24
    # 3 6.39 8.54   7.26   6.14
    # 4 9.26 5.37   5.74   6.74
    # 5 7.98 9.21   8.74   6.62
    # 6 6.84 7.80   6.30   7.37
    # 7 9.20 8.20   6.80   4.94
    # 8 7.50 8.00   7.10   6.34

Each column gives data
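
A minimal sketch of the usual setup (an assumption about where the truncated question is headed): lm() wants one observation per row, so stack the four columns into a long (value, group) layout and fit on the group factor:

    # Wide -> long: stack() produces columns `values` and `ind` (the group factor).
    long <- stack(dat)
    fit  <- lm(values ~ ind, data = long)
    anova(fit)  # balanced one-way ANOVA table (8 observations per group)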

Differences in Linear Regression in R and Python [closed]

99封情书 submitted on 2019-12-22 12:17:10
Question: (Closed as off-topic for Stack Overflow, 3 years ago.) I was trying to match the linear regression results from R with those from Python, comparing the coefficients for each independent variable; below is the code. Data is uploaded here:

https://www.dropbox.com/s/oowe4irm9332s78/X.csv?dl=0
https://www.dropbox.com/s/79scp54unzlbwyk/Y.csv?dl=0

R code:

    #define pathname = " " X
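
When the two disagree, the intercept handling is the usual suspect: R's lm() adds an intercept implicitly, while Python's libraries make it explicit. A minimal sketch of an apples-to-apples fit on the Python side (hypothetical local filenames for the linked CSVs):

    # Replicate R's lm(Y ~ ., data = X) with statsmodels.
    import pandas as pd
    import statsmodels.api as sm

    X = pd.read_csv("X.csv")
    Y = pd.read_csv("Y.csv")
    model = sm.OLS(Y, sm.add_constant(X)).fit()  # add_constant = R's implicit intercept
    print(model.params)  # should line up with coef() of the R fit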

Spark - create RDD of (label, features) pairs from CSV file

喜你入骨 submitted on 2019-12-22 11:38:33
Question: I have a CSV file and want to perform a simple LinearRegressionWithSGD on the data. A sample of the data follows (the file has 99 rows in total, including the header), and the objective is to predict the y_3 variable:

    y_3,x_6,x_7,x_73_1,x_73_2,x_73_3,x_8
    2995.3846153846152,17.0,1800.0,0.0,1.0,0.0,12.0
    2236.304347826087,17.0,1432.0,1.0,0.0,0.0,12.0
    2001.9512195121952,35.0,1432.0,0.0,1.0,0.0,5.0
    992.4324324324324,17.0,1430.0,1.0,0.0,0.0,12.0
    4386.666666666667,26.0,1430.0,0.0,0.0,1.0,25.0
    1335
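
A minimal sketch of one way to build the (label, features) pairs (assuming the RDD-based MLlib API that LinearRegressionWithSGD belongs to, and a hypothetical file path):

    # Parse the CSV into an RDD of LabeledPoint(label, features), dropping the header.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

    sc = SparkContext(appName="csv-to-labeledpoints")
    lines = sc.textFile("data.csv")  # hypothetical path
    header = lines.first()
    points = (lines.filter(lambda l: l != header)
                   .map(lambda l: [float(v) for v in l.split(",")])
                   .map(lambda v: LabeledPoint(v[0], v[1:])))  # y_3 is column 0

    model = LinearRegressionWithSGD.train(points, iterations=100)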

Plotting Regression results from lme4 in R using Lattice (or something else)

主宰稳场 submitted on 2019-12-22 10:59:51
Question: I have fit a regression using lme4, thanks to a previous answer. Now that I have a regression fit for each state, I'd like to use lattice to plot QQ plots for each state. I would also like to plot error plots for each state in a lattice format. How do I make a lattice plot using the results of an lme4 regression? Below is a simple sample (yeah, I like a good alliteration) using two states. I would like to make a two-panel lattice plot from the object fits.

    library(lme4)
    d <- data.frame(state=rep
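
A minimal sketch of the lattice side (an assumption, since the sample data is cut off: hypothetical columns y, x, and state, fitting one lm per state rather than going through the lme4 object):

    library(lattice)

    # Hypothetical two-state sample.
    d <- data.frame(state = rep(c("NY", "CA"), each = 20), x = rnorm(40))
    d$y <- 2 * d$x + rnorm(40)

    # One regression per state; collect residuals with their state labels.
    res <- do.call(rbind, lapply(split(d, d$state), function(s)
      data.frame(state = s$state, r = resid(lm(y ~ x, data = s)))))

    qqmath(~ r | state, data = res)              # one QQ panel per state
    xyplot(r ~ seq_along(r) | state, data = res) # residual ("error") panel per state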

Format of R's lm() Formula with a Transformation

自古美人都是妖i submitted on 2019-12-22 05:39:32
Question: I can't quite figure out how to do the following in one line:

    data(attenu)
    x_temp = attenu$accel^(1/4)
    y_temp = log(attenu$dist)
    best_line = lm(y_temp ~ x_temp)

Since the above works, I thought I could do the following:

    data(attenu)
    best_line = lm( log(attenu$dist) ~ (attenu$accel^(1/4)) )

But this gives the error:

    Error in terms.formula(formula, data = data) : invalid power in formula

There's obviously something I'm missing when using transformed variables in R's formula format. Why doesn't
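
For context, a sketch of the standard workaround: inside a formula, ^ is formula syntax (factor crossing), not arithmetic, so the power has to be insulated with I():

    # I() makes the parser treat accel^(1/4) as ordinary arithmetic.
    data(attenu)
    best_line <- lm(log(dist) ~ I(accel^(1/4)), data = attenu)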

Why do `sklearn` and `statsmodels` implementations of OLS regression give different R^2?

岁酱吖の submitted on 2019-12-22 03:42:32
Question: I accidentally noticed that OLS models implemented by sklearn and statsmodels yield different values of R^2 when not fitting an intercept. Otherwise they seem to work fine. The following code yields:

    import numpy as np
    import sklearn
    import statsmodels
    import sklearn.linear_model as sl
    import statsmodels.api as sm

    np.random.seed(42)
    N = 1000
    X = np.random.normal(loc=1, size=(N, 1))
    Y = 2 * X.flatten() + 4 + np.random.normal(size=N)
    sklernIntercept = sl.LinearRegression(fit_intercept=True).fit
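
For context, the gap almost always traces back to the total sum of squares: sklearn's score() always centers it around the mean of Y, while statsmodels switches to the uncentered TSS when the model has no constant. A minimal sketch computing both definitions by hand, reusing the question's data for a slope-only fit:

    yhat = sl.LinearRegression(fit_intercept=False).fit(X, Y).predict(X)
    rss = np.sum((Y - yhat) ** 2)
    r2_centered = 1 - rss / np.sum((Y - Y.mean()) ** 2)  # sklearn's .score()
    r2_uncentered = 1 - rss / np.sum(Y ** 2)             # statsmodels, no constant
    print(r2_centered, r2_uncentered)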

AnalysisException: u"cannot resolve 'name' given input columns: [ list] in sqlContext in spark

孤街醉人 submitted on 2019-12-22 03:22:50
Question: I tried a simple example like:

    data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/samples/population-vs-price/data_geo.csv")
    data.cache()  # Cache data for faster reuse
    data = data.dropna()  # drop rows with missing values
    data = data.select("2014 Population estimate", "2015 median sales price").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()

It works well, but when I try something very similar like:

    data = sqlContext.read
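
For context, "cannot resolve 'name' given input columns: [...]" usually means the string passed to select() doesn't exactly match any column Spark inferred from the header. A quick diagnostic sketch:

    # Compare the inferred schema against the names being selected;
    # select() succeeds only on an exact (case- and space-sensitive) match.
    data.printSchema()
    print(data.columns)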

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

爱⌒轻易说出口 submitted on 2019-12-21 20:36:55
Question: I'm doing some exploratory work where I use dredge {MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together as main effects only. Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, they must also include the interaction between them.

    require(stats); require(graphics) #
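
A minimal sketch of one way to express that rule (hedged: dredge()'s subset argument takes a logical expression over term names, with backticks around interaction terms; the longley model below is a guess based on the GNP/Population names):

    library(MuMIn)
    data(longley)
    fm1 <- lm(Employed ~ GNP * Population, data = longley, na.action = na.fail)

    # Drop any candidate model that contains both main effects
    # without also containing their interaction.
    dd <- dredge(fm1, subset = !(GNP && Population) || `GNP:Population`)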

Getting a p-value for linear regression in C with the gsl_fit_linear() function from the GSL library

爷,独闯天下 submitted on 2019-12-21 17:45:59
Question: I'm trying to reproduce some code from R in C, so I'm trying to fit a linear regression using the gsl_fit_linear() function. In R I'd use the lm() function, which returns a p-value for the fit, with this code:

    lmAvgs <- lm( c(1.23, 11.432, 14.653, 21.6534) ~ c(1970, 1980, 1990, 2000) )
    summary(lmAvgs)

I've no idea, though, how to go from the C output to a p-value. My code looks something like this so far:

    int main(void) {
        int i, n = 4;
        double x[4] = { 1970, 1980, 1990, 2000 };
        double y[4] = {1
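
A minimal sketch of the missing step (a standard textbook computation rather than anything GSL-specific: gsl_fit_linear() estimates the parameter covariance from the residual scatter, so sqrt(cov11) is the slope's standard error, and the p-value follows from a t-test with n - 2 degrees of freedom):

    /* Build with: gcc fit.c -lgsl -lgslcblas -lm */
    #include <math.h>
    #include <stdio.h>
    #include <gsl/gsl_fit.h>
    #include <gsl/gsl_cdf.h>

    int main(void) {
        size_t n = 4;
        double x[4] = { 1970, 1980, 1990, 2000 };
        double y[4] = { 1.23, 11.432, 14.653, 21.6534 };  /* values from the R call */
        double c0, c1, cov00, cov01, cov11, sumsq;

        gsl_fit_linear(x, 1, y, 1, n, &c0, &c1, &cov00, &cov01, &cov11, &sumsq);

        double t = c1 / sqrt(cov11);                      /* slope t-statistic */
        double p = 2.0 * gsl_cdf_tdist_Q(fabs(t), n - 2); /* two-sided p-value */
        printf("slope = %g, t = %g, p = %g\n", c1, t, p);
        return 0;
    }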