linear-regression

Force R to include 0 as a value in a regression of counts vs year

£可爱£侵袭症+ Submitted on 2019-12-10 17:24:54

Question: Not sure whether this question would be better off at Cross Validated, but I think it is as much a programming question as a purely statistical one. I have a 102 × 1147 data frame covering the years 1960 through 2016, where each record is a scientific paper. I count the number of papers published each year within certain topics (guided by values in specific columns), and I want to calculate the linear slope from the year and the annual count of papers. Here's my script, …
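The usual sticking point (and the reason for the title) is that years with no matching papers simply drop out of a plain count, which biases the slope; the year axis has to be re-indexed so missing years appear with an explicit zero. A minimal pandas/NumPy sketch of that idea, using hypothetical data rather than the asker's data frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per paper, keyed by publication year.
papers = pd.DataFrame({"year": [1961, 1961, 1963, 1963, 1963, 1965]})

# value_counts() silently drops years with no papers; reindexing over
# the full range forces those years back in with an explicit 0.
counts = papers["year"].value_counts().reindex(range(1960, 1966), fill_value=0)

# Slope of a straight-line fit of annual count against year.
slope, intercept = np.polyfit(counts.index, counts.values, 1)
```

In R the equivalent move is to tabulate over a factor whose levels span every year, so zero-count years survive the tabulation.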

R: predict.lm() not recognizing an object

谁说胖子不能爱 Submitted on 2019-12-10 17:16:17

Question:

```r
> reg.len <- lm(chao1.ave ~ lg.std.len, b.div)  # b.div is my data frame imported from a CSV file
> reg.len

Call:
lm(formula = chao1.ave ~ lg.std.len, data = b.div)

Coefficients:
(Intercept)   lg.std.len
      282.4       -115.7

> newx <- seq(0.6, 1.4, 0.01)
> prd.len <- predict(reg.len, newdata = data.frame(x = newx),
+                    interval = "confidence", level = 0.90, type = "response")
Error in eval(expr, envir, enclos) : object 'lg.std.len' not found
```

I've tried doing the lm like this: `lm(b.div$chao1.ave ~ b.div$lg.std.len)`, …
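The error is a name mismatch: `predict.lm()` looks inside `newdata` for a column with the same name as the predictor in the model formula, so the new frame must be built as `data.frame(lg.std.len = newx)`, not `data.frame(x = newx)`. The same name-lookup contract can be sketched in Python with toy stand-ins for `lm`/`predict` (made-up data, hypothetical helper names):

```python
import numpy as np
import pandas as pd

# Minimal stand-ins for R's lm()/predict(): the fitted model remembers
# the predictor's *name* and looks it up again at prediction time.
def fit_lm(df, response, predictor):
    slope, intercept = np.polyfit(df[predictor], df[response], 1)
    return {"predictor": predictor, "slope": slope, "intercept": intercept}

def predict_lm(model, newdata):
    # Raises KeyError -- the analogue of "object 'lg.std.len' not found" --
    # unless newdata carries a column with the original predictor's name.
    x = newdata[model["predictor"]]
    return model["intercept"] + model["slope"] * x

# Made-up data standing in for the CSV import.
b_div = pd.DataFrame({"chao1.ave": [270.0, 220.0, 170.0],
                      "lg.std.len": [0.1, 0.5, 0.9]})
model = fit_lm(b_div, "chao1.ave", "lg.std.len")

newx = np.arange(0.6, 1.4, 0.01)
# predict_lm(model, pd.DataFrame({"x": newx})) would fail, just like the
# R call; naming the column after the predictor works:
prd = predict_lm(model, pd.DataFrame({"lg.std.len": newx}))
```

The `b.div$chao1.ave ~ b.div$lg.std.len` workaround makes things worse, because the recorded predictor name then becomes the full `$` expression, which `newdata` can never match.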

Linear regression in Apache Spark giving wrong intercept and weights

好久不见. Submitted on 2019-12-10 17:13:31

Question: Using MLlib's LinearRegressionWithSGD on a dummy data set (y, x1, x2) generated from y = (2*x1) + (3*x2) + 4 produces the wrong intercept and weights. The actual data used is:

```
x1   x2    y
 1   0.1   6.3
 2   0.2   8.6
 3   0.3  10.9
 4   0.6  13.8
 5   0.8  16.4
 6   1.2  19.6
 7   1.6  22.8
 8   1.9  25.7
 9   2.1  28.3
10   2.4  31.2
11   2.7  34.1
```

I set the following input parameters and got the model outputs below:

```
[numIterations, step, miniBatchFraction, regParam]  [intercept, [weights]]
[5, 9, 0.6, 5] = [2.36667135839938E13, weights:[1…
```
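Although the excerpt is cut off, an intercept of ~2.4e13 is the signature of a diverging step size: a step of 9 on unscaled features makes gradient descent overshoot on every iteration and blow up. A plain-NumPy batch-gradient-descent sketch (a simplified stand-in for MLlib's SGD, not Spark code) reproduces both the blow-up and the fix:

```python
import numpy as np

# The dummy data from the question: y = 2*x1 + 3*x2 + 4, exactly.
X = np.array([[1, 0.1], [2, 0.2], [3, 0.3], [4, 0.6], [5, 0.8], [6, 1.2],
              [7, 1.6], [8, 1.9], [9, 2.1], [10, 2.4], [11, 2.7]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 4

def gradient_descent(X, y, step, iters):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # bias column -> intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
        w -= step * grad
    return w

# step=9 on unscaled features overshoots each iteration and explodes,
# much like the question's intercept of ~2.4e13 ...
diverged = gradient_descent(X, y, step=9.0, iters=5)
# ... while a small step recovers the true [x1-weight, x2-weight, intercept].
recovered = gradient_descent(X, y, step=0.01, iters=200000)
```

Scaling the features (and setting regParam to 0 for an unregularised fit) lets a much larger step converge in far fewer iterations.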

Does R always return NA as a coefficient as a result of linear regression with unnecessary variables?

微笑、不失礼 Submitted on 2019-12-10 13:49:39

Question: My question is about unnecessary predictors, namely variables that provide no new linear information, i.e. variables that are linear combinations of the other predictors. As you can see, the swiss dataset has six variables.

```r
data(swiss)  # swiss is a built-in dataset
names(swiss)
# "Fertility"    "Agriculture"  "Examination"  "Education"
# "Catholic"     "Infant.Mortality"
```

Now I introduce a new variable ec. It is the linear combination of Examination and Catholic:

```r
ec <- swiss$Examination + swiss$Catholic
```

When …
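To the title question: yes. When a predictor is an exact linear combination of the others, the design matrix is rank-deficient; lm()'s pivoted QR decomposition detects the aliased column and reports NA for its coefficient rather than failing. The underlying rank deficiency can be illustrated in NumPy (hypothetical random data in place of swiss):

```python
import numpy as np

rng = np.random.default_rng(0)
examination = rng.normal(size=47)   # 47 rows, like the swiss data
catholic = rng.normal(size=47)
ec = examination + catholic         # exact linear combination

# Design matrix: intercept, Examination, Catholic, and the redundant ec.
X = np.column_stack([np.ones(47), examination, catholic, ec])

# Four columns, but only rank 3: ec spans no new direction, which is
# exactly the condition under which lm() aliases a coefficient to NA.
rank = np.linalg.matrix_rank(X)
```

Which coefficient gets the NA depends on the order of the terms in the formula: the first columns encountered are kept, and the later redundant one is dropped.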

Using polyfit on pandas dataframe and then adding the results to new columns

本秂侑毒 Submitted on 2019-12-10 11:54:29

Question: I have a dataframe like this. For each Id I have (x1, x2) and (y1, y2), and I want to supply these to polyfit(), get the slope and the x-intercept, and add them as new columns.

```
Id    x        y
1   0.79978  0.018255
1   1.19983  0.020963
2   2.39998  0.029006
2   2.79995  0.033004
3   1.79965  0.021489
3   2.19969  0.024194
4   1.19981  0.019338
4   1.59981  0.022200
5   1.79971  0.025629
5   2.19974  0.028681
```

I really need help with grouping the correct rows and supplying them to polyfit. I have been struggling with this. Any help would be …
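One way to sketch this (shown here on the first three Ids from the table) is a groupby-apply that returns the two coefficients per Id, merged back onto the original frame:

```python
import numpy as np
import pandas as pd

# First three Ids from the question's table.
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3],
    "x":  [0.79978, 1.19983, 2.39998, 2.79995, 1.79965, 2.19969],
    "y":  [0.018255, 0.020963, 0.029006, 0.033004, 0.021489, 0.024194],
})

def fit(group):
    slope, intercept = np.polyfit(group["x"], group["y"], 1)
    # x-intercept: where the fitted line crosses y = 0.
    return pd.Series({"slope": slope, "x_intercept": -intercept / slope})

coefs = df.groupby("Id")[["x", "y"]].apply(fit).reset_index()
out = df.merge(coefs, on="Id")  # slope/x_intercept added as new columns
```

With exactly two points per Id the "fit" is just the line through the pair, so polyfit returns it exactly.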

Why does my linear regression fit line look wrong?

ぐ巨炮叔叔 Submitted on 2019-12-10 03:34:21

Question: I have plotted a 2-D histogram in a way that lets me add lines, points, etc. to the plot. Now I want to apply a linear regression fit over the region of dense points, but my regression line seems totally off from where it should be. To demonstrate, here is my plot on the left with both a lowess regression fit and a linear fit:

```r
lines(lowess(na.omit(a), na.omit(b), iter = 10), col = 'gray', lwd = 3)
abline(lm(b[cc] ~ a[cc]), lwd = 3)
```

Here a and b are my values and cc are the points within the densest parts …

How can I obtain segmented linear regressions with a priori breakpoints?

泄露秘密 Submitted on 2019-12-09 22:33:50

Question: I need to explain this in excruciating detail because I don't have the statistics background to explain it more succinctly. I'm asking here on SO because I am looking for a Python solution, but might go to stats.SE if that is more appropriate. I have downhole well data; it might look a bit like this:

```
Rt        T
  0.0000  15.0000
  4.0054  15.4523
 25.1858  16.0761
 27.9998  16.2013
 35.7259  16.5914
 39.0769  16.8777
 45.1805  17.3545
 45.6717  17.3877
 48.3419  17.5307
 51.5661  17.7079
 64.1578  18.4177
 66.8280  18.5750
111.1613  19…
```
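With breakpoints known a priori, one simple approach is to split the profile at those depths and fit an ordinary straight line to each segment. A NumPy sketch on the data above (the breakpoint value here is hypothetical, and note that independent per-segment fits are not forced to meet at the break; enforcing continuity needs a constrained fit, e.g. optimising a piecewise model):

```python
import numpy as np

# Rt (depth) and T (temperature) values from the question.
rt = np.array([0.0, 4.0054, 25.1858, 27.9998, 35.7259, 39.0769,
               45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.8280])
t = np.array([15.0, 15.4523, 16.0761, 16.2013, 16.5914, 16.8777,
              17.3545, 17.3877, 17.5307, 17.7079, 18.4177, 18.5750])

breaks = [30.0]  # a priori breakpoint depths (hypothetical value)

# Fit an independent line on each segment between consecutive breaks.
edges = [-np.inf] + breaks + [np.inf]
fits = [np.polyfit(rt[(rt >= lo) & (rt < hi)], t[(rt >= lo) & (rt < hi)], 1)
        for lo, hi in zip(edges[:-1], edges[1:])]
```

Each entry of `fits` is a (slope, intercept) pair for one depth interval.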

fit to time series using Gnuplot

安稳与你 Submitted on 2019-12-09 19:22:32

Question: I am a big fan of Gnuplot, and now I would like to use its fit function for time series. My data set looks like:

```
1.000000 1.000000 0.999795 0.000000 0.000000 0.421927 0.654222 -25.127700 1.000000 1994-08-12
1.000000 2.000000 0.046723 -0.227587 -0.689491 0.328387 1.000000 0.000000 1.000000 1994-08-12
2.000000 1.000000 0.945762 0.000000 0.000000 0.400038 0.582360 -8.624480 1.000000 1995-04-19
2.000000 2.000000 0.060228 -0.056367 -0.680224 0.551019 1.000000 0.000000 1.000000 1995-04-19
3.000000 1…
```

Linear regression with tensorflow

試著忘記壹切 Submitted on 2019-12-09 17:59:00

Question: I'm trying to understand linear regression. Here is the script I am trying to understand:

```python
'''
A linear regression learning algorithm example using TensorFlow library.

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function

import tensorflow as tf
from numpy import *
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# Training Data
```
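What the TensorFlow script sets up (a squared-error loss, its gradients, and a training loop) can be written out by hand in plain NumPy, which may make the mechanics easier to follow. This sketch uses made-up training data rather than the example's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data: roughly y = 0.5*x + 1 plus noise.
train_x = np.linspace(0.0, 10.0, 50)
train_y = 0.5 * train_x + 1.0 + rng.normal(scale=0.1, size=50)

# The loop TensorFlow builds for you: minimise mean squared error
# over W and b by following the gradient downhill.
W, b = 0.0, 0.0
learning_rate = 0.001
for _ in range(20000):
    err = W * train_x + b - train_y         # residuals of the current fit
    W -= learning_rate * 2 * np.mean(err * train_x)
    b -= learning_rate * 2 * np.mean(err)
```

After training, W and b sit close to the generating slope and intercept; the TF1 version does the same thing with `tf.Variable`s and a `GradientDescentOptimizer` instead of explicit update lines.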

Rolling regression over multiple columns

拥有回忆 Submitted on 2019-12-09 11:57:48

Question: I'm having trouble finding the most efficient way to calculate a rolling linear regression over an xts object with multiple columns. I have searched and read several previous questions here on Stack Overflow. This question and its answer come close, but not close enough in my opinion, as I want to calculate multiple regressions with the dependent variable unchanged across all of them. I have tried to reproduce an example with random data:

```r
require(xts)
require(RcppArmadillo)  # Load libraries
data <-
```
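For comparison, here is the same shape of computation sketched in pandas (illustrative, not the R/xts solution): with one fixed dependent column, the rolling univariate OLS slope against each regressor is just a rolling covariance divided by a rolling variance, so no per-window refit loop is needed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, window = 200, 50
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["y", "x1", "x2", "x3"])

# Rolling OLS slope of y on each x column: slope = cov(x, y) / var(x).
# Both rolling moments use the same ddof, so the normalisation cancels.
slopes = pd.DataFrame({
    col: df["y"].rolling(window).cov(df[col]) / df[col].rolling(window).var()
    for col in ["x1", "x2", "x3"]
})
```

The first window - 1 rows are NaN, as with any rolling statistic; the R analogue would vectorise the same cov/var identity over the columns of the xts object.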