regression

Observation deleted due to missingness in R

一笑奈何 submitted on 2019-12-12 09:38:09
Question: I am working on a regression model in R and I have about 16,000 observations. One of these observations causes the following message: (1 observation deleted due to missingness). Is there a way in R to identify this one observation? Answer 1: If your data is in a data.frame x, and each row corresponds to an observation, then the way to go about this is to identify complete cases via complete.cases(x). Conversely, to find missing values in an observation, do ! complete
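The idea in the answer above, flagging the rows that contain at least one missing value, can be sketched outside R as well. The following is a minimal Python illustration, not the answerer's code: the `rows` data and the `is_missing` and `incomplete_rows` helpers are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def is_missing(value):
    """Return True for None or a float NaN (the assumed missing-value encodings)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def incomplete_rows(rows):
    """Return the indices of rows that contain at least one missing value."""
    return [i for i, row in enumerate(rows) if any(is_missing(v) for v in row)]

# Hypothetical observations: one row per observation.
rows = [
    (1.0, 2.5, "a"),
    (2.0, float("nan"), "b"),
    (3.0, 4.1, None),
    (4.0, 0.3, "d"),
]

print(incomplete_rows(rows))  # indices of the observations with missing values
```

Note that a model fit only drops rows with missing values in the variables it actually uses, so in practice the check is best restricted to those columns.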

How to create an ARFF file from an array in java?

怎甘沉沦 submitted on 2019-12-12 09:20:54
Question: I want to get the coefficients of a weighted linear regression of an x-y pair represented by two arrays in Java. I have zeroed in on Weka, but it asks for an 'Instances' class object in the 'LinearRegression' class. To create an 'Instances' class file, an ARFF file is needed which contains the data. I have come across solutions that use the FastVector class, but that has now been deprecated in the latest Weka version. How do I create an ARFF file for the x-y pair and the corresponding weights
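For a single predictor, the weighted least-squares coefficients the question is after have a closed form, which makes it possible to check a library's result independently. The following is a minimal Python sketch of that formula, not Weka's implementation; the function name `weighted_linear_fit` and the sample arrays are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by weighted least squares.

    Minimizes sum(w_i * (y_i - intercept - slope * x_i) ** 2).
    Returns (intercept, slope).
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means of x and y.
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted sum of squares and cross-products around the weighted means.
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x has no weighted variance; slope is undefined")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data: the third point gets zero weight, so it is ignored.
intercept, slope = weighted_linear_fit([0, 1, 2], [0, 1, 5], [1, 1, 0])
print(intercept, slope)
```

With the third weight set to zero, the fit passes exactly through the first two points, which is a quick sanity check that the weights are being applied.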

Make regressions and predictions for groups in R

懵懂的女人 submitted on 2019-12-12 09:16:04
Question: I have the following data.frame d from an experiment:
- Variable y (response, continuous)
- Factor f (500 levels)
- Time t (POSIXct)
In the last 8 years, y was measured roughly once a month (exact date in t) for each level of f. Sometimes there are 2 measurements per month; sometimes a couple of months passed without any measurements. Sorry for not providing example data, but making up irregular time series goes beyond my R knowledge. ;) I'd like to do the following with this data: make a regression

How to combine two seaborn plots?

耗尽温柔 submitted on 2019-12-12 09:12:10
Question: From the seaborn docs, the following snippet will produce the plot below: import numpy as np import pandas as pd import seaborn as sns sns.set(style="white") # Generate a random correlated bivariate dataset rs = np.random.RandomState(5) mean = [0, 0] cov = [(1, .5), (.5, 1)] x1, x2 = rs.multivariate_normal(mean, cov, 500).T x1 = pd.Series(x1, name="$X_1$") x2 = pd.Series(x2, name="$X_2$") # Show the joint distribution using kernel density estimation g = sns.jointplot(x1, x2, kind="kde", size

Plotting a 95% confidence interval for a lm object

青春壹個敷衍的年華 submitted on 2019-12-12 09:05:13
Question: How can I calculate and plot a confidence interval for my regression in R? So far I have two numerical vectors of equal length (x, y) and a regression object (lm.out). I have made a scatterplot of y given x and added the regression line to this plot. I am looking for a way to add a 95% prediction confidence band for lm.out to the plot. I've tried using the predict function, but I don't even know where to start with that :/. Here is my code at the moment: x=c(1,2,3,4,5,6,7,8,9,0) y=c(13,28,43,35

R: plm — year fixed effects — year and quarter data

眉间皱痕 submitted on 2019-12-12 08:35:28
Question: I am having a problem setting up a panel data model. Here is some sample data: library(plm) id <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2) year <- c(1999,1999,1999,1999,2000,2000,2000,2000,1999,1999,1999,1999,2000,2000,2000,2000) qtr <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4) y <- rnorm(16, mean=0, sd=1) x <- rnorm(16, mean=0, sd=1) data <- data.frame(id=id,year=year,qtr=qtr,y_q=paste(year,qtr,sep="_"),y=y,x=x) I run the following regression using 'id' as the individual index and 'year' as the time

Cubic spline method for longitudinal series data?

早过忘川 submitted on 2019-12-12 08:02:15
Question: I have serial data formatted as follows:
time  milk  Animal_ID
30    25.6  1
31    27.2  1
32    24.4  1
33    17.4  1
34    33.6  1
35    25.4  1
33    29.4  2
34    25.4  2
35    24.7  2
36    27.4  2
37    22.4  2
80    24.6  3
81    24.5  3
82    23.5  3
83    25.5  3
84    24.4  3
85    23.4  3
. . .
Generally, 300 animals have milk records at different time points over a short period. However, if we join their data together and ignore the different Animal_ID values, we would have a milk~time curve like the line in the figure below: Also, in the

Gaussian Process scikit-learn - Exception

我只是一个虾纸丫 submitted on 2019-12-12 07:59:39
Question: I want to use Gaussian Processes to solve a regression task. My data is as follows: each X vector has a length of 37, and each Y vector has a length of 8. I'm using the sklearn package in Python, but trying to use Gaussian processes leads to an exception: from sklearn import gaussian_process print "x :", x__ print "y :", y__ gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1) gp.fit(x__, y__) x : [[ 136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139

Plot logistic regression curve in R

此生再无相见时 submitted on 2019-12-12 07:31:53
Question: I want to plot a logistic regression curve of my data, but whenever I try, my plot produces multiple curves. Here's a picture of my last attempt: [image: last attempt] Here's the relevant code I am using: fit = glm(output ~ maxhr, data=heart, family=binomial) predicted = predict(fit, newdata=heart, type="response") plot(output~maxhr, data=heart, col="red4") lines(heart$maxhr, predicted, col="green4", lwd=2) My professor uses the following code, but when I try to run it I get an error on the last
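A likely cause of the multiple curves (an assumption, since the data and the plot are not shown here) is that line-drawing functions connect points in the order they are given, so predictions computed on unsorted `maxhr` values zigzag back and forth across the plot. Sorting the x values together with their predictions before drawing avoids that. The following is a minimal Python sketch of the reordering step only; the helper name `sort_for_line_plot` and the sample values are hypothetical.

```python
def sort_for_line_plot(x, predicted):
    """Return x and predicted reordered so that x is ascending.

    Each prediction stays paired with its own x value.
    """
    if len(x) != len(predicted):
        raise ValueError("x and predicted must have the same length")
    pairs = sorted(zip(x, predicted), key=lambda pair: pair[0])
    xs = [pair[0] for pair in pairs]
    ps = [pair[1] for pair in pairs]
    return xs, ps

# Hypothetical unsorted predictor values and fitted probabilities.
maxhr = [150, 120, 180, 135]
predicted = [0.55, 0.30, 0.80, 0.42]

xs, ps = sort_for_line_plot(maxhr, predicted)
print(xs)
print(ps)
```

An alternative with the same effect is to predict on an evenly spaced, already sorted grid of predictor values instead of the original data.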

PyMC regression of many regressions?

白昼怎懂夜的黑 submitted on 2019-12-12 05:49:21
Question: I haven't been using PyMC for long, but I was pleased at how quickly I was able to get a linear regression off the ground (this code should run without modification in IPython): import pandas as pd from numpy import * import pymc data=pd.DataFrame(rand(40)) predictors=pd.DataFrame(rand(40,5)) sigma = pymc.Uniform('sigma', 0.0, 200.0, value=20) params= array([pymc.Normal('%s_coef' % (c), mu=0, tau=1e-3,value=0) for c in predictors.columns]) @pymc.deterministic(plot=False) def linear_regression