regression | 易学教程

Observation deleted due to missingness in R

Question: I am working on a regression model in R with about 16,000 observations. One of these observations causes the following message: (1 observation deleted due to missingness). Is there a way in R to identify this one observation?

Answer 1: If your data is in a data.frame x, and each row corresponds to an observation, then the way to go about this is to identify complete cases via complete.cases(x). Conversely, to find missing values in an observation, do ! complete
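The idea behind the answer above (flag every row that has at least one missing value, then look at the flagged rows) can be sketched outside R as well. The following is a minimal pure-Python illustration, not the R code from the answer. The helper names is_missing and incomplete_rows and the sample data are hypothetical, and treating None and NaN as "missing" is an assumption of this sketch.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def incomplete_rows(rows):
    """Return the indices of rows that contain at least one missing value."""
    return [
        i for i, row in enumerate(rows)
        if any(is_missing(v) for v in row.values())
    ]


# Hypothetical data: each dict is one observation (one row).
observations = [
    {"y": 1.2, "x": 0.5},
    {"y": 2.3, "x": None},  # this row has a missing predictor
    {"y": 3.1, "x": 1.7},
]

missing_idx = incomplete_rows(observations)
print(missing_idx)
```

The returned indices play the same role as the rows selected by negating complete.cases(x) in R: they point at the observations that a model fit would drop.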

How to create an ARFF file from an array in java?

Question: I want to get the coefficients of a weighted linear regression of an x-y pair represented by two arrays in Java. I have zeroed in on Weka, but its 'LinearRegression' class asks for an 'Instances' object. To create an 'Instances' object, an ARFF file containing the data is needed. I have come across solutions that use the FastVector class, but that has been deprecated in the latest Weka version. How do I create an ARFF file for the x-y pair and the corresponding weights

Make regressions and predictions for groups in R

Question: I have the following data.frame d from an experiment:

- Variable y (response, continuous)
- Factor f (500 levels)
- Time t (POSIXct)

Over the last 8 years, y was measured roughly once a month (exact date in t) for each level of f. Sometimes there are 2 measurements per month, sometimes a couple of months passed without any measurements. Sorry for not providing example data, but making up irregular time series goes beyond my R knowledge. ;) I'd like to do the following with this data: make a regression

How to combine two seaborn plots?

Question: From the seaborn docs, the following snippet will produce the plot below:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    sns.set(style="white")

    # Generate a random correlated bivariate dataset
    rs = np.random.RandomState(5)
    mean = [0, 0]
    cov = [(1, .5), (.5, 1)]
    x1, x2 = rs.multivariate_normal(mean, cov, 500).T
    x1 = pd.Series(x1, name="$X_1$")
    x2 = pd.Series(x2, name="$X_2$")

    # Show the joint distribution using kernel density estimation
    g = sns.jointplot(x1, x2, kind="kde", size

Plotting a 95% confidence interval for a lm object

Question: How can I calculate and plot a confidence interval for my regression in R? So far I have two numerical vectors of equal length (x, y) and a regression object (lm.out). I have made a scatterplot of y against x and added the regression line to this plot. I am looking for a way to add a 95% prediction confidence band for lm.out to the plot. I've tried using the predict function, but I don't even know where to start with that. Here is my code at the moment:

    x=c(1,2,3,4,5,6,7,8,9,0)
    y=c(13,28,43,35

R: plm — year fixed effects — year and quarter data

Question: I am having a problem setting up a panel data model. Here is some sample data:

    library(plm)
    id <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
    year <- c(1999,1999,1999,1999,2000,2000,2000,2000,1999,1999,1999,1999,2000,2000,2000,2000)
    qtr <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
    y <- rnorm(16, mean=0, sd=1)
    x <- rnorm(16, mean=0, sd=1)
    data <- data.frame(id=id,year=year,qtr=qtr,y_q=paste(year,qtr,sep="_"),y=y,x=x)

I run the following regression using 'id' as the individual index and 'year' as the time

Cubic spline method for longitudinal series data?

Question: I have series data formatted as follows:

    time  milk  Animal_ID
    30    25.6  1
    31    27.2  1
    32    24.4  1
    33    17.4  1
    34    33.6  1
    35    25.4  1
    33    29.4  2
    34    25.4  2
    35    24.7  2
    36    27.4  2
    37    22.4  2
    80    24.6  3
    81    24.5  3
    82    23.5  3
    83    25.5  3
    84    24.4  3
    85    23.4  3
    .     .     .

In general, 300 animals have milk records at different time points over a short period. However, if we join their data together and ignore the different Animal_ID values, we get a curve of milk~time like the line in the figure below: Also, in the

Gaussian Process scikit-learn - Exception

Question: I want to use Gaussian Processes to solve a regression task. My data is as follows: each X vector has a length of 37, and each Y vector has a length of 8. I'm using the sklearn package in Python, but trying to use Gaussian processes leads to an exception:

    from sklearn import gaussian_process
    print "x :", x__
    print "y :", y__
    gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
    gp.fit(x__, y__)

    x : [[ 136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139

Plot logistic regression curve in R

Question: I want to plot a logistic regression curve of my data, but whenever I try, my plot produces multiple curves. Here's a picture of my last attempt: [image: last attempt]

Here's the relevant code I am using:

    fit = glm(output ~ maxhr, data=heart, family=binomial)
    predicted = predict(fit, newdata=heart, type="response")
    plot(output~maxhr, data=heart, col="red4")
    lines(heart$maxhr, predicted, col="green4", lwd=2)

My professor uses the following code, but when I try to run it I get an error on the last

PyMC regression of many regressions?

Question: I haven't been using PyMC for long, but I was pleased at how quickly I was able to get a linear regression off the ground (this code should run without modification in IPython):

    import pandas as pd
    from numpy import *
    import pymc
    data=pd.DataFrame(rand(40))
    predictors=pd.DataFrame(rand(40,5))
    sigma = pymc.Uniform('sigma', 0.0, 200.0, value=20)
    params= array([pymc.Normal('%s_coef' % (c), mu=0, tau=1e-3,value=0) for c in predictors.columns])
    @pymc.deterministic(plot=False)
    def linear_regression