regression

Advice on calculating a function to describe upper bound of data

Submitted by 痞子三分冷 on 2019-12-21 19:12:16
Question: I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know whether there is a standard statistical approach for this, so what I was considering was splitting the X-axis data into small ranges, calculating the max within each range, and then trying to identify a function that describes these points. Is there a function already in R to do this? If it's relevant, there are 92611 points.

Answer 1: You might like to look into quantile regression, which is available…
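The answer is cut off above, but the suggestion is quantile regression (in R this is typically the quantreg package). As a minimal sketch of the same idea in Python via statsmodels, fitting the 0.95-quantile line as an "upper bound" that most points fall under; the data here is fabricated for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Fabricated scatter data standing in for the question's 92611 points
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 1000)
    df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(0, 1 + 0.3 * x)})

    # Fit the 0.95 quantile line: about 95% of the points lie below it
    res = smf.quantreg("y ~ x", df).fit(q=0.95)
    print(res.params)  # intercept and slope of the upper-quantile line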

C# LOESS/LOWESS regression

Submitted by 你说的曾经没有我的故事 on 2019-12-21 17:56:11
Question: Do you know of a .NET library to perform a LOESS/LOWESS regression? (preferably free/open source)

Answer 1: Port from Java to C#:

    public class LoessInterpolator
    {
        public static double DEFAULT_BANDWIDTH = 0.3;
        public static int DEFAULT_ROBUSTNESS_ITERS = 2;

        /**
         * The bandwidth parameter: when computing the loess fit at
         * a particular point, this fraction of source points closest
         * to the current point is taken into account for computing
         * a least-squares regression.
         *
         * A sensible value is usually 0…
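The C# port above is truncated. For comparison, the two knobs it exposes (bandwidth and robustness iterations) map directly onto the LOWESS implementation in Python's statsmodels; a minimal sketch, not a .NET answer:

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    x = np.linspace(0, 10, 200)
    y = np.sin(x) + np.random.normal(scale=0.3, size=x.size)

    # frac ~ DEFAULT_BANDWIDTH, it ~ DEFAULT_ROBUSTNESS_ITERS in the C# class above
    smoothed = lowess(y, x, frac=0.3, it=2)  # (n, 2) array of (x, fitted y) pairs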

LinearRegression() returns list within list (sklearn)

Submitted by 守給你的承諾、 on 2019-12-21 17:38:46
Question: I'm doing multivariate linear regression in Python (sklearn), but for some reason the coefficients are not returned as a flat list. Instead, a list IN A LIST is returned:

    from sklearn import linear_model
    clf = linear_model.LinearRegression()
    # clf.fit([[0, 0, 0], [1, 1, 1], [2, 2, 2]], [0, 1, 2])
    clf.fit([[394, 3878, 13, 4, 0, 0], [384, 10175, 14, 4, 0, 0]], [3, 9])
    print 'coef array', clf.coef_
    print 'length', len(clf.coef_)
    print 'getting value 0:', clf.coef_[0]
    print 'getting value 1:', …
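The excerpt ends before any answer. The usual explanation (an assumption here, not quoted from the thread) is that coef_ comes back with shape (1, n_features) rather than (n_features,), so you take the first row or flatten it:

    import numpy as np

    coefs = np.ravel(clf.coef_)  # (1, n_features) -> (n_features,)
    # or equivalently: coefs = clf.coef_[0]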

How to estimate goodness-of-fit using scipy.odr?

Submitted by 删除回忆录丶 on 2019-12-21 13:01:14
Question: I am fitting data with weights using scipy.odr, but I don't know how to obtain a measure of goodness-of-fit or an R squared. Does anyone have suggestions for how to obtain this measure from the output stored by the function?

Answer 1: The res_var attribute of the Output is the so-called reduced chi-square value for the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for non-linear fitting, though. You can look at the residuals directly (out.delta for the X residuals…
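A minimal sketch of pulling these quantities out of an ODR run; the linear model and fabricated data are placeholders, and out.eps holds the Y residuals (the counterpart of out.delta named above):

    import numpy as np
    from scipy import odr

    x = np.linspace(0, 10, 50)
    y = 3.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)

    model = odr.Model(lambda beta, x: beta[0] * x + beta[1])
    out = odr.ODR(odr.RealData(x, y), model, beta0=[1.0, 0.0]).run()

    print(out.res_var)             # reduced chi-square, as in the answer
    ss_res = np.sum(out.eps ** 2)  # sum of squared Y residuals
    r_squared = 1 - ss_res / np.sum((y - y.mean()) ** 2)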

interpreting Graphviz output for decision tree regression

Submitted by 走远了吗. on 2019-12-21 11:03:16
Question: I'm curious what the value field is in the nodes of the decision tree produced by Graphviz when used for regression. I understand that for decision tree classification this is the number of samples of each class separated by a split, but I'm not sure what it means for regression. My data has a 2-dimensional input and a 10-dimensional output. Here is an example of what a tree looks like for my regression problem, produced using this code and visualized with webgraphviz:

    # X = (n x…
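The excerpt cuts off before the answer; for sklearn's DecisionTreeRegressor, the value field of a node is the mean of the target values over the training samples reaching that node (here a 10-element vector, one mean per output dimension). A small sketch with made-up data matching the question's shapes:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_graphviz

    X = np.random.rand(100, 2)   # 2-dimensional input, as in the question
    y = np.random.rand(100, 10)  # 10-dimensional output

    reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
    # In tree.dot, each node's "value" is the mean of y over that node's samples
    export_graphviz(reg, out_file="tree.dot")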

Ignoring missing values in multiple OLS regression with statsmodels

Submitted by 你说的曾经没有我的故事 on 2019-12-21 07:30:14
Question: I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message:

    ValueError: array must not contain infs or NaNs

I saw this SO question, which is similar but doesn't exactly answer my question: statsmodel.api.Logit: valueerror array must not contain infs or nans. What I would like to do is run the regression and ignore all rows where there are missing variables for the…
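The answer is not included in the excerpt, but statsmodels has a built-in switch for exactly this: passing missing="drop" to the model constructor discards rows containing NaNs before fitting. A minimal sketch; the dataframe and column names are placeholders:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    df = pd.DataFrame({"y": [1.0, 2.0, np.nan, 4.0],
                       "x1": [1.0, np.nan, 3.0, 4.0],
                       "x2": [2.0, 3.0, 4.0, 5.0]})

    X = sm.add_constant(df[["x1", "x2"]])
    res = sm.OLS(df["y"], X, missing="drop").fit()  # NaN rows are dropped
    print(res.params)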

Equations for 2 variable Linear Regression

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-21 06:07:50
Question: We are using a programming language that does not have a linear regression function in it. We have already implemented single-variable linear regression, y = Ax + B, and have simply calculated the A and B coefficients from the data using a solution similar to this Stack Overflow answer. I know this problem gets geometrically harder as variables are added, but for our purposes we only need to add one more: z = Ax + By + C. Does anyone have the closed-form equations, or code in any language, that…
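No answer is included in the excerpt. The standard closed form for two predictors (stated here from the usual least-squares normal equations, not quoted from the thread) uses the centered sums S_xx = sum((x_i - x̄)^2), S_xy = sum((x_i - x̄)(y_i - ȳ)), and so on:

    \[
    A = \frac{S_{yy} S_{xz} - S_{xy} S_{yz}}{S_{xx} S_{yy} - S_{xy}^2}, \qquad
    B = \frac{S_{xx} S_{yz} - S_{xy} S_{xz}}{S_{xx} S_{yy} - S_{xy}^2}, \qquad
    C = \bar{z} - A\bar{x} - B\bar{y}
    \]

These come from solving the two normal equations A·S_xx + B·S_xy = S_xz and A·S_xy + B·S_yy = S_yz by Cramer's rule, then recovering the intercept from the means.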

Loss suddenly increases with Adam Optimizer in Tensorflow

Submitted by 旧巷老猫 on 2019-12-21 03:44:50
Question: I am using a CNN for a regression task. I use TensorFlow and the optimizer is Adam. The network seems to converge perfectly fine until one point where the loss suddenly increases along with the validation error. Here are the loss plots for the labels and the weights separately (the optimizer is run on their sum). I use L2 loss for weight regularization and also for the labels. I apply some randomness to the training data. I am currently trying RMSProp to see if the behavior changes, but it takes…
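The excerpt stops before any answer. One commonly suggested mitigation for sudden loss spikes with Adam (an assumption here, not quoted from the thread) is raising Adam's epsilon above its tiny default, which caps the effective step size when the second-moment estimate becomes very small. A TF1-style sketch matching the era of the question; the learning rate is a placeholder:

    import tensorflow as tf

    # Default epsilon is 1e-8; a larger value such as 1e-4 keeps the
    # per-parameter step from blowing up late in training.
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, epsilon=1e-4)
    # train_op = optimizer.minimize(loss)  # loss defined elsewhere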

Iteratively forecasting dyn models

Submitted by 五迷三道 on 2019-12-21 03:10:11
Question: I've written a function to iteratively forecast models built using the package dyn, and I'd like some feedback on it. Is there a better way to do this? Has someone written canonical "forecast" methods for the dyn class (or dynlm class), or am I venturing into uncharted territory here?

    ipredict <- function(model, newdata, interval = "none", level = 0.95,
                         na.action = na.pass, weights = 1) {
      P <- predict(model, newdata = newdata, interval = interval,
                   level = level, na.action = na.action, weights = weights)
      for…
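The R function above is truncated, but the underlying pattern is generic one-step-ahead iteration: each forecast is appended to the series before the next forecast is made. A language-neutral sketch of that loop, written in Python since most of this page leans that way; predict_one is a hypothetical single-step predictor:

    def ipredict(predict_one, history, steps):
        """Forecast `steps` points ahead, feeding each forecast back in."""
        series = list(history)
        for _ in range(steps):
            series.append(predict_one(series))  # forecast becomes data
        return series[len(history):]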