regression

Scatter plot kernel smoothing: ksmooth() does not smooth my data at all

僤鯓⒐⒋嵵緔 Submitted on 2019-12-22 05:30:51
Question: Original question: I want to smooth my explanatory variable, something like the speed data of a vehicle, and then use these smoothed values. I searched a lot and found nothing that directly answers this. I know how to calculate a kernel density estimate ( density() or KernSmooth::bkde() ), but I don't know how to then calculate the smoothed values of speed. Re-edited question: Thanks to @ZheyuanLi, I am able to better explain what I have and what I want to do, so I have re-edited my question as…
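For what this question is after, the key distinction is between kernel density estimation (which smooths the distribution of x) and kernel regression, i.e. scatterplot smoothing (which produces smoothed y values at given x). Below is a minimal Nadaraya-Watson smoother sketched in Python purely to illustrate the idea behind ksmooth(); the speed-like data and the bandwidth are invented for the example.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)                          # e.g. time stamps
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # e.g. noisy speed readings

def nadaraya_watson(x, y, x_eval, bandwidth=0.5):
    # Gaussian-kernel weighted average of y around each evaluation point.
    smoothed = np.empty_like(x_eval, dtype=float)
    for i, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        smoothed[i] = np.sum(w * y) / np.sum(w)
    return smoothed

y_smooth = nadaraya_watson(x, y, x)   # smoothed "speed" at the original x values

R's ksmooth(x, y, kernel = "normal", bandwidth = ..., x.points = x) computes the same kind of estimate; if the bandwidth is tiny relative to the spread of x, the output can look indistinguishable from the raw data, which is the usual reason it "does not smooth at all".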

Multi-level regression model on multiply imputed data set in R (Amelia, zelig, lme4)

不想你离开。 Submitted on 2019-12-22 05:25:22
Question: I am trying to run a multi-level model on multiply imputed data (created with Amelia); the sample is based on a clustered sample with group = 24, N = 150. library("ZeligMultilevel") ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed", data=a.out$imputations) summary(ML.model.0) This code produces the following error: Error in object[[1]]$result$call : $ operator not defined for this S4 class If I run an OLS regression, it works: model.0 <- zelig(dv~1, model="ls", data=a.out…

How to extract equation from a polynomial fit?

心已入冬 Submitted on 2019-12-22 04:39:22
Question: My goal is to fit some data to a polynomial function and obtain the actual equation, including the fitted parameter values. I adapted this example to my data and the outcome is as expected. Here is my code: import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import Ridge from sklearn.preprocessing import PolynomialFeatures from sklearn.pipeline import make_pipeline x = np.array([0., 4., 9., 12., 16., 20., 24., 27.]) y = np.array([2.9,4.3,66.7,91.4,109.2,114.8,135.5,134…
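One way to recover the equation from such a pipeline is to pull the coefficients out of the fitted estimator. The sketch below reuses the excerpt's setup; the last y value is cut off above and taken as 134 here just for illustration, and the degree and the named_steps['ridge'] lookup are assumptions about how the pipeline was built.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.]).reshape(-1, 1)
y = np.array([2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.])

# include_bias=False keeps the constant term out of the feature matrix,
# so the intercept lives only in ridge.intercept_.
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), Ridge())
model.fit(x, y)

ridge = model.named_steps['ridge']
terms = " + ".join(f"{c:.4g}*x^{p}" for p, c in enumerate(ridge.coef_, start=1))
print(f"y = {ridge.intercept_:.4g} + {terms}")   # the fitted equation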

Python: LightGBM cross validation. How to use lightgbm.cv for regression?

↘锁芯ラ Submitted on 2019-12-22 03:22:45
Question: I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it doesn't support early stopping or lgb.Dataset. import lightgbm as lgb from sklearn.metrics import mean_absolute_error dftrainLGB = lgb.Dataset(data = dftrain, label = ytrain, feature_name = list(dftrain)) params = {'objective': 'regression'} cv…
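A sketch of how lgb.cv can be called for a regression objective, continuing the excerpt's dftrainLGB and params names. The two details that usually trip people up are stratified=False (stratified folds only make sense for classification labels) and how early stopping is passed: recent LightGBM versions take a callback, while older ones accepted early_stopping_rounds= directly. The metric name and round counts below are arbitrary choices for the example.

import lightgbm as lgb

params = {'objective': 'regression', 'metric': 'mae'}

cv_results = lgb.cv(
    params,
    dftrainLGB,                                   # the lgb.Dataset built above
    num_boost_round=2000,
    nfold=5,
    stratified=False,                             # regression targets cannot be stratified
    callbacks=[lgb.early_stopping(stopping_rounds=100)],
    seed=42,
)

# cv_results maps metric names to per-round fold means/stdevs; its length
# gives the number of boosting rounds kept after early stopping.
best_num_rounds = len(next(iter(cv_results.values())))
print(best_num_rounds)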

for loops for regression over multiple variables & outputting a subset

吃可爱长大的小学妹 Submitted on 2019-12-22 00:17:46
Question: I have tried to apply this Q&A, "efficient looping logistic regression in R", to my own problem, but I cannot quite make it work. I haven't tried to use apply, but I was told by a few people that a for loop is best here (if someone believes otherwise, please feel free to explain!). I think this problem is pretty generalizable and not too esoteric for the forum. This is what I want to achieve: I have a dataset with 3 predictor variables (gender, age, race) and a dependent variable (a…
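The question itself is about R, but the looping pattern is the same in any language: build a one-predictor formula per variable, fit the model, and keep whatever pieces of the summary you need. Below is a small illustration in Python with statsmodels; the data frame, column names, and the p-value cut-off are all invented for the sketch.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    'outcome': rng.integers(0, 2, 200),      # binary dependent variable
    'gender':  rng.integers(0, 2, 200),
    'age':     rng.normal(50, 10, 200),
    'race':    rng.integers(0, 3, 200),
})

results = {}
for predictor in ['gender', 'age', 'race']:
    # One univariate logistic regression per predictor.
    fit = smf.logit(f'outcome ~ {predictor}', data=df).fit(disp=0)
    results[predictor] = (fit.params[predictor], fit.pvalues[predictor])

# Keep only the subset of predictors below an (arbitrary) screening threshold.
screened = {k: v for k, v in results.items() if v[1] < 0.25}
print(screened)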

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

爱⌒轻易说出口 Submitted on 2019-12-21 20:36:55
Question: I'm doing some exploratory work where I use dredge{MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together only as main effects. Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, they must also include the interaction between them. require(stats); require(graphics) #…

Logistic regression returns error but runs okay on reduced dataset

浪尽此生 Submitted on 2019-12-21 20:17:12
Question: I would appreciate your input on this a lot! I am working on a logistic regression, but it is not working for some reason: mod1<-glm(survive~reLDM2+yr+yr2+reLDM2:yr +reLDM2:yr2+NestAge0, family=binomial(link=logexp(NSSH1$exposure)), data=NSSH1, control = list(maxit = 50)) When I run the same model with less data, it works! But with the complete dataset I get an error and warning messages: Error: inner loop 1; cannot correct step size In addition: Warning messages: 1: step size truncated due to…

Unexpected standard errors with weighted least squares in Python Pandas

假如想象 Submitted on 2019-12-21 19:47:18
Question: In the code for the main OLS class in Python pandas, I am looking for help to clarify what conventions are used for the standard errors and t-stats reported when weighted OLS is performed. Here's my example data set, with some imports to use pandas and to use scikits.statsmodels WLS directly: import pandas as pd import numpy as np from statsmodels.regression.linear_model import WLS # Make some random data. np.random.seed(42) df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights']) #…
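To see which convention the reported numbers correspond to, it helps to fit the same weighted regression directly with statsmodels and compare its standard errors and t-stats. A minimal sketch, continuing the excerpt's random data; the weights column is made positive here because WLS expects non-negative weights, and regressing 'a' on 'b' is just an arbitrary choice for the example.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.linear_model import WLS

np.random.seed(42)
df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights'])
w = df['weights'].abs()                 # WLS needs non-negative weights

X = sm.add_constant(df[['b']])          # intercept plus a single regressor
res = WLS(df['a'], X, weights=w).fit()

print(res.bse)       # standard errors under statsmodels' WLS convention
print(res.tvalues)   # matching t-statistics
print(res.summary())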
