regression

Scatter plot kernel smoothing: ksmooth() does not smooth my data at all

僤鯓⒐⒋嵵緔 Submitted on 2019-12-22 05:30:51
Question: Original question: I want to smooth my explanatory variable, something like the speed data of a vehicle, and then use these smoothed values. I searched a lot and found nothing that directly answers this. I know how to calculate a kernel density estimate ( density() or KernSmooth::bkde() ), but I don't know how to then calculate the smoothed values of speed. Re-edited question: Thanks to @ZheyuanLi, I am able to better explain what I have and what I want to do, so I have re-edited my question as…
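For what this question is after, the key distinction is between kernel density estimation (which smooths the distribution of x) and kernel regression, i.e. scatterplot smoothing (which produces smoothed y values at given x). Below is a minimal Nadaraya-Watson smoother sketched in Python purely to illustrate the idea behind ksmooth(); the speed-like data and the bandwidth are invented for the example.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)                          # e.g. time stamps
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # e.g. noisy speed readings

def nadaraya_watson(x, y, x_eval, bandwidth=0.5):
    # Gaussian-kernel weighted average of y around each evaluation point.
    smoothed = np.empty_like(x_eval, dtype=float)
    for i, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        smoothed[i] = np.sum(w * y) / np.sum(w)
    return smoothed

y_smooth = nadaraya_watson(x, y, x)   # smoothed "speed" at the original x values

R's ksmooth(x, y, kernel = "normal", bandwidth = ..., x.points = x) computes the same kind of estimate; if the bandwidth is tiny relative to the spread of x, the output can look indistinguishable from the raw data, which is the usual reason it "does not smooth at all".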

Multi-level regression model on multiply imputed data set in R (Amelia, zelig, lme4)

不想你离开。 Submitted on 2019-12-22 05:25:22
Question: I am trying to run a multi-level model on multiply imputed data (created with Amelia); the sample is based on a clustered sample with group = 24, N = 150. library("ZeligMultilevel") ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed", data=a.out$imputations) summary(ML.model.0) This code produces the following error: Error in object[[1]]$result$call : $ operator not defined for this S4 class If I run an OLS regression, it works: model.0 <- zelig(dv~1, model="ls", data=a.out…

How to extract equation from a polynomial fit?

心已入冬 Submitted on 2019-12-22 04:39:22
Question: My goal is to fit some data to a polynomial function and obtain the actual equation, including the fitted parameter values. I adapted this example to my data and the outcome is as expected. Here is my code: import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import Ridge from sklearn.preprocessing import PolynomialFeatures from sklearn.pipeline import make_pipeline x = np.array([0., 4., 9., 12., 16., 20., 24., 27.]) y = np.array([2.9,4.3,66.7,91.4,109.2,114.8,135.5,134…
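One way to recover the equation from such a pipeline is to pull the coefficients out of the fitted estimator. The sketch below reuses the excerpt's setup; the last y value is cut off above and taken as 134 here just for illustration, and the degree and the named_steps['ridge'] lookup are assumptions about how the pipeline was built.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.]).reshape(-1, 1)
y = np.array([2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.])

# include_bias=False keeps the constant term out of the feature matrix,
# so the intercept lives only in ridge.intercept_.
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), Ridge())
model.fit(x, y)

ridge = model.named_steps['ridge']
terms = " + ".join(f"{c:.4g}*x^{p}" for p, c in enumerate(ridge.coef_, start=1))
print(f"y = {ridge.intercept_:.4g} + {terms}")   # the fitted equation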

Python: LightGBM cross validation. How to use lightgbm.cv for regression?

↘锁芯ラ Submitted on 2019-12-22 03:22:45
Question: I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it doesn't support early stopping or lgb.Dataset. import lightgbm as lgb from sklearn.metrics import mean_absolute_error dftrainLGB = lgb.Dataset(data = dftrain, label = ytrain, feature_name = list(dftrain)) params = {'objective': 'regression'} cv…
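A sketch of how lgb.cv can be called for a regression objective, continuing the excerpt's dftrainLGB and params names. The two details that usually trip people up are stratified=False (stratified folds only make sense for classification labels) and how early stopping is passed: recent LightGBM versions take a callback, while older ones accepted early_stopping_rounds= directly. The metric name and round counts below are arbitrary choices for the example.

import lightgbm as lgb

params = {'objective': 'regression', 'metric': 'mae'}

cv_results = lgb.cv(
    params,
    dftrainLGB,                                   # the lgb.Dataset built above
    num_boost_round=2000,
    nfold=5,
    stratified=False,                             # regression targets cannot be stratified
    callbacks=[lgb.early_stopping(stopping_rounds=100)],
    seed=42,
)

# cv_results maps metric names to per-round fold means/stdevs; its length
# gives the number of boosting rounds kept after early stopping.
best_num_rounds = len(next(iter(cv_results.values())))
print(best_num_rounds)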

for loops for regression over multiple variables & outputting a subset

吃可爱长大的小学妹 Submitted on 2019-12-22 00:17:46
Question: I have tried to apply this Q&A, "efficient looping logistic regression in R", to my own problem, but I cannot quite make it work. I haven't tried to use apply, but I was told by a few people that a for loop is best here (if someone believes otherwise, please feel free to explain!). I think this problem is pretty generalizable and not too esoteric for the forum. This is what I want to achieve: I have a dataset with 3 predictor variables (gender, age, race) and a dependent variable (a…
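The question itself is about R, but the looping pattern is the same in any language: build a one-predictor formula per variable, fit the model, and keep whatever pieces of the summary you need. Below is a small illustration in Python with statsmodels; the data frame, column names, and the p-value cut-off are all invented for the sketch.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    'outcome': rng.integers(0, 2, 200),      # binary dependent variable
    'gender':  rng.integers(0, 2, 200),
    'age':     rng.normal(50, 10, 200),
    'race':    rng.integers(0, 3, 200),
})

results = {}
for predictor in ['gender', 'age', 'race']:
    # One univariate logistic regression per predictor.
    fit = smf.logit(f'outcome ~ {predictor}', data=df).fit(disp=0)
    results[predictor] = (fit.params[predictor], fit.pvalues[predictor])

# Keep only the subset of predictors below an (arbitrary) screening threshold.
screened = {k: v for k, v in results.items() if v[1] < 0.25}
print(screened)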

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

爱⌒轻易说出口 Submitted on 2019-12-21 20:36:55
Question: I'm doing some exploratory work where I use dredge{MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together only as main effects. Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, they must also include the interaction between them. require(stats); require(graphics) #…

Logistic regression returns error but runs okay on reduced dataset

浪尽此生 Submitted on 2019-12-21 20:17:12
Question: I would appreciate your input on this a lot! I am working on a logistic regression, but it is not working for some reason: mod1<-glm(survive~reLDM2+yr+yr2+reLDM2:yr +reLDM2:yr2+NestAge0, family=binomial(link=logexp(NSSH1$exposure)), data=NSSH1, control = list(maxit = 50)) When I run the same model with less data, it works! But with the complete dataset I get an error and warning messages: Error: inner loop 1; cannot correct step size In addition: Warning messages: 1: step size truncated due to…

Unexpected standard errors with weighted least squares in Python Pandas

假如想象 Submitted on 2019-12-21 19:47:18
Question: In the code for the main OLS class in Python pandas, I am looking for help to clarify what conventions are used for the standard errors and t-stats reported when weighted OLS is performed. Here's my example data set, with some imports to use pandas and to use scikits.statsmodels WLS directly: import pandas as pd import numpy as np from statsmodels.regression.linear_model import WLS # Make some random data. np.random.seed(42) df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights']) #…
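To see which convention the reported numbers correspond to, it helps to fit the same weighted regression directly with statsmodels and compare its standard errors and t-stats. A minimal sketch, continuing the excerpt's random data; the weights column is made positive here because WLS expects non-negative weights, and regressing 'a' on 'b' is just an arbitrary choice for the example.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.linear_model import WLS

np.random.seed(42)
df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights'])
w = df['weights'].abs()                 # WLS needs non-negative weights

X = sm.add_constant(df[['b']])          # intercept plus a single regressor
res = WLS(df['a'], X, weights=w).fit()

print(res.bse)       # standard errors under statsmodels' WLS convention
print(res.tvalues)   # matching t-statistics
print(res.summary())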
