regression

Error in `contrasts<-`: "contrasts can be applied only to factors with 2 or more levels"

本小妞迷上赌 submitted on 2019-12-04 04:43:00

Question: I have trained a model and am attempting to use the predict function, but it returns the following error:

```
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
```

There are several questions on SO and CrossValidated about this, and as I interpret it, the error means that one of the factors in my model has only one level. This is a pretty simple model, with one continuous variable (driveTime) and one factor variable which has …
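A minimal sketch of the usual diagnosis (the data frame and column names here are hypothetical, not the asker's):

```r
# Hypothetical data: a factor column declared with a single level
df <- data.frame(
  driveTime = c(10, 20, 30, 40),
  dayType   = factor(c("weekday", "weekday", "weekday", "weekday"))
)

# Count the levels of every factor column; any factor with fewer than
# 2 levels makes model.matrix(), and hence lm()/predict(), fail with
# the contrasts error
sapply(Filter(is.factor, df), nlevels)

# If a factor lost levels through subsetting, dropping the unused
# levels (or refitting on data that spans >= 2 levels) resolves it
df <- droplevels(df)
```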

Robust SEs (vcovHC) to be shown with texreg in R

為{幸葍}努か submitted on 2019-12-04 04:29:54

Question: I am doing some regressions with the plm package; when needed, I also obtain heteroskedasticity-consistent coefficients. Below are the commands that I run:

```r
library(plm)
library(lmtest)  # for coeftest()
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))
summary(zz)
coeftest(zz, vcovHC)
```

My problem starts here. Below is the list of commands to obtain LaTeX output with the help of texreg. How can I integrate the result obtained with the …
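A minimal sketch of one common route, using texreg's `override.se`/`override.pvalues` arguments (object names follow the question):

```r
library(texreg)

# Pull the robust standard errors and p-values computed by coeftest()
ct <- coeftest(zz, vcovHC)

# Hand them to texreg so the LaTeX table reports the robust figures
texreg(zz,
       override.se      = ct[, "Std. Error"],
       override.pvalues = ct[, "Pr(>|t|)"])
```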

R: plm — year fixed effects — year and quarter data

守給你的承諾、 submitted on 2019-12-04 04:08:28

I am having a problem setting up a panel data model. Here is some sample data:

```r
library(plm)
id   <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
year <- c(1999,1999,1999,1999,2000,2000,2000,2000,
          1999,1999,1999,1999,2000,2000,2000,2000)
qtr  <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
y <- rnorm(16, mean = 0, sd = 1)
x <- rnorm(16, mean = 0, sd = 1)
data <- data.frame(id = id, year = year, qtr = qtr,
                   y_q = paste(year, qtr, sep = "_"), y = y, x = x)
```

I run the following regression using 'id' as the individual index and 'year' as the time index:

```r
reg1 <- plm(y ~ x, data = data, index = c("id", "year"),
            model = "within", effect = "time")
```

Unfortunately, I …
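A minimal sketch of one possible fix (an assumption about the failure: each `id`-`year` pair appears four times, so `year` alone cannot serve as the time index, while the combined year-quarter key can):

```r
# Use the year-quarter identifier as the time index so every
# (individual, period) pair is unique
reg2 <- plm(y ~ x, data = data, index = c("id", "y_q"),
            model = "within", effect = "time")
summary(reg2)

# If the goal is year (not quarter) fixed effects, explicit year
# dummies in a pooled model are another route:
reg3 <- plm(y ~ x + factor(year), data = data,
            index = c("id", "y_q"), model = "pooling")
```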

Interpreting Graphviz output for decision tree regression

99封情书 submitted on 2019-12-04 04:06:43

I'm curious what the value field is in the nodes of the decision tree produced by Graphviz when used for regression. I understand that this is the number of samples in each class separated by a split when using decision tree classification, but I'm not sure what it means for regression. My data has a 2-dimensional input and a 10-dimensional output. Here is an example of what a tree looks like for my regression problem, produced using the code below and visualized with webgraphviz:

```python
import pickle

# X = (n x 2), Y = (n x 10), X_test = (m x 2)
input_scaler = pickle.load(open("../input_scaler.sav", "rb"))
reg = …
```
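For regression trees, `value` is the mean of the training targets of the samples in that node rather than per-class counts; with a 10-dimensional output it is a length-10 vector of per-output means. A minimal sketch with synthetic data (not the asker's):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_graphviz

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))    # 2-dimensional input
Y = rng.normal(size=(100, 10))   # 10-dimensional output

reg = DecisionTreeRegressor(max_depth=2).fit(X, Y)

# Each node's `value` line shows the 10 per-output means of the samples
# that reached that node; the root's value equals the global mean of Y
print(export_graphviz(reg))
print(Y.mean(axis=0))
```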

How to find a multivariable regression equation in JavaScript

余生颓废 submitted on 2019-12-04 04:01:32

I have searched Stack Overflow and have not found any question that is really the same as mine, because none have more than one independent variable. Basically, I have an array of data points and I want to be able to find a regression equation for those data points. The code I have so far looks like this (w, x, z are the independent variables and y is the dependent variable):

```javascript
var dataPoints = [
  { "w": 1, "x": 2, "z": 1, "y": 7 },
  { "w": 2, "x": 1, "z": 4, "y": 5 },
  { "w": 1, "x": 5, "z": 3, "y": 2 },
  { "w": 4, "x": 3, "z": 5, "y": 15 }
];
```

I would like a function that would …
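A minimal sketch of one way to do it (hand-rolled, not a library API): ordinary least squares via the normal equations (XᵀX)b = Xᵀy, solved by Gauss-Jordan elimination. It assumes the design matrix is not singular, which holds for generic data.

```javascript
function multipleRegression(points, keys, target) {
  // Design matrix rows: [1, w, x, z]; response vector y
  const X = points.map(p => [1, ...keys.map(k => p[k])]);
  const y = points.map(p => p[target]);
  const n = X[0].length;

  // Augmented normal-equation matrix [X'X | X'y]
  const A = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) =>
      X.reduce((s, row, r) => s + row[i] * (j < n ? row[j] : y[r]), 0)
    )
  );

  // Gauss-Jordan elimination with partial pivoting
  for (let col = 0; col < n; col++) {
    let piv = col;
    for (let r = col + 1; r < n; r++)
      if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
    [A[col], A[piv]] = [A[piv], A[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = A[r][col] / A[col][col];
      for (let c = col; c <= n; c++) A[r][c] -= f * A[col][c];
    }
  }
  return A.map((row, i) => row[n] / row[i][i]);
}

// [b0, b1, b2, b3] so that y ≈ b0 + b1*w + b2*x + b3*z
const coeffs = multipleRegression(dataPoints, ["w", "x", "z"], "y");
```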

Using categorical data as features in sklearn LogisticRegression

大兔子大兔子 submitted on 2019-12-04 03:05:12

I'm trying to understand how to use categorical data as features in sklearn.linear_model's LogisticRegression. I understand, of course, that I need to encode it. What I don't understand is how to pass the encoded feature to the logistic regression so that it is processed as a categorical feature, rather than having the int value it got when encoding interpreted as a standard quantifiable feature. (Less important) Can somebody explain the difference between using preprocessing.LabelEncoder(), DictVectorizer.vocabulary, or just encoding the categorical data yourself with a simple dict? Alex A.'s comment here touches …
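A minimal sketch of the standard answer (the column names are hypothetical): one-hot encode the categorical column so the model sees k independent indicator features instead of one ordinal integer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "color":  ["red", "blue", "green", "blue"],  # categorical
    "amount": [1.0, 2.0, 0.5, 1.5],              # numeric
})
y = [0, 1, 0, 1]

# One-hot encode `color`; pass the numeric column through untouched
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)
clf = make_pipeline(pre, LogisticRegression()).fit(df, y)
```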

Ignoring missing values in multiple OLS regression with statsmodels

浪子不回头ぞ submitted on 2019-12-04 02:16:22

I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message:

```
ValueError: array must not contain infs or NaNs
```

I saw this SO question, which is similar but doesn't exactly answer mine: statsmodel.api.Logit: valueerror array must not contain infs or nans. What I would like to do is run the regression and ignore all rows with missing values for the variables used in this regression. Right now I have:

```python
import pandas as pd
import numpy as np
import …
```
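A minimal sketch of the usual fix (hypothetical column names): statsmodels models accept `missing="drop"`, which silently drops incomplete rows; equivalently, you can `dropna()` on just the columns used in the regression.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "y":  [1.0, 2.0, np.nan, 4.0],
    "x1": [0.5, np.nan, 1.5, 2.0],
    "x2": [1.0, 2.0, 3.0, 4.0],
})

X = sm.add_constant(df[["x1", "x2"]])
# missing="drop" ignores every row with a NaN in y or X
model = sm.OLS(df["y"], X, missing="drop").fit()
print(model.summary())
```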

Problems displaying LOESS regression line and confidence interval

给你一囗甜甜゛ submitted on 2019-12-04 02:13:51

Question: I am having some issues trying to complete a LOESS regression with a data set. I have been able to create the fitted line properly, but I am unable to plot it correctly. I ran through the data like this:

```r
animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)
animals.lo
summary(animals.lo)
plot(X15p5 ~ Period, animals)
lines(animals$X15p5, animals.lo, col = "red")
```

At this point I received an error:

```
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
```

I searched around and read that this …
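A minimal sketch of the usual fix, reusing the question's variable names (an assumption about the data): `lines()` wants the x values and the fitted values, ordered by x; passing the raw response plus the model object is what triggers the `xy.coords` length error.

```r
# Fitted LOESS line, ordered by the predictor
ord <- order(animals$Period)
lines(animals$Period[ord], fitted(animals.lo)[ord], col = "red")

# An approximate 95% confidence band from predict(..., se = TRUE)
pr <- predict(animals.lo, se = TRUE)
lines(animals$Period[ord], (pr$fit + 2 * pr$se.fit)[ord], lty = 2)
lines(animals$Period[ord], (pr$fit - 2 * pr$se.fit)[ord], lty = 2)
```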

`nls` fitting error: always reaches the maximum number of iterations regardless of starting values

亡梦爱人 submitted on 2019-12-04 01:55:06

Question: Using this parametrization for a logistic growth curve model, I created some points with K = 0.7, y0 = 0.01, r = 0.3:

```r
df <- data.frame(x = seq(1, 50, by = 5))
df$y <- 0.7 / (1 + ((0.7 - 0.01) / 0.01) * exp(-0.3 * df$x))
```

Can someone tell me why I get a fitting error when I created the data from the very values I use as starting values?

```r
fo <- df$y ~ K / (1 + ((K - y0) / y0) * exp(-r * df$x))
model <- nls(fo, start = list(K = 0.7, y0 = 0.01, r = 0.3), df,
             nls.control(maxiter = 1000))
```

Error in nls(fo, start = list(K = 0.7, y0 = 0.01, r = 0.3), df, nls …
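A likely explanation, per the warning in `?nls`: the default algorithm's relative-offset convergence test fails on artificial zero-residual data, so it iterates until `maxiter`. A minimal sketch of the usual workaround, adding a little noise:

```r
set.seed(1)
df$y_noisy <- df$y + rnorm(nrow(df), sd = 1e-3)

# The formula references column names (not df$...) since data = df
fo2 <- y_noisy ~ K / (1 + ((K - y0) / y0) * exp(-r * x))
model2 <- nls(fo2, data = df, start = list(K = 0.7, y0 = 0.01, r = 0.3))
summary(model2)

# Recent R versions also document nls.control(scaleOffset = 1)
# as a remedy for exactly these zero-residual problems.
```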

Non-linear regression: why isn't the model learning?

为君一笑 submitted on 2019-12-04 01:40:42

Question: I just started learning Keras. I am trying to train a non-linear regression model in Keras, but the model doesn't seem to learn much.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# datapoints
X = np.arange(0.0, 5.0, 0.1, dtype='float32').reshape(-1, 1)
y = 5 * np.power(X, 2) + np.power(np.random.randn(50).reshape(-1, 1), 3)

# model
model = Sequential()
model.add(Dense(50, activation='relu', input_dim=1))
model.add(Dense(30, activation='relu', init='uniform'))
model.add(Dense(output_dim=1, activation='linear'))

# training
sgd = SGD(lr=0.1)
model …
```
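A minimal sketch of common fixes (an assumption about the failure mode: with targets up to ~125, SGD at lr=0.1 tends to diverge or stall, and scaling the target plus a gentler optimizer usually lets the network fit y ≈ 5x²):

```python
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

X = np.arange(0.0, 5.0, 0.1, dtype="float32").reshape(-1, 1)
y = 5 * X**2 + np.random.randn(50, 1).astype("float32") ** 3
y_scale = float(y.max())          # bring targets into roughly [0, 1]

model = Sequential([
    Dense(50, activation="relu", input_shape=(1,)),
    Dense(30, activation="relu"),
    Dense(1),                     # linear output for regression
])
model.compile(optimizer=Adam(learning_rate=0.01), loss="mse")
model.fit(X, y / y_scale, epochs=500, verbose=0)

pred = model.predict(X) * y_scale  # undo the target scaling
```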