Calculating Standard Error of Coefficients for Logistic Regression in Spark

Submitted by 亡梦爱人 on 2019-12-22 18:19:01

Question


I know this question has been asked previously here, but I couldn't find a correct answer. The answer in the previous post suggests using Statistics.chiSqTest(data), which performs Pearson's chi-squared goodness-of-fit test, not the Wald chi-squared test for the significance of coefficients.

I am trying to build the parameter estimate table for logistic regression in Spark. I was able to get the coefficients and the intercept, but I couldn't find a Spark API that returns the standard errors of the coefficients. The coefficient standard errors are available for the linear regression model as part of its model summary, but the logistic regression model summary doesn't provide them. Part of the sample code is as follows.

import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training) // Assuming training is my training dataset

val trainingSummary = lrModel.summary
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary] // provides the summary information of the fitted model

Is there any way to calculate the standard errors of the coefficients, or to get the variance-covariance matrix of the coefficients, from which the standard errors can be derived?


Answer 1:


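You need to use the GLM estimator (GeneralizedLinearRegression) with the binomial family and the logit link instead of LogisticRegression. Its training summary exposes coefficientStandardErrors, tValues and pValues, which the logistic regression summary does not.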

https://spark.apache.org/docs/2.1.1/ml-classification-regression.html#generalized-linear-regression
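A minimal sketch of that approach follows, to show where the standard errors come from. The toy data, the object name GlmStandardErrors, the CoefRow case class and the column names x1, x2 are all made up for illustration; only GeneralizedLinearRegression and its summary fields are Spark's. It assumes an unpenalized fit (regParam = 0.0) with the default IRLS solver, since the summary statistics are only available for that solver.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.{DataFrame, SparkSession}

object GlmStandardErrors {
  // One row of the parameter estimate table (hypothetical helper type).
  case class CoefRow(name: String, estimate: Double, stdError: Double,
                     z: Double, pValue: Double)

  // Hypothetical toy data. The two classes overlap, so the classes are not
  // linearly separable and the maximum likelihood estimate is finite.
  def toyData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0.0, Vectors.dense(1.0, 2.0)), (1.0, Vectors.dense(1.5, 1.0)),
      (0.0, Vectors.dense(2.0, 3.5)), (1.0, Vectors.dense(2.5, 2.0)),
      (0.0, Vectors.dense(3.0, 1.0)), (1.0, Vectors.dense(3.5, 4.0)),
      (1.0, Vectors.dense(4.0, 2.5)), (0.0, Vectors.dense(4.5, 3.0)),
      (1.0, Vectors.dense(5.0, 5.0)), (0.0, Vectors.dense(0.5, 0.5)),
      (1.0, Vectors.dense(1.0, 4.0)), (0.0, Vectors.dense(5.5, 1.5))
    ).toDF("label", "features")
  }

  def coefficientTable(training: DataFrame): Seq[CoefRow] = {
    val glr = new GeneralizedLinearRegression()
      .setFamily("binomial") // Bernoulli response
      .setLink("logit")      // together with binomial: logistic regression
      .setMaxIter(25)
      .setRegParam(0.0)      // assumption: no penalty, so the errors are the usual MLE ones

    val model = glr.fit(training)
    val summary = model.summary

    // With fitIntercept = true (the default), the summary arrays list the
    // feature coefficients first and the intercept last.
    val names = (1 to model.coefficients.size).map(i => s"x$i") :+ "(Intercept)"
    val estimates = model.coefficients.toArray :+ model.intercept

    names.indices.map { i =>
      CoefRow(names(i), estimates(i), summary.coefficientStandardErrors(i),
        summary.tValues(i), summary.pValues(i))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("glm-standard-errors")
      .master("local[*]")
      .getOrCreate()

    coefficientTable(toyData(spark)).foreach { r =>
      println(f"${r.name}%-12s ${r.estimate}%10.4f ${r.stdError}%10.4f ${r.z}%8.3f ${r.pValue}%8.4f")
    }
    spark.stop()
  }
}
```

For the binomial family the values in tValues are the Wald z statistics (estimate divided by standard error), so squaring them gives the Wald chi-squared statistics asked about in the question. Note that the question's original LogisticRegression settings use an elastic net penalty, which GeneralizedLinearRegression does not support; standard errors from a penalized fit would not have the usual interpretation anyway.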



Source: https://stackoverflow.com/questions/48482245/calculating-standard-error-of-coefficients-for-logistic-regression-in-spark
