How to use L1 penalty in pyspark.ml.regression.LinearRegressionModel for feature selection?


Question


First of all, I am using Spark 1.6.0. I want to use an L1 penalty in pyspark.ml.regression.LinearRegressionModel for feature selection.

But I cannot get the detailed coefficients when I fit the model:

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(elasticNetParam=1.0, regParam=0.01, maxIter=100,
                        fitIntercept=False, standardization=False)
model = lr.fit(df_one_hot_train)
print(model.coefficients.toArray().astype(float).tolist())

I only get a mostly-zero coefficient list like:

[0,0,0,0,0,..,-0.0871650387514,..,]

When I use sklearn.linear_model.LogisticRegression instead, coef_ gives me a detailed list without zero values, like:

[0.03098372361467529,-0.13709075166114365,-0.15069548597557908,-0.017968044053830862]

With Spark's better performance I could finish my work faster; I just want to use the L1 penalty for feature selection.

I think I need detailed coefficient values for my feature-selection work, just as sklearn provides. How can I solve this?
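
For context, the zeros are expected: elasticNetParam=1.0 makes the penalty pure L1, which drives some coefficients to exactly zero, so the non-zero entries are the selected features. A minimal sketch of reading them off, assuming the fitted model above and a hypothetical feature_names list with one name per entry of the assembled feature vector:

coefs = model.coefficients.toArray()  # dense numpy array; zeroed-out coefficients stay as 0.0
selected = [(i, c) for i, c in enumerate(coefs) if c != 0.0]

# feature_names is hypothetical: one name per position of the assembled feature vector
for idx, coef in selected:
    print(idx, feature_names[idx], coef)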


Answer 1:


Below is a working code snippet in Spark 2.1 (Scala).

The key to extracting the coefficient values is:

stages(4).asInstanceOf[LinearRegressionModel]

Spark 1.6 may have something similar.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}

val holIndIndexer = new StringIndexer().setInputCol("holInd").setOutputCol("holIndIndexer")
val holIndEncoder = new OneHotEncoder().setInputCol("holIndIndexer").setOutputCol("holIndVec")
val time_intervaLEncoder = new OneHotEncoder().setInputCol("time_interval").setOutputCol("time_intervaLVec")

val assemblerL1 = new VectorAssembler()
  .setInputCols(Array("time_intervaLVec", "holIndVec", "length"))
  .setOutputCol("features")

// For an L1 penalty, also add .setElasticNetParam(1.0).setRegParam(...) here.
val lrL1 = new LinearRegression().setFeaturesCol("features").setLabelCol("travel_time")

val pipelineL1 = new Pipeline()
  .setStages(Array(holIndIndexer, holIndEncoder, time_intervaLEncoder, assemblerL1, lrL1))

val modelL1 = pipelineL1.fit(dfTimeMlFull)

// The LinearRegressionModel is the fifth pipeline stage, hence stages(4).
val l1Coeff = modelL1.stages(4).asInstanceOf[LinearRegressionModel].coefficients
println(l1Coeff)
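
For reference, the same stage extraction works from PySpark too. Below is a minimal sketch, not the original answer's code, assuming the same DataFrame (dfTimeMlFull) and column names as the Scala example, with the L1 settings from the question added:

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.regression import LinearRegression

indexer = StringIndexer(inputCol="holInd", outputCol="holIndIndexer")
encoder = OneHotEncoder(inputCol="holIndIndexer", outputCol="holIndVec")
interval_encoder = OneHotEncoder(inputCol="time_interval", outputCol="time_intervaLVec")
assembler = VectorAssembler(inputCols=["time_intervaLVec", "holIndVec", "length"],
                            outputCol="features")
# elasticNetParam=1.0 gives a pure L1 penalty, as in the question
lr = LinearRegression(featuresCol="features", labelCol="travel_time",
                      elasticNetParam=1.0, regParam=0.01)

pipeline = Pipeline(stages=[indexer, encoder, interval_encoder, assembler, lr])
model = pipeline.fit(dfTimeMlFull)

# PipelineModel.stages is a Python list; the LinearRegressionModel sits at index 4
print(model.stages[4].coefficients)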


Source: https://stackoverflow.com/questions/41235744/how-to-use-l1-penalty-in-pyspark-ml-regression-linearregressionmodel-for-feature
