E-num / get dummies in PySpark

野的像风 2020-12-18 08:58

I would like to create a function in PySpark that takes a DataFrame and a list of parameters (codes/categorical features) and returns the data frame with additiona

4 answers
  •  情话喂你
    2020-12-18 09:20

    I was looking for the same solution, but in Scala. Maybe this will help someone:

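    import org.apache.spark.sql.functions.{col, when}
    // Collect the distinct categories, then fold over them, adding one 0/1 column per category
    val list = df.select("category").distinct().rdd.map(r => r(0)).collect()
    val oneHotDf = list.foldLeft(df)((acc, category) => acc.withColumn("category_" + category, when(col("category") === category, 1).otherwise(0)))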
    
