E-num / get Dummies in pyspark

野的像风 2020-12-18 08:58

I would like to create a function in PySpark that takes a DataFrame and a list of parameters (codes/categorical features) and returns the DataFrame with additiona

4 answers
  •  一生所求
    2020-12-18 09:19

    The first step is to make a DataFrame from your CSV file.

    See "Get CSV to Spark dataframe"; the first answer there gives a line-by-line example.

    Then you can add the columns. Assume you have a DataFrame object called df, and the columns are: [ID, TYPE, CODE].

    The rest can be done with DataFrame.withColumn() and pyspark.sql.functions.when:

    from pyspark.sql.functions import when
    
    df_with_extra_columns = df.withColumn("e_TYPE_A", when(df.TYPE == "A", 1).otherwise(0)).withColumn("e_TYPE_B", when(df.TYPE == "B", 1).otherwise(0))
    

    (This adds the first two columns; repeat the pattern for the remaining values.)
