Scala & Spark: Cast multiple columns at once

小鲜肉 2020-12-30 15:05

Since the VectorAssembler crashes if a passed column has any type other than NumericType or BooleanType, and I'm dealing with a lot of T

4 answers
  •  Happy的楠姐
    2020-12-30 15:27

    from pyspark.sql.types import StringType, FloatType, IntegerType

    # Assumes an existing SparkSession named `spark`.
    # DROPMALFORMED drops rows that fail to parse.
    FastDf = spark.read.csv("Something.csv", header=False, mode="DROPMALFORMED")
    # Current types, kept for reference only.
    OldTypes = [field.dataType for field in FastDf.schema.fields]
    NewTypes = [StringType(), FloatType(), FloatType(), IntegerType()]
    OldColnames = FastDf.columns
    NewColnames = ['S_tring', 'F_loat', 'F_loat2', 'I_nteger']
    # Cast every column to its new type and rename it in a single select.
    FastDfSchema = FastDf.select(
        *(FastDf[old].cast(new_type).alias(new_name)
          for old, new_type, new_name in zip(OldColnames, NewTypes, NewColnames))
    )
    

    I know this is PySpark rather than Scala, but the same logic should carry over.
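
Since the question asks about Scala, here is a rough sketch of the same idea in Scala: build one `Column` per input column, cast the ones that need it, and pass them all to a single `select`. The object name `CastColumns`, the helper `castColumns`, the sample column names (`label`, `x`, `n`) and the local master setting are all made up for illustration; only `select`, `col`, `cast` and `as` come from the Spark API. It assumes Scala 2.12 or 2.13 with Spark SQL on the classpath.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType}

object CastColumns {
  // Hypothetical helper: cast each column listed in `newTypes`
  // and pass every other column through unchanged.
  def castColumns(df: DataFrame, newTypes: Map[String, DataType]): DataFrame = {
    val projected: Array[Column] = df.columns.map { name =>
      newTypes.get(name) match {
        case Some(dataType) => col(name).cast(dataType).as(name) // keep the original name
        case None           => col(name)
      }
    }
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory stand-in for a CSV in which every column was read as a string.
    val df = Seq(("a", "1.5", "2"), ("b", "2.5", "3")).toDF("label", "x", "n")

    val casted = castColumns(df, Map("x" -> DoubleType, "n" -> IntegerType))
    casted.printSchema()

    spark.stop()
  }
}
```

Doing all casts in one `select` keeps the query plan to a single projection; folding over the columns with `withColumn` would also work but adds one projection per column. Values that cannot be cast become null by default (with ANSI mode off), so it may be worth checking for unexpected nulls before handing the result to VectorAssembler.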
