Since the VectorAssembler crashes if a passed column has any type other than NumericType or BooleanType, and I'm dealing with a lot of T
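from pyspark.sql.types import FloatType, IntegerType, StringType

# Read the CSV without a header row; DROPMALFORMED skips rows that fail to parse.
# `spark` is assumed to be an existing SparkSession.
FastDf = spark.read.csv("Something.csv", header=False, mode="DROPMALFORMED")
# Current types and names, kept for reference.
OldTypes = [field.dataType for field in FastDf.schema.fields]
OldColnames = FastDf.columns
# Target type and name for each column, in positional order.
NewTypes = [StringType(), FloatType(), FloatType(), IntegerType()]
NewColnames = ['S_tring', 'F_loat', 'F_loat2', 'I_nteger']
# Cast and rename every column by position.
FastDfSchema = FastDf.select(*(
    FastDf[colnumber].cast(NewTypes[colnumber]).alias(NewColnames[colnumber])
    for colnumber in range(len(NewTypes))
))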
I know this is PySpark, but the logic might be handy.