Casting multiple columns to different data types in PySpark

Submitted by Deadly on 2019-12-01 01:46:26

Instead of writing out a separate cast for every column, you should loop over the lists of column names (timestamp_type and integer_type here):

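from pyspark.sql.types import IntegerType, TimestampType

# Cast each listed column in place, reading it from df3, the DataFrame being rebuilt.
for c in timestamp_type:
    df3 = df3.withColumn(c, df3[c].cast(TimestampType()))

for c in integer_type:
    df3 = df3.withColumn(c, df3[c].cast(IntegerType()))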
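
The two loops can also be folded into one by first building a mapping from column name to target type. This is only a minimal sketch and not part of the original answer: the helper name cast_columns and the example column names are hypothetical, and it relies on Column.cast accepting a type-name string such as "timestamp" or "int" as well as a DataType instance.

```python
def cast_columns(df, type_by_column):
    """Return df with each column in type_by_column cast to its target type.

    type_by_column maps a column name to a Spark SQL type name
    (for example "timestamp" or "int") or to a DataType instance.
    """
    for name, target in type_by_column.items():
        df = df.withColumn(name, df[name].cast(target))
    return df


# Hypothetical column lists standing in for timestamp_type and integer_type.
timestamp_type = ["created_at", "updated_at"]
integer_type = ["user_id", "visits"]

# One mapping replaces the two separate loops.
type_by_column = {c: "timestamp" for c in timestamp_type}
type_by_column.update({c: "int" for c in integer_type})

# Usage, assuming df3 is an existing DataFrame containing these columns:
# df3 = cast_columns(df3, type_by_column)
```

Keeping the types in one dictionary means a new type only needs new entries in the mapping, not another loop.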

Or equivalently, you can use functools.reduce:

from functools import reduce  # required on Python 3; a builtin on Python 2
df3 = reduce(
    lambda df, c: df.withColumn(c, df[c].cast(TimestampType())),
    timestamp_type,
    df3
)

df3 = reduce(
    lambda df, c: df.withColumn(c, df[c].cast(IntegerType())),
    integer_type,
    df3
)
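
Each withColumn call adds another projection to the query plan, and the PySpark documentation suggests a single select when many columns change at once. A hedged sketch of that alternative follows. It is not from the original answer: cast_columns_in_select is a hypothetical helper name, the column names in the usage comment are made up, and the target types are passed as strings, which Column.cast accepts.

```python
def cast_columns_in_select(df, type_by_column):
    """Cast the mapped columns in one select, leaving other columns untouched."""
    projected = [
        # alias(name) keeps the original column name after the cast
        df[name].cast(type_by_column[name]).alias(name)
        if name in type_by_column
        else df[name]
        for name in df.columns
    ]
    return df.select(*projected)


# Usage, assuming df3 is an existing DataFrame and the (hypothetical)
# column names below are present in it:
# df3 = cast_columns_in_select(df3, {"created_at": "timestamp", "user_id": "int"})
```

Column order is preserved because the projection iterates over df.columns.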