I have a dataframe in PySpark. Some of its numerical columns contain 'nan', so when I read the data and check the schema of the dataframe, those columns come back with 'string' type instead of an integer type. How can I convert them to integers?
from pyspark.sql.types import IntegerType

# Cast each string column to integer and overwrite the original column
data_df = data_df.withColumn("Plays", data_df["Plays"].cast(IntegerType()))
data_df = data_df.withColumn("drafts", data_df["drafts"].cast(IntegerType()))
You can run a loop over the columns instead, but this is the simplest way to convert a string column to an integer.
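If you have many columns to convert, a minimal sketch of the loop approach could look like the following (the int_cols list of column names is a hypothetical placeholder; adjust it to your own schema):

from pyspark.sql.types import IntegerType

# Hypothetical list of string-typed columns that should become integers
int_cols = ["Plays", "drafts"]

for col_name in int_cols:
    # Overwrite each column with its value cast to IntegerType
    data_df = data_df.withColumn(col_name, data_df[col_name].cast(IntegerType()))

Note that any value that cannot be parsed as an integer (such as 'nan') becomes null after the cast, so you may want to handle those rows afterwards, e.g. with fillna() or dropna().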