PySpark: Change nested column datatype
**Question:** How can we change the datatype of a nested column in PySpark? For example, how can I change the data type of `value` from string to int?

Reference: How to change a DataFrame column from String type to Double type in PySpark

```json
{
  "x": "12",
  "y": {
    "p": { "name": "abc", "value": "10" },
    "q": { "name": "pqr", "value": "20" }
  }
}
```

**Answer 1:**

You can read the JSON data using:

```python
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
data_df = sqlContext.read.json("data.json", multiLine=True)
data_df
```