Syntax while setting schema for Pyspark.sql using StructType

Submitted by 隐身守侯 on 2020-07-31 07:09:21

Question


I am new to Spark and have been playing around with pyspark.sql. According to the pyspark.sql documentation, one can set up a Spark DataFrame and its schema like this:

from pyspark.sql import SparkSession
from pyspark.sql.types import (IntegerType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.getOrCreate()

schema = StructType([StructField('Name', StringType(), True),
                     StructField('DateTime', TimestampType(), True),
                     StructField('Age', IntegerType(), True)])

# create dataframe
# (read the CSV directly with the schema; an RDD of raw text lines
# from textFile would not match the three typed columns)
df3 = spark.read.csv('./some csv_to_play_around.csv', schema=schema)

My question is: what does the True stand for in each StructField of the schema above? I can't seem to find it in the documentation. Thanks in advance.


Answer 1:


It indicates whether the column allows null values: True means the column is nullable, and False means it is not nullable.

StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate whether values of this field can be null.

Refer to the Spark SQL and DataFrame Guide for more information.



Source: https://stackoverflow.com/questions/30214373/syntax-while-setting-schema-for-pyspark-sql-using-structtype
