Spark: “Truncated the string representation of a plan since it was too large.” Warning when using manually created aggregation expression

-上瘾入骨i 2020-12-23 10:00

I am trying to build, for each of my users, a vector containing the average number of records per hour of the day. Hence the vector has to have 24 dimensions.

My original
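
For illustration only, here is a hypothetical sketch of the kind of manually built per-hour aggregation described above. The input path, the user_id and timestamp columns, and the helper names are assumptions, not the original code; an aggregation like this, with 24 generated columns, is the sort of wide plan that triggers the warning.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("hourly-profile").getOrCreate()
    import spark.implicits._

    // Assumed input: one row per record, with user_id and timestamp columns.
    val events = spark.read.parquet("/path/to/events")

    // Count records per user, per day, per hour of day.
    val perDayHour = events
      .withColumn("day", to_date($"timestamp"))
      .withColumn("hour", hour($"timestamp"))
      .groupBy($"user_id", $"day", $"hour")
      .count()

    // Manually build one aggregation expression per hour of day (24 in total):
    // total records at that hour divided by the number of observed days.
    val hourCols = (0 until 24).map { h =>
      (sum(when($"hour" === h, $"count").otherwise(0)) / countDistinct($"day"))
        .alias(s"h_$h")
    }

    val hourlyProfile = perDayHour
      .groupBy($"user_id")
      .agg(hourCols.head, hourCols.tail: _*)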

2 Answers
  • 2020-12-23 10:13

    You can safely ignore this warning if you are not interested in seeing the SQL schema logs. Otherwise, you might want to set the property to a higher value, but doing so can affect the performance of your job:

    spark.debug.maxToStringFields=100
    

    Default value is: DEFAULT_MAX_TO_STRING_FIELDS = 25

    The performance overhead of creating and logging strings for wide schemas can be large. To limit the impact, we bound the number of fields to include by default. This can be overridden by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.

    Taken from: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L90
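
    As a hedged sketch (not from the answer): in the Spark versions where this flag is read from SparkEnv, it has to be supplied when the application starts, for example via spark-submit --conf spark.debug.maxToStringFields=100 or on the SparkConf; the application name and value below are illustrative assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Hypothetical sketch: set the flag at startup, since the SparkEnv-based
    // versions read it from the Spark configuration when the environment is created.
    val conf = new SparkConf()
      .setAppName("wide-plan-job")
      .set("spark.debug.maxToStringFields", "100")

    val spark = SparkSession.builder().config(conf).getOrCreate()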

  • 2020-12-23 10:20

    This config, along with many others, has been moved to SQLConf: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

    This can be set in the config file, on the command line, or at runtime in Spark, using:

    spark.conf.set("spark.sql.debug.maxToStringFields", 1000)
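
    A hedged sketch of the other forms mentioned (the application name and value are illustrative assumptions): the same property can be supplied when the session is built, equivalent to passing --conf spark.sql.debug.maxToStringFields=1000 to spark-submit or putting it in spark-defaults.conf, and then read back to confirm the value in effect:

    import org.apache.spark.sql.SparkSession

    // Hypothetical sketch: supply the SQL conf at session build time,
    // then read it back to confirm which value is in effect.
    val spark = SparkSession.builder()
      .appName("wide-plan-job")
      .config("spark.sql.debug.maxToStringFields", "1000")
      .getOrCreate()

    println(spark.conf.get("spark.sql.debug.maxToStringFields"))  // expect "1000"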
    