NullPointerException in Spark RDD map when submitted as a spark job


We're trying to submit a Spark job (Spark 2.0, Hadoop 2.7.2), but for some reason we're receiving a rather cryptic NPE on EMR. Everything runs just fine as a Scala program.

1 Answer
  • 2020-12-21 06:20

    I think you are getting a NullPointerException thrown by a worker when it tries to access a SparkContext object, which exists only on the driver and not on the workers.
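
    A minimal sketch of the pattern that typically causes this, and one common fix. The object name, data, and the broadcast-based workaround are illustrative assumptions, not the code from the question:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object NpeSketch {
      def main(args: Array[String]): Unit = {
        val sc     = new SparkContext(new SparkConf().setAppName("npe-sketch"))
        val data   = sc.parallelize(1 to 10)
        val lookup = sc.parallelize(Seq(1, 2, 3))

        // Anti-pattern: a closure like `data.map(x => lookup.filter(_ == x).count())`
        // references another RDD (and, transitively, the SparkContext) from inside a
        // task. That only appears to work when everything runs in a single driver JVM;
        // on a cluster the executors see a null/unusable context and throw an NPE.

        // One common fix: materialise the small dataset on the driver and ship it
        // to the executors as a broadcast value instead of an RDD.
        val lookupSet = sc.broadcast(lookup.collect().toSet)
        val counts    = data.map(x => if (lookupSet.value.contains(x)) 1L else 0L)

        println(counts.sum())
        sc.stop()
      }
    }
    ```

    The key point is that only plain values (ideally broadcast ones) may be captured by closures that run on executors; SparkContext, SparkSession, and RDD references must stay on the driver.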

    coalesce() repartitions your data. When you request only one partition, it will try to squeeze all of the data into that single partition*. That can put a lot of pressure on the memory footprint of your application.

    In general, it is a good idea not to shrink your partitions down to just one.
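
    A small sketch of a gentler alternative, assuming the goal is to write text output; the partition count, object name, and output path are made up for illustration:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(new SparkConf().setAppName("partition-sketch"))
        val rdd = sc.parallelize(1 to 1000000, numSlices = 100)

        // Instead of coalesce(1), which funnels all the data through a single task,
        // keep a modest number of partitions and merge the resulting part files
        // afterwards (e.g. with `hadoop fs -getmerge`) if one output file is needed.
        rdd.map(_.toString)
          .coalesce(8) // illustrative target; tune for your data size
          .saveAsTextFile("/tmp/partition-sketch-output")

        sc.stop()
      }
    }
    ```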

    For more, read this: Spark NullPointerException with saveAsTextFile and this.


    • * In case you are not sure what a partition is, I explain it in memoryOverhead issue in Spark.