Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

悲&欢浪女 2020-11-22 05:29

I'm getting strange behavior when calling a function outside of a closure:

  • when the function is in an object, everything works
  • when function is in a class ge
9 answers
  •  星月不相逢
    2020-11-22 05:55

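    import org.apache.spark.sql.functions.udf
    import spark.implicits._ // for toDS() and $"..." when the notebook has not already imported them

    def upper(name: String): String = name.toUpperCase

    // UDF whose closure calls the method defined above
    val toUpperName = udf { (empName: String) => upper(empName) }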
    val emp_details = """[{"id": "1","name": "James Butt","country": "USA"},
    {"id": "2", "name": "Josephine Darakjy","country": "USA"},
    {"id": "3", "name": "Art Venere","country": "USA"},
    {"id": "4", "name": "Lenna Paprocki","country": "USA"},
    {"id": "5", "name": "Donette Foller","country": "USA"},
    {"id": "6", "name": "Leota Dilliard","country": "USA"}]"""
    
    val df_emp = spark.read.json(Seq(emp_details).toDS())
    val df_name = df_emp.select($"id", $"name")
    val df_upperName = df_name.withColumn("name", toUpperName($"name")).filter("id='5'")
    display(df_upperName)
    

    This fails with: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)

    Solution: move the function and the UDF into an object that extends Serializable:

    import java.io.Serializable
    import org.apache.spark.sql.functions.udf

    // The method and the UDF that calls it both live in a serializable object
    object obj_upper extends Serializable {
      def upper(name: String): String = name.toUpperCase

      val toUpperName = udf { (empName: String) => upper(empName) }
    }
    
    val df_upperName = df_name.withColumn("name", obj_upper.toUpperName($"name")).filter("id='5'")
    display(df_upperName)
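
The difference between the failing and the working version can probably be reproduced without Spark. The sketch below is my own illustration, not part of the original answer: `isSerializable`, `UpperInClass` and `UpperInObject` are hypothetical names, and it uses plain Java serialization rather than Spark's ClosureCleaner. The assumption is that a closure calling an instance method has to capture the enclosing instance, and serialization fails if that instance is not serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: true if plain Java serialization accepts the value.
def isSerializable(value: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(value)
    true
  } catch {
    case _: NotSerializableException => false
  }

// Plain class: the function value calls an instance method,
// so it holds a reference to `this`, which is not Serializable.
class UpperInClass {
  def upper(name: String): String = name.toUpperCase
  val toUpper: String => String = (name: String) => upper(name)
}

// Serializable object: same method, same function value.
object UpperInObject extends Serializable {
  def upper(name: String): String = name.toUpperCase
  val toUpper: String => String = (name: String) => upper(name)
}

val classClosureOk = isSerializable(new UpperInClass().toUpper)
val objectClosureOk = isSerializable(UpperInObject.toUpper)

println(s"closure from class serializable:  $classClosureOk")
println(s"closure from object serializable: $objectClosureOk")
```

On Scala 2.12 or later I would expect the first check to come back false and the second true. That matches the behavior in the question title: the problem shows up when the function lives in a class, not in an object.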
    
