Infinite loop when replacing concrete value by parameter name

99封情书 submitted on 2019-12-13 09:22:24

Question


I have the following two objects (in Scala, using Spark):

1. The main object

object Omain {
  def main(args: Array[String]) {
    odbscan
  }
}

2. The object odbscan

import org.apache.spark.{SparkConf, SparkContext}

object odbscan {
  val conf = new SparkConf().setAppName("Clustering").setMaster("local")
  conf.set("spark.driver.maxResultSize", "3g")
  val sc = new SparkContext(conf)

  val param_user_minimal_rating_count = 2

  /***Connexion***/
  val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
  val sql = "SELECT id, data FROM user_profile"
  val options = connectMysql.getOptionsMap(sql)
  val uSQL = sqlcontext.load("jdbc", options)

  val users = uSQL.rdd.map { x =>
    val v = x.toString().substring(1, x.toString().size - 1).split(",")
    var ap: Map[Int, Double] = Map()
    if (v.size > 1)
       ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
    (v(0).toInt, ap)
  }.filter(_._2.size >= param_user_minimal_rating_count)
  println(users.collect().mkString("\n"))
}

When I execute this code it runs into an infinite loop, unless I change:

filter(_._2.size >= param_user_minimal_rating_count)

to

filter(_._2.size >= 1)

or any other numeric literal. In that case the code works and my result is displayed.


Answer 1:


What I think is happening here is that Spark serializes functions in order to send them over the wire. Because your function (the one you're passing to filter) calls the accessor param_user_minimal_rating_count of object odbscan, the entire object odbscan needs to be serialized and sent along with it. Deserializing and then using that object causes the code in its body to be executed again, which leads to an infinite loop of serializing --> sending --> deserializing --> executing --> serializing --> ...
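To make the "code in its body gets executed" part concrete, here is a minimal sketch without Spark. It only illustrates that everything written in an object body is initialization code that runs the first time the object is referenced; it does not reproduce Spark's serialization. The names InitDemo, Job, threshold and keep are made up for this illustration.

```scala
object InitDemo {
  // Counts how many times the body of Job has run in this JVM
  var bodyRuns = 0
}

object Job {
  // Everything in an object body is initialization code: it runs the
  // first time the object is referenced.
  InitDemo.bodyRuns += 1

  val threshold = 2

  // The lambda reads threshold through the enclosing object Job,
  // so using the lambda requires Job to be initialized.
  val keep: Int => Boolean = n => n >= threshold
}
```

If the object body also started the Spark job, as odbscan does in the question, that job would be part of this initialization code.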

Probably the easiest fix is to change that val to final val param_user_minimal_rating_count = 2 so that the compiler inlines the value. Note that this only works for literal constants. For more information, see constant value definitions and constant expressions in the Scala language specification.
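A small sketch of what such a definition looks like. The objects Params and ConstantDemo and the sample data are assumptions made for this illustration, and a plain Seq stands in for the RDD. Per the Scala specification, a constant value definition needs the final modifier, a constant expression on the right-hand side, and no type annotation.

```scala
object Params {
  // final + no type annotation + literal right-hand side:
  // a constant value definition, so uses can be replaced by the literal 2.
  final val param_user_minimal_rating_count = 2

  // With a type annotation this is an ordinary val, not a constant
  // value definition, and it is read through the object.
  final val annotated: Int = 2
}

object ConstantDemo {
  // Hypothetical sample data: (user id, item id -> rating)
  val ratings = Seq(
    1 -> Map(10 -> 4.0),
    2 -> Map(10 -> 3.0, 11 -> 5.0)
  )

  // Same predicate as in the question, using the constant
  val kept = ratings.filter(_._2.size >= Params.param_user_minimal_rating_count)
}
```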

Another, better solution is to refactor your code so that no instance variables are used in lambda expressions. Referencing vals that are defined in an object or class gets the whole object serialized, so try to refer only to vals that are local to a method. Most importantly, don't execute your business logic from within a constructor, i.e. the body of an object or class.
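The refactoring could look roughly like the sketch below. To keep it runnable without a cluster, a plain Seq of (id, data) pairs stands in for the RDD loaded over JDBC. The names Clustering, parseRatings, run and minimalRatingCount are hypothetical, and the "item:rating;item:rating" format is inferred from the parsing code in the question.

```scala
object Clustering {
  // Parses "10:4.0;11:5.0" into Map(10 -> 4.0, 11 -> 5.0);
  // an empty string yields an empty map.
  def parseRatings(data: String): Map[Int, Double] =
    if (data.isEmpty) Map.empty
    else data.split(";").map { pair =>
      val parts = pair.split(":")
      parts(0).toInt -> parts(1).toDouble
    }.toMap

  // All work happens inside a method, nothing in the object body.
  def run(rows: Seq[(Int, String)]): Seq[(Int, Map[Int, Double])] = {
    // Local val: the lambda below captures only this Int,
    // not a field of the enclosing object.
    val minimalRatingCount = 2
    rows
      .map { case (id, data) => (id, parseRatings(data)) }
      .filter(_._2.size >= minimalRatingCount)
  }
}
```

With Spark, the same shape would apply: create the SparkContext and load the data inside main or a method called from it, and pass the resulting RDD through the same map and filter steps.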




Answer 2:


Your problem is somewhere else.

The only difference between the two snippets is the definition of val Eps = 5 outside of the map, which does not change the control flow of your code at all.

Please post more context so we can help.



Source: https://stackoverflow.com/questions/41983586/infinite-loop-when-replacing-concrete-value-by-parameter-name
