问题
I have the two following objects (in scala and using spark): 1. The main object
object Omain {
def main(args: Array[String]) {
odbscan
}
}
2. The object odbscan
object odbscan {
val conf = new SparkConf().setAppName("Clustering").setMaster("local")
conf.set("spark.driver.maxResultSize", "3g")
val sc = new SparkContext(conf)
val param_user_minimal_rating_count = 2
/***Connexion***/
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val sql = "SELECT id, data FROM user_profile"
val options = connectMysql.getOptionsMap(sql)
val uSQL = sqlcontext.load("jdbc", options)
val users = uSQL.rdd.map { x =>
val v = x.toString().substring(1, x.toString().size - 1).split(",")
var ap: Map[Int, Double] = Map()
if (v.size > 1)
ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
(v(0).toInt, ap)
}.filter(_._2.size >= param_user_minimal_rating_count)
println(users.collect().mkString("\n"))
}
When I execute this code I obtain an infinite loop, until I change:
filter(_._2.size >= param_user_minimal_rating_count)
to
filter(_._2.size >= 1)
or any other numerical value, in this case the code work, and I have my result displayed
回答1:
What I think is happening here is that Spark serializes functions to send them over the wire. And that because your function (the one you're passing to map
) calls the accessor param_user_minimal_rating_count
of object odbscan
, the entire object odbscan
will need to get serialized and sent along with it. Deserializing and then using that deserialized object will cause the code in its body to get executed again which causes an infinite loop of serializing-->sending-->deserializing-->executing-->serializing-->...
Probably the easiest thing to do here is changing that val
to final val param_user_minimal_rating_count = 2
so the compiler will inline the value. But note that this will only be a solution for literal constants. For more information see constant value definitions and constant expressions.
An other and better solution would be to refactor your code so that no instance variables are used in lambda expressions. Referencing vals that are defined in an object or class will get the whole object serialized. So try to only refer to vals that are local (to a method). And most importantly don't execute your business logic from within a constructor/the body of an object or class.
回答2:
Your problem is somewhere else.
The only difference between the 2 snippets is the definition of val Eps = 5
outside of the map
which does not change at all the control flow of your code.
Please post more context so we can help.
来源:https://stackoverflow.com/questions/41983586/infinite-loop-when-replacing-concrete-value-by-parameter-name