Question
I am using Spark's Word2Vec to train some word vectors. The training essentially works, but when it comes to saving the model I get an org.apache.spark.SparkException saying:
Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.
The stack trace points at line 190, but since I have changed some of the code since then, I think it is actually line 196 that causes the problem:
190: val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
191:
192: println("Final vocabulary word count: " + model.getVectors.size)
193: println("Output file size: ~ " + f"$sizeGb%1.4f" + " GB")
194: println("Saving model to " + outputFilePath)
195:
196: model.save(sc, outputFilePath)
From my own output I got an estimated model size of
// (vocab-size * vector-size * 4)/(1024^3) = ~ 0.9767 GB
val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
which comes close to 1073394582 bytes. The stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
...
at org.apache.spark.mllib.feature.Word2VecModel$SaveLoadV1_0$.save(Word2Vec.scala:617)
at org.apache.spark.mllib.feature.Word2VecModel.save(Word2Vec.scala:489)
at masterthesis.code.wordvectors.Word2VecOnCluster$.main(Word2VecOnCluster.scala:190)
at masterthesis.code.wordvectors.Word2VecOnCluster.main(Word2VecOnCluster.scala)
The error message is clear, but I am not sure what I can do about this. On the other hand, I have already saved models larger than 128 MB (our default frame size) and Spark didn't complain.
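For reference, a quick sanity check of that estimate; the vocabulary size and vector dimension below are hypothetical placeholders, since the real values are not shown above:

// Made-up numbers, only to illustrate the size formula used above.
val vocabSize  = 2600000                    // e.g. ~2.6M distinct words
val vectorSize = 100                        // dimensions per word vector
val bytes = vocabSize.toLong * vectorSize * 4L   // 4 bytes per Float
val gb    = bytes / (1024.0 * 1024.0 * 1024.0)
println(f"~ $gb%1.4f GB")                   // ~ 0.9686 GB for these numbers

which is in the same ballpark as the 1073394582-byte task that Spark refuses to ship.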
Answer 1:
Just like your error log suggests, there are two ways of doing this.

Either by increasing spark.akka.frameSize. The default size is 128 MB; you can refer to the Network Configuration documentation. The value is given in MB and needs to be larger than your ~1 GB serialized task, so if you are using a standalone shell you can set it by passing an argument such as

--driver-java-options "-Dspark.akka.frameSize=2000"

(a programmatic SparkConf variant is sketched below).

Or by using broadcast variables for large values.
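A minimal sketch of the first option, assuming Spark 1.x (where spark.akka.frameSize still applies). The 2000 MB figure is a placeholder chosen to sit above the ~1 GB serialized task, not a value taken from the answer:

import org.apache.spark.{SparkConf, SparkContext}

// spark.akka.frameSize is specified in MB and must exceed the serialized
// task size (~1 GB in the question), so 2000 is used here as a placeholder.
val conf = new SparkConf()
  .setAppName("Word2VecOnCluster")
  .set("spark.akka.frameSize", "2000")
val sc = new SparkContext(conf)

The same property can also be passed on the command line with spark-submit via --conf spark.akka.frameSize=2000. For the second suggestion, a broadcast variable helps when a large value would otherwise be serialized into every task closure. A generic sketch, where wordsRdd is a hypothetical RDD[String] and not something from the question:

// Ship the large vector map to the executors once, instead of letting it be
// serialized into each task closure. This illustrates the broadcast idea in
// general; it does not change what happens inside Word2VecModel.save itself.
val vectors   = model.getVectors            // Map[String, Array[Float]]
val bcVectors = sc.broadcast(vectors)
val lookedUp  = wordsRdd.map(w => bcVectors.value.get(w))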
Source: https://stackoverflow.com/questions/36692386/exceeding-spark-akka-framesize-when-saving-word2vecmodel