Question
I am using Spark's Word2Vec to train some word vectors. The training essentially works, but when it comes to saving the model I get an org.apache.spark.SparkException saying:
Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.
The stack trace points at line 190, but I may have changed some of the code since then, so I think it's actually line 196 that causes the problem:
190: val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
191:
192: println("Final vocabulary word count: " + model.getVectors.size)
193: println("Output file size: ~ " + f"$sizeGb%1.4f" + " GB")
194: println("Saving model to " + outputFilePath)
195:
196: model.save(sc, outputFilePath)
From my own output I got an estimated model size of
// (vocab-size * vector-size * 4)/(1024^3) = ~ 0.9767 GB
val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
which comes close to 1073394582 bytes. The stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
...
at org.apache.spark.mllib.feature.Word2VecModel$SaveLoadV1_0$.save(Word2Vec.scala:617)
at org.apache.spark.mllib.feature.Word2VecModel.save(Word2Vec.scala:489)
at masterthesis.code.wordvectors.Word2VecOnCluster$.main(Word2VecOnCluster.scala:190)
at masterthesis.code.wordvectors.Word2VecOnCluster.main(Word2VecOnCluster.scala)
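For reference, a rough numeric cross-check of the size estimate above (a sketch only; the inputs are just the numbers already printed and reported):

val estimatedBytes  = 0.9767 * 1024 * 1024 * 1024   // ≈ 1.049e9 bytes from the formula above
val serializedBytes = 1073394582L                    // what Spark reported for task 1278:0
// The serialized task is a few percent larger than the raw float payload, which is
// plausible once the String keys from getVectors and serialization overhead are included.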
The error message is clear, but I am not sure what I can do about it. On the other hand, I have already saved models larger than 125 MB (our default frame size) and Spark didn't complain.
Answer 1:
As your error log suggests, there are two ways of dealing with this:
Either increase spark.akka.frameSize (the default is 128 MB; see the Network Configuration documentation). If you are using the standalone shell, you can set it by passing the argument --driver-java-options "-Dspark.akka.frameSize=128" (replace 128 with a value, in MB, larger than the serialized task, which here is roughly 1 GB).
Or use broadcast variables for large values.
A short sketch of both options follows below.
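A minimal sketch of both approaches, assuming a frame size of 1500 MB and the app name "Word2VecOnCluster" (both values are placeholders, not taken from the question):

// Option 1: raise spark.akka.frameSize before the SparkContext is created.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Word2VecOnCluster")
  .set("spark.akka.frameSize", "1500")   // value is interpreted in MB
val sc = new SparkContext(conf)

// The same setting can also be passed on the command line, e.g.:
//   spark-submit --conf spark.akka.frameSize=1500 ...
// or via the driver options shown in the answer above.

// Option 2: for large values of your own, broadcast them once per executor
// instead of serializing them into every task closure. (This does not change
// what Word2VecModel.save does internally; `model` here is the trained
// Word2VecModel from the question.)
val bcVectors = sc.broadcast(model.getVectors)   // Map[String, Array[Float]]
// tasks then read bcVectors.value instead of capturing the map directly

Note that the broadcast route only helps for data you ship to executors yourself; the failing call in the question is inside Word2VecModel.save, so increasing the frame size is the more direct fix there.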
来源:https://stackoverflow.com/questions/36692386/exceeding-spark-akka-framesize-when-saving-word2vecmodel