spark - java heap space issue - ExecutorLostFailure - container exited with status 143

好久不见 · Submitted on 2020-01-11 13:18:06

Question


I am reading strings that are more than 100K bytes long and splitting them into columns based on fixed widths. I have close to 16K columns, which I split out of each string this way.

While writing to Parquet, I am using the code below:

import spark.implicits._  // needed for toDF and the $ column syntax used below

val rdd1 = spark.sparkContext.textFile("file1")

def substrString(line: String, colLength: Seq[Int]): Seq[String] = {
  var now = 0
  val collector = new Array[String](colLength.length)
  for (k <- 0 to colLength.length - 1) {
    collector(k) = line.substring(now, now + colLength(k))
    now = now + colLength(k)
  }
  collector.toSeq
}

val StringArray = rdd1.map(substrString(_, ColLengthSeq))
// ColLengthSeq is read from another schema file and holds the column widths



StringArray.toDF("StringCol")
  .select((0 until ColCount).map(j => $"StringCol"(j).as(column_seq(j))): _*)
  .write.mode("overwrite")
  .parquet("c:\\home\\")

Here ColCount = 16000 and column_seq is a Seq[String] holding the 16K column names.
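For context, a minimal sketch of how the schema file might be loaded into ColLengthSeq and column_seq; the file name and its one-"name,width"-per-line format are assumptions, since the post only says ColLengthSeq comes from a schema file of column lengths:

import scala.io.Source

// hypothetical schema file format: one "name,width" pair per line, e.g. "col_0001,7"
val schemaLines  = Source.fromFile("schema.txt").getLines().toSeq
val column_seq   = schemaLines.map(_.split(",")(0))            // 16K column names
val ColLengthSeq = schemaLines.map(_.split(",")(1).trim.toInt) // matching widths
val ColCount     = column_seq.length                           // 16000 here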

I am running this on YARN with 16 GB of executor memory and 20 executors.
The file size is 4 GB.
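For reference, a minimal sketch of a SparkSession built with those settings; only the executor memory and executor count come from this post, while the app name is a placeholder (on YARN these values are more commonly passed via spark-submit flags):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("FixedWidthToParquet")            // placeholder name
  .master("yarn")                            // running on YARN per the post
  .config("spark.executor.memory", "16g")    // 16 GB executor memory, from the post
  .config("spark.executor.instances", "20")  // 20 executors, from the post
  .getOrCreate()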

I am getting this error:

Lost task 113.0 in stage 0.0 (TID 461, gsta32512.foo.com): ExecutorLostFailure (executor 28 exited caused by one of the running tasks) Reason: 
Container marked as failed: 
container_e05_1472185459203_255575_01_000183 on host: gsta32512.foo.com. Exit status: 143. Diagnostics: 
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

When I check the status in the UI, it shows:

java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded

Please advise on performance tuning of the above code and on optimizing the spark-submit parameters.

Source: https://stackoverflow.com/questions/51118204/spark-java-heap-space-issue-executorlostfailure-container-exited-with-stat
