Spark 1.0.2 (also 1.1.0) hangs on a partition
Question
I've got a weird problem in Apache Spark and would appreciate some help. After reading data from HDFS (and converting it from JSON to objects), the next stage (processing those objects) fails after 2 partitions have been processed (out of 512 in total). This happens on large-ish datasets (the smallest I have noticed is about 700 MB, but it could be lower; I haven't narrowed it down yet). EDIT: 700 MB is the tgz file size; uncompressed it's 6 GB. EDIT 2: The same thing happens on