Apache Spark: pyspark crash for large dataset

Backend · Open · 5 answers · 1222 views
清歌不尽 2021-01-01 20:48

I am new to Spark, and I have an input file with training data of size 4000x1800. When I try to train on this data (written in Python), I get the following error:

  1. 14/11/15 22:39:13

5 Answers
  •  南笙 (OP)
     2021-01-01 21:00

    Mrutynjay,

    Though I do not have a definitive answer, the issue looks memory-related. I ran into the same error when trying to read a 5 MB file; after deleting a portion of the file and reducing it to under 1 MB, the code worked.

    I also found a report of the same issue at the link below:

    http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-Failed-to-run-first-td7691.html
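    If the crash is indeed memory-related, one common first step is to raise the driver and executor memory when submitting the job. A minimal sketch, assuming the training script is called `train.py` and that the default 1g driver heap is the bottleneck (both the script name and the 4g values here are placeholders to adjust for your machine):

    ```shell
    # Give the driver and executors more heap than the 1g default.
    # 4g is an illustrative value, not a recommendation for this dataset.
    spark-submit \
      --driver-memory 4g \
      --executor-memory 4g \
      train.py
    ```

    The same settings can instead be placed in `conf/spark-defaults.conf` as `spark.driver.memory` and `spark.executor.memory`. Note that for local mode, `--driver-memory` is the setting that matters, since the computation runs inside the driver JVM.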
