Apache Spark: pyspark crash for large dataset

清歌不尽 2021-01-01 20:48

I am new to Spark. I have an input file with training data of size 4000x1800. When I try to train on this data (written in Python), I get the following error:

    14/11/15 22:39:13

5 Answers
  •  鱼传尺愫
    2021-01-01 21:03

    It's so simple.

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
    sc = SparkContext(conf=conf)
    # The second argument (minPartitions) asks Spark to split the file into at least that many partitions.
    lines = sc.textFile("file:///SparkCourse/filter_1.csv", 2000)
    print(lines.first())
    

    When using sc.textFile, pass one more parameter: the minimum number of partitions (minPartitions), set to a large value. The bigger the data, the larger this value should be.
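
    As a minimal sketch (reusing the file path and partition count from the snippet above, which are placeholders for your own), you can confirm how the file was actually split, or reshuffle an RDD that is already loaded:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("PartitionCheck")
    sc = SparkContext(conf=conf)

    # minPartitions is a lower bound: Spark may create more partitions, never fewer.
    lines = sc.textFile("file:///SparkCourse/filter_1.csv", 2000)
    print(lines.getNumPartitions())

    # Alternatively, repartition an RDD after it has been loaded.
    lines = lines.repartition(2000)

    More partitions mean less data per task, which lowers each task's memory footprint; that is typically why raising the partition count avoids crashes on large inputs.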
