Spark MLlib FPGrowth job fails with Memory Error
I have a fairly simple use case, but a potentially very large result set. My code does the following (in the pyspark shell):

```python
from pyspark.mllib.fpm import FPGrowth

data = sc.textFile("/Users/me/associationtestproject/data/sourcedata.txt")
transactions = data.map(lambda line: line.strip().split(' '))
model = FPGrowth.train(transactions, minSupport=0.000001, numPartitions=1000)

# Perform any RDD operation
for item in model.freqItemsets().toLocalIterator():
    # do something with item
    pass
```

I find that