I am new to Spark, and I have an input file with training data (4000 x 1800). When I try to train on this data (written in Python), I get the following error:
14/11/15 22:39:13
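
For context, a minimal sketch of the kind of PySpark (1.x) MLlib training job I mean; the file path, the line parsing, and the choice of LogisticRegressionWithSGD are placeholders rather than my actual code:

# Sketch of a small MLlib training job (placeholder path, parsing and model).
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithSGD

sc = SparkContext(appName="train-example")

def parse_line(line):
    # Assumed format: one label followed by 1800 numeric features per line.
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("training_data.txt").map(parse_line)  # ~4000 rows
model = LogisticRegressionWithSGD.train(data, iterations=100)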
I got the same error, and then I found a related answer in "pyspark process big datasets problems".
The solution is to add some code to python/pyspark/worker.py.
Add the following two lines to the end of the process function defined inside the main function:
for obj in iterator:
    pass
so the process function now looks like this (in Spark 1.5.2 at least):
def process():
    iterator = deserializer.load_stream(infile)
    serializer.dump_stream(func(split_index, iterator), outfile)
    for obj in iterator:
        pass
and this works for me.
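
For what it's worth, my understanding of why the extra loop helps (this is an interpretation, not something taken from the Spark docs): if func does not consume every record in iterator, the rest of the input stream sent by the JVM is never read and the Python worker finishes early; draining the iterator makes sure the whole stream is consumed. A plain-Python sketch of the same idea, with made-up names and no Spark involved:

def func(split_index, iterator):
    # Pretend the user's function only needs the first 3 records.
    for i, record in enumerate(iterator):
        if i == 3:
            break
        yield record * 2

records = iter(range(10))        # stands in for the deserialized input stream
output = list(func(0, records))  # consume the function's output
print(output)                    # [0, 2, 4]

# Items 4..9 are still unread; the two added lines drain them.
for obj in records:
    pass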