How to restore RDD of (key,value) pairs after it has been stored/read from a text file

爱一瞬间的悲伤 2020-12-19 17:37

I saved my RDD of (key, value) pairs to a text file using saveAsTextFile. After I read the text file back using the sc.textFile("filename.txt") command, I ended u

2 answers
  • 2020-12-19 17:59

    ast.literal_eval should do the trick:

    import ast
    
    data1 = [(u'BAR_0', [1.0, 2.0, 3.0]), (u'FOO_1', [4.0, 5.0, 6.0])]
    rdd = sc.parallelize(data1)
    rdd.saveAsTextFile("foobar_text")
    
    data2 = sc.textFile("foobar_text").map(ast.literal_eval).collect()
    assert sorted(data1) == sorted(data2)
    

    Generally speaking, though, it is better to avoid this situation in the first place and use, for example, a SequenceFile of pickled objects:

    rdd.saveAsPickleFile("foobar_seq")
    sc.pickleFile("foobar_seq")
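
    To see why the literal_eval approach works, here is a minimal sketch in plain Python with no Spark involved. It assumes each line of the text file holds the string form (str) of one element, which is how saveAsTextFile writes non-string records. The record below is just sample data:

```python
import ast

# One (key, value) pair, as it would exist in the original RDD.
record = ("BAR_0", [1.0, 2.0, 3.0])

# Assumption: the text file stores str(record) for each element,
# so reading it back yields a plain string, not a tuple.
line = str(record)

# ast.literal_eval parses the string as a Python literal and
# rebuilds the tuple without executing arbitrary code.
restored = ast.literal_eval(line)
```

    This only works when every element is built from Python literals (strings, numbers, tuples, lists, dicts, booleans, None). Anything else, such as custom objects, will not survive the round trip through text, which is one reason to prefer the pickle file.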
    
  • 2020-12-19 18:10

    You're going to have to implement a parser for your input. The easiest approach is to write each record as a delimited line, using a tab or colon as the delimiter, and then call split(delimiter) in your map when reading the file back, much like in the wordCount example.
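
    A rough sketch of that idea, assuming string keys that never contain the delimiter and values that are lists of floats. The helper names to_line and parse_line and the output path "foobar_tsv" are made up for illustration:

```python
def to_line(pair, sep="\t"):
    """Format a (key, [floats]) pair as 'key<sep>v1,v2,...'."""
    key, values = pair
    return key + sep + ",".join(repr(v) for v in values)


def parse_line(line, sep="\t"):
    """Inverse of to_line: rebuild the (key, [floats]) pair."""
    # partition splits on the first separator only
    key, _, rest = line.partition(sep)
    values = [float(v) for v in rest.split(",")] if rest else []
    return key, values


# Hypothetical usage with Spark (sc is an existing SparkContext):
#   rdd.map(to_line).saveAsTextFile("foobar_tsv")
#   restored = sc.textFile("foobar_tsv").map(parse_line)
```

    Unlike literal_eval, this fixes the record layout in advance, so the parser has to change whenever the value type does.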
