I am able to run this script to save the file in text format, but when I try to run saveAsSequenceFile it errors out. If anyone has an idea of how to save the RDD as a sequence file, please let me know the process. I tried looking for a solution in "Learning Spark" as well as in the official Spark documentation.
This runs successfully:
dataRDD = sc.textFile("/user/cloudera/sqoop_import/departments")
dataRDD.saveAsTextFile("/user/cloudera/pyspark/departments")
This fails:
dataRDD = sc.textFile("/user/cloudera/sqoop_import/departments")
dataRDD.saveAsSequenceFile("/user/cloudera/pyspark/departmentsSeq")
Error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsSequenceFile. : org.apache.spark.SparkException: RDD element of type java.lang.String cannot be used
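The error points at a likely cause: saveAsSequenceFile is meant for an RDD of (key, value) pairs, while sc.textFile produces an RDD of plain strings, so each element reaches the JVM as a bare java.lang.String with no key. A minimal sketch of a workaround, assuming every line has the form id,name and that the department id should become the key, is to map each line to a tuple before saving. The helper names to_key_value and save_as_sequence_file are hypothetical.

```python
def to_key_value(line):
    """Turn a line like '7,Fan Shop' into the pair ('7', 'Fan Shop').

    maxsplit=1 keeps any later commas inside the value.
    Raises ValueError if the line contains no comma at all.
    """
    key, value = line.split(",", 1)
    return (key, value)


def save_as_sequence_file(sc, src, dst):
    """Hypothetical helper: read text lines and save them as a SequenceFile.

    saveAsSequenceFile expects an RDD of (key, value) tuples; PySpark
    converts the Python str key and value to Hadoop Text writables.
    """
    pairs = sc.textFile(src).map(to_key_value)
    pairs.saveAsSequenceFile(dst)
```

Usage would be save_as_sequence_file(sc, "/user/cloudera/sqoop_import/departments", "/user/cloudera/pyspark/departmentsSeq"). Keeping the id as a string is a simple choice here; mapping it through int() instead should store it as an integer key.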
Here is the data:
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
8,TESTING
8000,TESTING
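With the sample records above, each line would become one (id, name) pair once it is keyed by its id. A minimal sketch for checking a saved sequence file, assuming the output path from the failing example: sc.sequenceFile reads the file back as (key, value) tuples, with Text keys and values returned as Python str. The names SAMPLE, expected_pairs and read_back are hypothetical.

```python
# The sample department records, one "id,name" record per line.
SAMPLE = """2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
8,TESTING
8000,TESTING"""


def expected_pairs(text):
    """Pairs that saving these lines as (id, name) tuples should produce."""
    return [tuple(line.split(",", 1)) for line in text.splitlines() if line]


def read_back(sc, path):
    """Read a SequenceFile back as a list of (key, value) tuples."""
    return sc.sequenceFile(path).collect()
```

A check could then be sorted(read_back(sc, "/user/cloudera/pyspark/departmentsSeq")) == sorted(expected_pairs(SAMPLE)); sorting keeps the comparison independent of how records are spread across part files.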