Question
This is the sample code from my book:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://chetan-ThinkPad-E470:7077").setAppName("FlatMap")
sc = SparkContext(conf=conf)
numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.flatMap(lambda x: x + x).collect()
for values in actionRDD:
    print(values)
I am getting this error:

TypeError: 'int' object is not iterable
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Answer 1:
You cannot pass flatMap a function that returns a plain int. flatMap expects the function to return an iterable (such as a list or an array), whose elements it then flattens into the resulting RDD. Since your lambda returns a single integer for each element, use map instead on your RDD of integers:
numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.map(lambda x: x + x)

def printing(x):
    print(x)

actionRDD.foreach(printing)
which should print:
2
4
6
8
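If you want to keep flatMap, the function you pass it must return an iterable; flatMap then flattens the per-element iterables into one RDD. A minimal sketch, reusing the sc from the question (the two-copy list [x, x] is just one illustrative choice):

numbersRDD = sc.parallelize([1, 2, 3, 4])
# Each element becomes a two-element list; flatMap concatenates the lists.
doubledRDD = numbersRDD.flatMap(lambda x: [x, x])
print(doubledRDD.collect())  # [1, 1, 2, 2, 3, 3, 4, 4]

One more caveat: foreach runs on the executors, so in cluster mode its print output goes to the executor logs rather than your driver console; collecting first (as above) prints on the driver.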
Source: https://stackoverflow.com/questions/49189097/pyspark-flatmat-error-typeerror-int-object-is-not-iterable