Calling distinct() and map() together throws an NPE in the Spark library

醉酒成梦 2020-11-30 12:44

I am unsure if this is a bug. If you do something like this:

// d:spark.RDD[String]
d.distinct().map(x => d.filter(_.equals(x)))

you will get a NullPointerException.

2 Answers
  • 2020-11-30 13:22

    What about the windowing example provided in the Spark 1.3.0 Streaming Programming Guide?

    val dataset: RDD[(String, String)] = ...  // the guide's RDD[String, String] won't compile; RDD takes one type parameter
    val windowedStream = stream.window(Seconds(20))...
    val joinedStream = windowedStream.transform { rdd => rdd.join(dataset) }
    

    SPARK-5063 causes this example to fail, since the join is being called on an RDD from within the transform method.
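
    One possible way around this, if it fits your data, is to avoid referencing a second RDD inside transform at all and do a map-side join against a broadcast variable instead. This is only a sketch under assumptions: `ssc`, `stream`, and `smallDataset` are illustrative names (not from the guide), and the lookup dataset must be small enough to collect to the driver.

    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // Assumed to exist (hypothetical names): ssc: StreamingContext,
    // stream: DStream[(String, String)], smallDataset: RDD[(String, String)]
    val lookup: Broadcast[Map[String, String]] =
      ssc.sparkContext.broadcast(smallDataset.collect().toMap)

    val joinedStream: DStream[(String, (String, Option[String]))] =
      stream.window(Seconds(20)).transform { rdd =>
        // join against the broadcast copy; no second RDD is referenced here
        rdd.map { case (k, v) => (k, (v, lookup.value.get(k))) }
      }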

  • 2020-11-30 13:45

    Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users mailing list.

    It looks like your current code is trying to group the elements of d by value; you can do this efficiently with the groupBy() RDD method:

    scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
    d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a
    
    scala> d.groupBy(x => x).collect()
    res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
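
    For completeness, here is a minimal sketch of how the original nested expression could be rewritten legally by moving the outer loop to the driver. This assumes the number of distinct values is small, since it collects them and launches one Spark job per value:

    // Collect the distinct values to the driver, then filter per value.
    // Each d.filter(...) now runs as its own job started from the driver,
    // so no RDD operation is nested inside another RDD's closure.
    val keys: Array[String] = d.distinct().collect()
    val groups: Array[(String, Array[String])] =
      keys.map(k => (k, d.filter(_ == k).collect()))

    This is far less efficient than the single groupBy() job above, so prefer groupBy() unless you specifically need driver-side control.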
    