mapPartitions returns empty array

六眼飞鱼酱① submitted on 2019-12-04 08:59:25
Justin Pihony

That's because x is a TraversableOnce: calling size traverses it to the end, so the iterator you then return is already exhausted, i.e. empty.

You could work around it in a number of ways; here is one:

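rdd.mapPartitions(x => {
  val list = x.toList   // materialize the partition once
  println(list.size)    // safe: a List can be traversed repeatedly
  list.iterator         // return a fresh iterator over the same elements
}).collect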
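Note that toList holds the whole partition in memory. If that is a concern, another possible workaround is to count elements as they stream through. The following is only a sketch: CountingIterator, counted and report are hypothetical names made up for illustration, not part of Spark or the Scala library.

```scala
object CountingIterator {
  // Hypothetical helper: passes elements through unchanged and calls
  // `report` once with the element count when the source is exhausted.
  def counted[A](it: Iterator[A])(report: Int => Unit): Iterator[A] =
    new Iterator[A] {
      private var n = 0
      private var reported = false

      def hasNext: Boolean = {
        val more = it.hasNext
        if (!more && !reported) { // first time we see the end
          reported = true
          report(n)
        }
        more
      }

      def next(): A = {
        val a = it.next()
        n += 1
        a
      }
    }
}
```

Assuming the same rdd as above, the call would then look something like rdd.mapPartitions(x => CountingIterator.counted(x)(println)).collect. The count is only reported once the consumer has read the partition to the end.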

To understand what is going on we have to take a look at the signature of the function you pass to mapPartitions:

(Iterator[T]) ⇒ Iterator[U]

So what is an Iterator? If you take a look at the Iterator documentation you'll see it is a trait which extends TraversableOnce:

trait Iterator[+A] extends TraversableOnce[A]

This should give you a hint about what happens in your case. An Iterator provides two core methods, hasNext and next. The only way to get the size of an Iterator is to iterate over it. After that, hasNext returns false, so the Iterator you return is empty.
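The effect can be reproduced without Spark at all. Here is a minimal sketch in plain Scala; IteratorExhaustion, buggy and fixed are illustrative names, and the two functions stand in for what would be passed to mapPartitions.

```scala
object IteratorExhaustion {
  // Same shape as the problematic function: size consumes the iterator.
  def buggy(it: Iterator[Int]): Iterator[Int] = {
    println(it.size) // traverses every element
    it               // already exhausted at this point
  }

  // Materialize first, then return a fresh iterator over the copy.
  def fixed(it: Iterator[Int]): Iterator[Int] = {
    val list = it.toList
    println(list.size)
    list.iterator
  }

  def main(args: Array[String]): Unit = {
    println(buggy(Iterator(1, 2, 3)).toList) // List()
    println(fixed(Iterator(1, 2, 3)).toList) // List(1, 2, 3)
  }
}
```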
