value reduceByKey is not a member of org.apache.spark.rdd.RDD

Submitted by 烂漫一生 on 2019-12-01 02:37:19

Question


This is frustrating. My Spark version is 2.1.1 and my Scala version is 2.11.

import org.apache.spark.SparkContext._
import com.mufu.wcsa.component.dimension.{DimensionKey, KeyTrait}
import com.mufu.wcsa.log.LogRecord
import org.apache.spark.rdd.RDD

object PV {

  def stat[C <: LogRecord, K <: DimensionKey](statTrait: KeyTrait[C, K], logRecords: RDD[C]): RDD[(K, Int)] = {
    val t = logRecords.map(record => (statTrait.getKey(record), 1)).reduceByKey((x, y) => x + y)
    t
  }
}

I get this error:
[ERROR] /Users/lemanli/work/project/newcma/wcsa/wcsa_my/wcsavistor/src/main/scala/com/mufu/wcsa/component/stat/PV.scala:25: error: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(K, Int)]
[ERROR]     val t = logRecords.map(record =>(statTrait.getKey(record),1)).reduceByKey((x,y) => x + y)

The trait is defined as:

trait KeyTrait[C <: LogRecord,K <: DimensionKey]{
  def getKey(c:C):K
}

Update: with the change below it compiles now. Thanks.

  def stat[C <: LogRecord, K <: DimensionKey : ClassTag : Ordering](statTrait: KeyTrait[C, K], logRecords: RDD[C]): RDD[(K, Int)] = {
    val t = logRecords.map(record => (statTrait.getKey(record), 1)).reduceByKey((x, y) => x + y)
    t
  }

The key type also needs an implicit Ordering[T] in scope:

object ClientStat extends KeyTrait[DetailLogRecord, ClientStat] {
  implicit val clientStatSorting = new Ordering[ClientStat] {
    override def compare(x: ClientStat, y: ClientStat): Int = x.key.compare(y.key)
  }

  def getKey(detailLogRecord: DetailLogRecord): ClientStat = new ClientStat(detailLogRecord)
}
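(As a side note, Scala's implicit search also looks in the companion object of the type being sought, so an Ordering[ClientStat] declared in a companion object of the ClientStat class itself would be found everywhere without an import.)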

Answer 1:


This comes from using a pair RDD function generically. The reduceByKey method is actually defined on the PairRDDFunctions class, which an RDD of pairs is converted to by an implicit conversion:

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V]

So it requires several implicit typeclasses. Normally when working with simple concrete types, those are already in scope. But you should be able to amend your method to also require those same implicits:

def stat[C <: LogRecord, K <: DimensionKey](statTrait: KeyTrait[C, K], logRecords: RDD[C])(implicit kt: ClassTag[K], ord: Ordering[K])

Or using the newer syntax:

def stat[C <: LogRecord, K <: DimensionKey : ClassTag : Ordering](statTrait: KeyTrait[C, K], logRecords: RDD[C])
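Putting it together, here is a minimal, self-contained sketch of the fix. The AccessLog, HostKey, and HostStat definitions below are hypothetical stand-ins for the asker's classes, not their real code:

import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Stand-ins for the LogRecord / DimensionKey hierarchy.
// Serializable so instances can be shipped inside Spark closures.
trait LogRecord extends Serializable
trait DimensionKey extends Serializable
trait KeyTrait[C <: LogRecord, K <: DimensionKey] extends Serializable {
  def getKey(c: C): K
}

case class AccessLog(host: String, url: String) extends LogRecord
case class HostKey(host: String) extends DimensionKey
object HostKey {
  // Lives in the companion object, so implicit search finds it automatically.
  implicit val ordering: Ordering[HostKey] = Ordering.by(_.host)
}
object HostStat extends KeyTrait[AccessLog, HostKey] {
  def getKey(c: AccessLog): HostKey = HostKey(c.host)
}

object PVSketch {
  // The context bounds supply the ClassTag[K] and Ordering[K] that
  // rddToPairRDDFunctions needs, so reduceByKey resolves on RDD[(K, Int)].
  def stat[C <: LogRecord, K <: DimensionKey : ClassTag : Ordering](
      statTrait: KeyTrait[C, K], logRecords: RDD[C]): RDD[(K, Int)] =
    logRecords.map(record => (statTrait.getKey(record), 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pv-sketch").setMaster("local[*]"))
    val logs = sc.parallelize(Seq(AccessLog("a.com", "/"), AccessLog("a.com", "/x"), AccessLog("b.com", "/")))
    stat(HostStat, logs).collect().foreach(println) // (HostKey(a.com),2), (HostKey(b.com),1)
    sc.stop()
  }
}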



Answer 2:


reduceByKey is a method that is only defined on RDDs of tuples, i.e. RDD[(K, V)] (K and V are just conventional names meaning the first element is the key and the second is the value).

It is not clear from the example what you are trying to achieve, but you definitely need to convert the values inside the RDD to two-element tuples first.
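As a quick illustration in the spark-shell (where sc is already in scope; the words dataset here is made up):

val words = sc.parallelize(Seq("a", "b", "a"))

// words.reduceByKey(_ + _)          // does not compile: RDD[String] has no reduceByKey
val counts = words
  .map(w => (w, 1))                  // RDD[(String, Int)] -- now an RDD of pairs
  .reduceByKey(_ + _)                // the implicit conversion to PairRDDFunctions applies

counts.collect().foreach(println)    // prints (a,2) and (b,1)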



Source: https://stackoverflow.com/questions/45620797/value-reducebykey-is-not-a-member-of-org-apache-spark-rdd-rdd
