Question
I have an RDD:
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = dataRDD
.values().map(mapFunc);
I want to run a reduce function on it:
private static Function2<
        Tuple2<Tuple2<String, Long>, Long>,
        Tuple2<Tuple2<String, Long>, Long>,
        Tuple2<Tuple2<String, Long>, Long>> redFunc2 =
    new Function2<
            Tuple2<Tuple2<String, Long>, Long>,
            Tuple2<Tuple2<String, Long>, Long>,
            Tuple2<Tuple2<String, Long>, Long>>() {
        @Override
        public Tuple2<String, MetricDatum> call(
                Tuple2<Tuple2<String, Long>, Long> v1,
                Tuple2<Tuple2<String, Long>, Long> v2) throws Exception {
            long sum = 0L; // sum up the values
            sum += v1._2();
            sum += v2._2();
            String dimension = v1._1()._1();
            long timestamp = v1._1()._2();
            MetricDatum metricDatum = new MetricDatum();
            metricDatum.setMetricDimension(dimension);
            metricDatum.setTimestamp(timestamp);
            String key = metricDatum.getMetricDimension().toString();
            key += "_" + Long.toString(timestamp);
            metricDatum.setMetric(sum);
            return new Tuple2<>(key, metricDatum);
        }
    };
However, the following line gives a type mismatch error:
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> reducedGoraRdd = mappedRdd.reduce(redFunc);
I am trying to follow the example in Spark's LogAnalytics.java.
Am I missing something? Should I use flatMap or similar, or is that reduce function simply wrong?
Answer 1:
Based on the reduce function from LogAnalytics.java, I wrote something like this:
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// dummy stand-in for the real MetricDatum class
class MetricDatum {
    public void setMetricDimension(String l) {}
    public void setTimestamp(Long l) {}
    public void setMetric(Long l) {}
    public Object getMetricDimension() { return new Object(); }
}

// fake input (sc is an existing JavaSparkContext)
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = sc.emptyRDD();

// create a JavaPairRDD from the JavaRDD of pairs
JavaPairRDD.fromJavaRDD(mappedRdd)
    // reduce with a commutative, associative function (Long, Long) -> Long
    .reduceByKey(new Function2<Long, Long, Long>() {
        @Override
        public Long call(Long aLong, Long aLong2) throws Exception {
            return aLong + aLong2;
        }
    })
    // map (key, sum) pairs to (newKey, metricDatum(sum)), creating a new JavaPairRDD
    .mapToPair(new PairFunction<Tuple2<Tuple2<String, Long>, Long>, String, MetricDatum>() {
        @Override
        public Tuple2<String, MetricDatum> call(
                Tuple2<Tuple2<String, Long>, Long> tuple2LongTuple2) throws Exception {
            String dimension = tuple2LongTuple2._1()._1();
            long timestamp = tuple2LongTuple2._1()._2();
            MetricDatum metricDatum = new MetricDatum();
            metricDatum.setMetricDimension(dimension);
            metricDatum.setTimestamp(timestamp);
            String key = metricDatum.getMetricDimension().toString();
            key += "_" + Long.toString(timestamp);
            metricDatum.setMetric(tuple2LongTuple2._2());
            return new Tuple2<String, MetricDatum>(key, metricDatum);
        }
    });
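The reason for splitting the work into two steps: JavaRDD<T>.reduce takes a Function2<T, T, T> and returns a single T, not another RDD, so the reducing function cannot change the element type, and the result cannot be assigned to a JavaRDD. Summing per key is what reduceByKey does, and the conversion to (String, MetricDatum) belongs in a separate map step. The same two-step shape can be sketched without Spark using plain Java collections. This is only an illustrative analogue, not Spark code: the class ReduceByKeySketch, the helpers record and sumByKey, and the sample data are all hypothetical, and a Long stands in for MetricDatum.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue (no Spark) of the reduceByKey + mapToPair pipeline.
// All names here are hypothetical and exist only for illustration.
final class ReduceByKeySketch {

    // Builds one ((dimension, timestamp), value) record.
    static Map.Entry<Map.Entry<String, Long>, Long> record(
            String dimension, long timestamp, long value) {
        Map.Entry<String, Long> key = new SimpleEntry<>(dimension, timestamp);
        return new SimpleEntry<>(key, value);
    }

    static Map<String, Long> sumByKey(
            List<Map.Entry<Map.Entry<String, Long>, Long>> records) {
        // Step 1 (reduceByKey analogue): the merge function is (Long, Long) -> Long,
        // so input and output types of the reduction are the same.
        Map<Map.Entry<String, Long>, Long> sums = new LinkedHashMap<>();
        for (Map.Entry<Map.Entry<String, Long>, Long> r : records) {
            sums.merge(r.getKey(), r.getValue(), Long::sum);
        }
        // Step 2 (mapToPair analogue): the change of key type happens here,
        // after the reduction, building the "dimension_timestamp" key.
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<Map.Entry<String, Long>, Long> e : sums.entrySet()) {
            String newKey = e.getKey().getKey() + "_" + e.getKey().getValue();
            result.put(newKey, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> out = sumByKey(Arrays.asList(
                record("cpu", 100L, 2L),
                record("cpu", 100L, 3L),
                record("mem", 100L, 7L)));
        System.out.println(out); // prints {cpu_100=5, mem_100=7}
    }
}
```

In the Spark version the first step corresponds to reduceByKey with a (Long, Long) -> Long function and the second to mapToPair, as in the answer's code.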
Source: https://stackoverflow.com/questions/31112626/type-mismatch-for-reduce