Flink: Union, CoFlatMap, CoGroup, Join, and Connect

Key questions

1. Which Flink operations turn two data streams into a single stream?
2. What do coGroup, join, and coFlatMap each do?
3. How do coGroup, join, and coFlatMap differ?

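Flink offers four operations that turn two data streams into a single stream: coGroup, join, coFlatMap, and union. This post compares what each of them does and how to use it.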

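Join: emits only the pairs of elements that satisfy the match condition.
CoGroup: emits the matched pairs and, in addition, the elements that found no match. For each key and window, the function receives all elements from both sides, so one side may be empty.
CoFlatMap: has no match condition and performs no matching; it processes the elements of each stream separately. Both join and coGroup can be implemented on top of it (typically with keyed state), which makes it more flexible than either of them.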
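The difference between join and coGroup can be illustrated without Flink at all. The following is a minimal plain-Java sketch, assuming the contents of one window are already available as in-memory lists of `{key, value}` pairs. The class `JoinVsCoGroup` and its methods are hypothetical names introduced for illustration only; they model the semantics and are not part of the Flink API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of join vs. coGroup semantics over one window's contents.
// Each element is a String[]{key, value}. This is not Flink code.
public class JoinVsCoGroup {

    // Join: one output per (left, right) pair whose keys are equal.
    // Elements without a partner on the other side produce nothing.
    public static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[0] + ":" + l[1] + "," + r[1]);
                }
            }
        }
        return out;
    }

    // CoGroup: one output per key seen on either side.
    // The group of one side may be empty, so unmatched elements still show up.
    public static List<String> coGroup(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> leftGroups = new LinkedHashMap<>();
        Map<String, List<String>> rightGroups = new LinkedHashMap<>();
        for (String[] l : left) {
            leftGroups.computeIfAbsent(l[0], k -> new ArrayList<>()).add(l[1]);
            rightGroups.computeIfAbsent(l[0], k -> new ArrayList<>());
        }
        for (String[] r : right) {
            leftGroups.computeIfAbsent(r[0], k -> new ArrayList<>());
            rightGroups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String> out = new ArrayList<>();
        for (String key : leftGroups.keySet()) {
            out.add(key + ":" + leftGroups.get(key) + "," + rightGroups.get(key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> grades = List.of(new String[]{"tom", "A"}, new String[]{"ann", "B"});
        List<String[]> salaries = List.of(new String[]{"tom", "100"}, new String[]{"bob", "200"});
        System.out.println(join(grades, salaries));    // [tom:A,100]
        System.out.println(coGroup(grades, salaries)); // [tom:[A],[100], ann:[B],[], bob:[],[200]]
    }
}
```

In this model, "ann" and "bob" have no partner, so join drops them, while coGroup still emits one result for each with an empty group on the missing side.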

Join example:

    
    private static DataStream<PositionJoinModel> PositionTestJoin(
            DataStream<ZongShu> grades,
            DataStream<ZongShu> salaries,
            long windowSize) {
        DataStream<PositionJoinModel> apply = grades.join(salaries)
                // Join condition: the key extracted from stream 1 equals the key extracted from stream 2
                .where(new partitionsKeySelector1())
                .equalTo(new partitionsKeySelector1())
                // Assign a window. Elements of both streams fall into it,
                // and only elements inside the same window are joined.
                .window(TumblingProcessingTimeWindows.of(Time.milliseconds(windowSize)))
                .apply(new JoinFunction<ZongShu, ZongShu, PositionJoinModel>() {
                    // Called once for each matched pair (first, second); assemble the result here
                    @Override
                    public PositionJoinModel join(
                            ZongShu first,
                            ZongShu second) {
                        return new PositionJoinModel(first.getRoom(), first.getPartitions(), first.getNum(), second.getNum());
                    }
                });
        return apply;
    }

CoGroup example:

    private static DataStream<YCSB_LB_RESULT_Model> YCLB_Result_CGroup(
            DataStream<YCSB_LB_Model> grades,
            DataStream<YCSB_LB_Model> salaries,
            long windowSize) {
        DataStream<YCSB_LB_RESULT_Model> apply = grades.coGroup(salaries)
                .where(new YCFB_Result_KeySelector())
                .equalTo(new YCFB_Result_KeySelector())
                .window(TumblingProcessingTimeWindows.of(Time.milliseconds(windowSize)))
                .apply(new CoGroupFunction<YCSB_LB_Model, YCSB_LB_Model, YCSB_LB_RESULT_Model>() {
                    @Override
                    public void coGroup(Iterable<YCSB_LB_Model> first, Iterable<YCSB_LB_Model> second, Collector<YCSB_LB_RESULT_Model> collector) throws Exception {
                        // Build a fresh result for each key and window; a local variable
                        // avoids carrying mutable state from one call to the next.
                        YCSB_LB_RESULT_Model ylrm = new YCSB_LB_RESULT_Model();
                        // Elements of the first stream for this key (may be empty)
                        for (YCSB_LB_Model s : first) {
                            ylrm.setAsset_id(s.getAsset_id());
                            ylrm.setName(s.getName());
                            ylrm.setIp(s.getIp());
                            ylrm.setRoom(s.getRoom());
                            ylrm.setPartitions(s.getPartitions());
                            ylrm.setBox(s.getBox());
                            ylrm.setLevel_1(s.getNum());
                        }
                        // Elements of the second stream for this key (may be empty)
                        for (YCSB_LB_Model s1 : second) {
                            ylrm.setLevel_2(s1.getNum());
                        }
                        // Emitted even when one side had no match
                        collector.collect(ylrm);
                    }
                });
        return apply;
    }

CoFlatMap example:

        DataStream<Tuple2<String, Integer>> grades = WindowJoinSampleData.GradeSource.getSource(env, rate);
        DataStream<Tuple2<String, Integer>> salaries = WindowJoinSampleData.SalarySource.getSource(env, rate);
        // Key both streams by field 0 so that elements with the same key share the same keyed state.
        // keyBy(int) is deprecated in newer Flink releases in favor of a KeySelector, e.g. keyBy(t -> t.f0).
        KeyedStream<Tuple2<String, Integer>, Tuple> tuple2TupleKeyedStream = grades.keyBy(0);
        KeyedStream<Tuple2<String, Integer>, Tuple> tuple2TupleKeyedStream1 = salaries.keyBy(0);
        SingleOutputStreamOperator<Tuple3<String, Integer, Integer>> tuple3SingleOutputStreamOperator = tuple2TupleKeyedStream
                .connect(tuple2TupleKeyedStream1)
                .flatMap(new EnrichmentFunction());

    public static class EnrichmentFunction extends RichCoFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>> {
        // keyed, managed state
        // rideState buffers an unmatched element of stream 1, fareState one of stream 2
        private ValueState<Tuple2<String, Integer>> rideState;
        private ValueState<Tuple2<String, Integer>> fareState;

        @Override
        public void open(Configuration config) {
            rideState = getRuntimeContext().getState(new ValueStateDescriptor<>("saved ride", TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {
            })));
            fareState = getRuntimeContext().getState(new ValueStateDescriptor<>("saved fare", TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {
            })));
        }

        // Called for each element of stream 1
        @Override
        public void flatMap1(Tuple2<String, Integer> ride, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
            Tuple2<String, Integer> fare = fareState.value();
            if (fare != null) {
                // The partner from stream 2 already arrived: emit the pair and clear the buffer
                fareState.clear();
                out.collect(new Tuple3<>(ride.f0, ride.f1, fare.f1));
            } else {
                // No partner yet: remember this element until it arrives
                rideState.update(ride);
            }
        }

        // Called for each element of stream 2
        @Override
        public void flatMap2(Tuple2<String, Integer> fare, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
            Tuple2<String, Integer> ride = rideState.value();
            if (ride != null) {
                rideState.clear();
                out.collect(new Tuple3<>(ride.f0, ride.f1, fare.f1));
            } else {
                fareState.update(fare);
            }
        }
    }

Summary

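Union can merge multiple data streams, with one restriction: all of them must have the same data type. Connect offers functionality similar to union for combining two data streams. It differs from union in the following ways:

connect can only combine two data streams; union can combine more than two.
The two streams combined by connect may have different data types; the streams combined by union must all have the same data type.
After connect, the two DataStreams become a ConnectedStreams. A ConnectedStreams applies a separate processing method to the elements of each stream, and the two streams can share state.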
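The type constraints described above can be modeled in plain Java. The sketch below is an assumption-laden illustration, not Flink code: `UnionVsConnect`, `union`, and `connectAndMap` are hypothetical names, and in-memory lists stand in for streams. A real stream interleaves elements as they arrive, whereas this model simply concatenates them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java model of the type rules of union and connect.
// Lists stand in for streams; element order is simplified to concatenation.
public class UnionVsConnect {

    // Models union: any number of inputs, all of the same element type T.
    @SafeVarargs
    public static <T> List<T> union(List<T>... streams) {
        List<T> out = new ArrayList<>();
        for (List<T> stream : streams) {
            out.addAll(stream);
        }
        return out;
    }

    // Models connect followed by a two-input map: exactly two inputs whose
    // element types A and B may differ, each handled by its own function,
    // both producing the common output type R.
    public static <A, B, R> List<R> connectAndMap(
            List<A> first, List<B> second,
            Function<A, R> map1, Function<B, R> map2) {
        List<R> out = new ArrayList<>();
        for (A a : first) {
            out.add(map1.apply(a));
        }
        for (B b : second) {
            out.add(map2.apply(b));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three inputs of the same type are fine for union
        List<Integer> merged = union(List.of(1, 2), List.of(3), List.of(4, 5));
        System.out.println(merged); // [1, 2, 3, 4, 5]

        // Two inputs of different types, each with its own mapping function
        List<String> connected = connectAndMap(
                List.of(1, 2), List.of("a"),
                i -> "int:" + i, s -> "str:" + s);
        System.out.println(connected); // [int:1, int:2, str:a]
    }
}
```

The two functions passed to `connectAndMap` play the role that `flatMap1` and `flatMap2` play in the CoFlatMap example: one handler per input, one shared output type.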

Source: oschina

Link: https://my.oschina.net/112612/blog/3215689
