Flink: Union, CoFlatMap, CoGroup, Join and Connect

Posted by 你离开我真会死。 on 2020-04-05 23:07:41

Guiding questions

1. Which Flink operations turn two data streams into a single stream?
2. What does each of cogroup, join and coflatmap do?
3. How do cogroup, join and coflatmap differ?

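Flink offers four operations that combine two data streams into a single stream: coGroup, join, coFlatMap and union. Below we compare what each of them does and how to use it.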

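  • Join: emits only pairs of elements that satisfy the join condition (same key, same window), like an inner join.
  • CoGroup: besides the matched elements, elements without a match are also passed on. For each key, the function receives all elements of that key from both sides, and either side may be empty, which makes outer joins possible.
  • CoFlatMap: has no matching condition and performs no matching; it processes the elements of each stream separately (flatMap1 for the first stream, flatMap2 for the second). Combined with keyed state, it can fully reproduce what join and coGroup do, and it is more flexible than either.
  • Union: merges streams of the same data type into one stream, without any matching.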
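The difference between join and coGroup can be made concrete with a plain-Java model of what happens to the elements of one window. This is not Flink code: records are simplified to hypothetical String[]{key, value} pairs, and WindowJoinModel and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model (no Flink) of join vs. coGroup on the elements of ONE window.
// Hypothetical record shape: String[]{key, value}.
class WindowJoinModel {

    // Group a window's records by key, keeping first-seen key order.
    static Map<String, List<String>> groupByKey(List<String[]> records) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] r : records) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return groups;
    }

    // Join: only keys present on BOTH sides produce output,
    // one output per (left, right) pair of that key, i.e. an inner join.
    static List<String> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> l = groupByKey(left);
        Map<String, List<String>> r = groupByKey(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : l.entrySet()) {
            List<String> matches = r.get(e.getKey());
            if (matches == null) {
                continue; // no partner in the other stream: dropped
            }
            for (String lv : e.getValue()) {
                for (String rv : matches) {
                    out.add(e.getKey() + ":" + lv + "," + rv);
                }
            }
        }
        return out;
    }

    // CoGroup: one call per key seen on EITHER side, receiving both groups;
    // a group may be empty, so unmatched elements still reach the user function.
    static List<String> coGroup(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> l = groupByKey(left);
        Map<String, List<String>> r = groupByKey(right);
        Set<String> keys = new LinkedHashSet<>(l.keySet());
        keys.addAll(r.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            List<String> lv = l.getOrDefault(k, List.of());
            List<String> rv = r.getOrDefault(k, List.of());
            out.add(k + ":" + lv + "|" + rv);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> grades = List.of(new String[]{"tom", "A"}, new String[]{"ann", "B"});
        List<String[]> salaries = List.of(new String[]{"tom", "5000"}, new String[]{"bob", "7000"});
        System.out.println(join(grades, salaries));    // [tom:A,5000]
        System.out.println(coGroup(grades, salaries)); // [tom:[A]|[5000], ann:[B]|[], bob:[]|[7000]]
    }
}
```

ann and bob never appear in the join output, but the coGroup function still sees them, each with an empty group on one side. If a key occurred several times on one side, join would emit one result per matching pair, while coGroup would still be called only once for that key.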

Join example:

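    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // ZongShu, PositionJoinModel and partitionsKeySelector1 are the author's own classes (not shown)
    private static DataStream<PositionJoinModel> PositionTestJoin(
            DataStream<ZongShu> grades,
            DataStream<ZongShu> salaries,
            long windowSize) {
        return grades.join(salaries)
                // Join condition: the key extracted from stream1 must equal the key extracted from stream2
                .where(new partitionsKeySelector1())
                .equalTo(new partitionsKeySelector1())
                // Window: elements of stream1 and stream2 are assigned to this window,
                // and only elements that fall into the same window are joined
                .window(TumblingProcessingTimeWindows.of(Time.milliseconds(windowSize)))
                .apply(new JoinFunction<ZongShu, ZongShu, PositionJoinModel>() {
                    // Called once for every matching pair (first, second); assemble the output here
                    @Override
                    public PositionJoinModel join(ZongShu first, ZongShu second) {
                        return new PositionJoinModel(first.getRoom(), first.getPartitions(),
                                first.getNum(), second.getNum());
                    }
                });
    }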

CoGroup example:

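    import org.apache.flink.api.common.functions.CoGroupFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    // YCSB_LB_Model, YCSB_LB_RESULT_Model and YCFB_Result_KeySelector are the author's own classes (not shown)
    private static DataStream<YCSB_LB_RESULT_Model> YCLB_Result_CGroup(
            DataStream<YCSB_LB_Model> grades,
            DataStream<YCSB_LB_Model> salaries,
            long windowSize) {
        return grades.coGroup(salaries)
                .where(new YCFB_Result_KeySelector())
                .equalTo(new YCFB_Result_KeySelector())
                .window(TumblingProcessingTimeWindows.of(Time.milliseconds(windowSize)))
                .apply(new CoGroupFunction<YCSB_LB_Model, YCSB_LB_Model, YCSB_LB_RESULT_Model>() {
                    // Called once per key and window with ALL elements of that key from each side;
                    // either Iterable may be empty, which is how unmatched elements reach this function
                    @Override
                    public void coGroup(Iterable<YCSB_LB_Model> first,
                                        Iterable<YCSB_LB_Model> second,
                                        Collector<YCSB_LB_RESULT_Model> collector) throws Exception {
                        // Local variable instead of a shared field: a fresh result object per invocation
                        YCSB_LB_RESULT_Model ylrm = new YCSB_LB_RESULT_Model();
                        // If several elements share the key, the last one wins
                        for (YCSB_LB_Model s : first) {
                            ylrm.setAsset_id(s.getAsset_id());
                            ylrm.setName(s.getName());
                            ylrm.setIp(s.getIp());
                            ylrm.setRoom(s.getRoom());
                            ylrm.setPartitions(s.getPartitions());
                            ylrm.setBox(s.getBox());
                            ylrm.setLevel_1(s.getNum());
                        }
                        for (YCSB_LB_Model s1 : second) {
                            ylrm.setLevel_2(s1.getNum());
                        }
                        // Emitted even when one side is empty; the fields of that side then stay unset
                        collector.collect(ylrm);
                    }
                });
    }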

CoFlatMap example (the two keyed streams are first combined with connect):

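    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.streaming.examples.join.WindowJoinSampleData;
    import org.apache.flink.util.Collector;

    // Inside main(): env is the StreamExecutionEnvironment, rate the emission rate of the sample sources
    DataStream<Tuple2<String, Integer>> grades = WindowJoinSampleData.GradeSource.getSource(env, rate);
    DataStream<Tuple2<String, Integer>> salaries = WindowJoinSampleData.SalarySource.getSource(env, rate);
    // Key both streams by field 0 (the name) so that the state below is scoped per name.
    // keyBy(int) is deprecated in newer Flink releases in favor of a KeySelector.
    KeyedStream<Tuple2<String, Integer>, Tuple> keyedGrades = grades.keyBy(0);
    KeyedStream<Tuple2<String, Integer>, Tuple> keyedSalaries = salaries.keyBy(0);
    SingleOutputStreamOperator<Tuple3<String, Integer, Integer>> enriched = keyedGrades
            .connect(keyedSalaries)
            .flatMap(new EnrichmentFunction());

    // Buffers one element per key from each side in keyed state and emits (name, grade, salary)
    // as soon as the counterpart for the same key arrives, regardless of arrival order
    public static class EnrichmentFunction extends RichCoFlatMapFunction<
            Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>> {
        // Keyed, managed state: one value per key for each input
        private ValueState<Tuple2<String, Integer>> rideState; // buffered element of stream 1 (grades)
        private ValueState<Tuple2<String, Integer>> fareState; // buffered element of stream 2 (salaries)

        @Override
        public void open(Configuration config) {
            rideState = getRuntimeContext().getState(new ValueStateDescriptor<>("saved ride",
                    TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {})));
            fareState = getRuntimeContext().getState(new ValueStateDescriptor<>("saved fare",
                    TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {})));
        }

        // Called for every element of the first stream
        @Override
        public void flatMap1(Tuple2<String, Integer> ride,
                             Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
            Tuple2<String, Integer> fare = fareState.value();
            if (fare != null) {
                // Counterpart already buffered: clear it and emit the combined tuple
                fareState.clear();
                out.collect(new Tuple3<>(ride.f0, ride.f1, fare.f1));
            } else {
                // No counterpart yet: buffer this element (a later one with the same key overwrites it)
                rideState.update(ride);
            }
        }

        // Called for every element of the second stream; mirror image of flatMap1
        @Override
        public void flatMap2(Tuple2<String, Integer> fare,
                             Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
            Tuple2<String, Integer> ride = rideState.value();
            if (ride != null) {
                rideState.clear();
                out.collect(new Tuple3<>(ride.f0, ride.f1, fare.f1));
            } else {
                fareState.update(fare);
            }
        }
    }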
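The state logic of EnrichmentFunction can be traced without a Flink cluster. The sketch below is a plain-Java simulation, not Flink API: two HashMaps stand in for the keyed ValueState, and the hypothetical methods onFirst and onSecond mirror flatMap1 and flatMap2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Flink) of EnrichmentFunction's logic.
// Each HashMap plays the role of one keyed ValueState: at most one buffered value per key.
class EnrichmentSimulation {
    private final Map<String, Integer> rideState = new HashMap<>();
    private final Map<String, Integer> fareState = new HashMap<>();
    final List<String> out = new ArrayList<>();

    // Mirrors flatMap1: an element (key, grade) from the first stream.
    void onFirst(String key, int value) {
        Integer fare = fareState.remove(key); // read and clear in one step
        if (fare != null) {
            out.add(key + "," + value + "," + fare);
        } else {
            rideState.put(key, value); // buffer until the other side arrives; overwrites older value
        }
    }

    // Mirrors flatMap2: an element (key, salary) from the second stream.
    void onSecond(String key, int value) {
        Integer ride = rideState.remove(key);
        if (ride != null) {
            out.add(key + "," + ride + "," + value);
        } else {
            fareState.put(key, value);
        }
    }

    public static void main(String[] args) {
        EnrichmentSimulation sim = new EnrichmentSimulation();
        sim.onFirst("tom", 90);     // buffered: no salary for tom yet
        sim.onSecond("ann", 7000);  // buffered: no grade for ann yet
        sim.onSecond("tom", 5000);  // matches tom's buffered grade and emits
        sim.onFirst("ann", 80);     // matches ann's buffered salary and emits
        System.out.println(sim.out); // [tom,90,5000, ann,80,7000]
    }
}
```

Whichever side arrives first is buffered, and the tuple is emitted once the counterpart with the same key shows up, so the result does not depend on arrival order. The buffer holds only one element per key, though: a second grade for tom arriving before his salary replaces the first, just as rideState.update(ride) does in the Flink function.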

Summary

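Although union can merge multiple data streams, it has one restriction: all of the streams must have the same data type. connect offers similar functionality for combining two data streams, but differs from union as follows:

  • connect can only connect two data streams, while union can merge any number of them.
  • The two streams joined by connect may have different data types, whereas the streams merged by union must all have the same type.
  • After connect, the two DataStreams become a ConnectedStreams, which applies a different processing function to each input (for example flatMap1 and flatMap2 of a CoFlatMapFunction), and the two inputs can share state.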
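The type rules above can be illustrated with ordinary Java generics. This is an analogy rather than Flink code: the hypothetical methods union and connectAndMap operate on in-memory lists and simply concatenate them, whereas a real Flink job interleaves the inputs as their elements arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java analogy (no Flink) of the union/connect type rules.
class UnionConnectModel {

    // union: any number of inputs, but all must share the same element type T.
    @SafeVarargs
    static <T> List<T> union(List<T>... streams) {
        List<T> merged = new ArrayList<>();
        for (List<T> s : streams) {
            merged.addAll(s);
        }
        return merged;
    }

    // connect: exactly two inputs whose types A and B may differ; each side gets its
    // own function (like map1/map2 of a CoMapFunction) producing a common output type R.
    static <A, B, R> List<R> connectAndMap(List<A> first, List<B> second,
                                           Function<A, R> map1, Function<B, R> map2) {
        List<R> out = new ArrayList<>();
        for (A a : first) {
            out.add(map1.apply(a));
        }
        for (B b : second) {
            out.add(map2.apply(b));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("x", "y");
        List<String> b = List.of("z");
        List<String> c = List.of("w");
        List<Integer> numbers = List.of(1, 2);

        List<String> merged = union(a, b, c);
        System.out.println(merged); // [x, y, z, w]
        // union(a, numbers) would not compile: no single T fits List<String> and List<Integer>

        List<String> mixed = connectAndMap(a, numbers, s -> "str:" + s, n -> "int:" + n);
        System.out.println(mixed); // [str:x, str:y, int:1, int:2]
    }
}
```

union(a, numbers) is rejected by the compiler because no single element type fits both lists, while connectAndMap accepts a List<String> and a List<Integer> as long as each side is mapped to the same output type, much as a CoMapFunction or CoFlatMapFunction does for ConnectedStreams.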