Apache Flink

A look at Flink's ProcessFunction

你说的曾经没有我的故事 submitted on 2019-12-09 16:21:57
Preface: This article takes a look at Flink's ProcessFunction. Example: import org.apache.flink.api.common.state.ValueState; import org.apache.flink.api.common.state.ValueStateDescriptor; import org.apache.flink.api.java.tuple.Tuple2; import org.apache.flink.configuration.Configuration; import org.apache.flink.streaming.api.functions.ProcessFunction; import org.apache.flink.streaming.api.functions.ProcessFunction.Context; import org.apache.flink.streaming.api.functions.ProcessFunction.OnTimerContext; import org.apache.flink.util.Collector; // the source data stream DataStream<Tuple2<String, String>> stream = ...; // apply the

A look at ScalarFunction in the Flink Table API

Deadly submitted on 2019-12-09 16:20:25
Preface: This article takes a look at ScalarFunction in the Flink Table API. Example: public class HashCode extends ScalarFunction { private int factor = 0; @Override public void open(FunctionContext context) throws Exception { // access "hashcode_factor" parameter // "12" would be the default value if parameter does not exist factor = Integer.valueOf(context.getJobParameter("hashcode_factor", "12")); } public int eval(String s) { return s.hashCode() * factor; } } ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env); // set job parameter

A look at the groupBy operation in the Flink Table API

拜拜、爱过 submitted on 2019-12-09 12:53:53
Preface: This article takes a look at the groupBy operation in the Flink Table API. Table.groupBy: flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/api/table.scala class Table( private[flink] val tableEnv: TableEnvironment, private[flink] val logicalPlan: LogicalNode) { //...... def groupBy(fields: String): GroupedTable = { val fieldsExpr = ExpressionParser.parseExpressionList(fields) groupBy(fieldsExpr: _*) } def groupBy(fields: Expression*): GroupedTable = { new GroupedTable(this, fields) } //...... } Table's groupBy operation accepts two kinds of arguments: a String or Expressions. The String overload parses the String into Expressions and then calls the Expression overload, which creates a GroupedTable.
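The delegation pattern described in this excerpt (a String overload that parses its argument and forwards to a varargs overload) can be sketched in plain Java. This is only an illustration, not Flink code: `GroupBySketch`, `FieldExpr`, `GroupedSketch`, and `parseFieldList` are hypothetical names, and the comma-splitting parser is a simplifying assumption that does not reproduce what `ExpressionParser.parseExpressionList` actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupBySketch {
    /** Hypothetical stand-in for a parsed field expression. */
    static final class FieldExpr {
        final String name;
        FieldExpr(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for GroupedTable: it only remembers the grouping keys. */
    static final class GroupedSketch {
        final List<FieldExpr> keys;
        GroupedSketch(List<FieldExpr> keys) { this.keys = keys; }
    }

    /** Assumption: fields are separated by commas; a real expression parser does far more. */
    static List<FieldExpr> parseFieldList(String fields) {
        List<FieldExpr> result = new ArrayList<>();
        for (String part : fields.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(new FieldExpr(trimmed));
            }
        }
        return result;
    }

    /** String overload: parse first, then delegate to the expression overload. */
    static GroupedSketch groupBy(String fields) {
        return groupBy(parseFieldList(fields).toArray(new FieldExpr[0]));
    }

    /** Expression overload: the only place where the grouped result is created. */
    static GroupedSketch groupBy(FieldExpr... fields) {
        return new GroupedSketch(Arrays.asList(fields));
    }

    public static void main(String[] args) {
        GroupedSketch grouped = groupBy("a, w");
        for (FieldExpr key : grouped.keys) {
            System.out.println(key.name);
        }
    }
}
```

Keeping the construction logic in a single overload means both entry points produce the same kind of result, which matches the structure the excerpt describes.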

A look at AggregateFunction in the Flink Table API

此生再无相见时 submitted on 2019-12-09 12:52:06
Preface: This article takes a look at AggregateFunction in the Flink Table API. Example: /** * Accumulator for WeightedAvg. */ public static class WeightedAvgAccum { public long sum = 0; public int count = 0; } /** * Weighted Average user-defined aggregate function. */ public static class WeightedAvg extends AggregateFunction<Long, WeightedAvgAccum> { @Override public WeightedAvgAccum createAccumulator() { return new WeightedAvgAccum(); } @Override public Long getValue(WeightedAvgAccum acc) { if (acc.count == 0) { return 0L; } else { return acc.sum / acc.count; } } public void accumulate(WeightedAvgAccum acc, long iValue, int iWeight) {

A look at Flink's TimerService

左心房为你撑大大i submitted on 2019-12-09 11:01:59
Preface: This article takes a look at Flink's TimerService. TimerService: flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/TimerService.java @PublicEvolving public interface TimerService { String UNSUPPORTED_REGISTER_TIMER_MSG = "Setting timers is only supported on a keyed streams."; String UNSUPPORTED_DELETE_TIMER_MSG = "Deleting timers is only supported on a keyed streams."; long currentProcessingTime(); long currentWatermark(); void registerProcessingTimeTimer(long time); void registerEventTimeTimer(long time); void deleteProcessingTimeTimer(long time); void deleteEventTimeTimer(long time); }

A look at Flink's KvStateRegistryGateway

时光毁灭记忆、已成空白 submitted on 2019-12-07 21:46:03
Preface: This article takes a look at Flink's KvStateRegistryGateway. KvStateRegistryGateway: flink-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/KvStateRegistryGateway.java public interface KvStateRegistryGateway { /** * Notifies that queryable state has been registered. * * @param jobId identifying the job for which to register a key value state * @param jobVertexId JobVertexID the KvState instance belongs to. * @param keyGroupRange Key group range the KvState instance belongs to. * @param registrationName Name under which the KvState has been registered. * @param kvStateId ID of the registered

A look at Distinct Aggregation in the Flink Table API

允我心安 submitted on 2019-12-07 20:55:53
Preface: This article takes a look at Distinct Aggregation in the Flink Table API. Example: //Distinct can be applied to GroupBy Aggregation, GroupBy Window Aggregation and Over Window Aggregation. Table orders = tableEnv.scan("Orders"); // Distinct aggregation on group by Table groupByDistinctResult = orders .groupBy("a") .select("a, b.sum.distinct as d"); // Distinct aggregation on time window group by Table groupByWindowDistinctResult = orders .window(Tumble.over("5.minutes").on("rowtime").as("w")).groupBy("a, w") .select("a, b.sum.distinct as d"); // Distinct aggregation on over window Table result = orders .window(Over .partitionBy(

A look at Flink's JobManagerGateway

二次信任 submitted on 2019-12-07 19:52:57
Preface: This article takes a look at Flink's JobManagerGateway. RestfulGateway: flink-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/RestfulGateway.java public interface RestfulGateway extends RpcGateway { CompletableFuture<Acknowledge> cancelJob(JobID jobId, @RpcTimeout Time timeout); CompletableFuture<Acknowledge> stopJob(JobID jobId, @RpcTimeout Time timeout); CompletableFuture<String> requestRestAddress(@RpcTimeout Time timeout); CompletableFuture<? extends AccessExecutionGraph> requestJob(JobID jobId, @RpcTimeout Time timeout); CompletableFuture<JobResult> requestJobResult(JobID jobId,

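Apache Flink: Checkpoint Principles and Practical Application

家住魔仙堡 submitted on 2019-12-07 15:08:35
The relationship between Checkpoint and state: A Checkpoint is a global operation that is triggered at the source and completes once every downstream node has finished it. The figure below gives an intuitive feel for Checkpoints: the red box shows that 569K Checkpoints were triggered in total and all of them completed successfully, with none failing. State is the main data that a Checkpoint persists as a backup; in the statistics shown in the figure below, the state is only about 9 KB in size. What is state? Let's look at what state is. Consider a classic word count program: it listens on local port 9000 and counts word frequencies over the input arriving on that port. We start netcat locally and type hello world in the terminal; what does the program output? The answer is clear: (hello, 1) and (world, 1). Now, if we type hello world in the terminal again, what does the program output? The answer is also clear: (hello, 2) and (world, 2). How does Flink know that it has already processed hello world once? This is where state comes in: what is called keyed state here stores the counts accumulated so far, which is how Flink knows that hello and world have each already appeared once. Let's review the word count code. Calling the keyBy interface creates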
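The keyed-state behavior described in this excerpt can be illustrated with a minimal plain-Java sketch. It is not the article's word count program and it does not use the Flink API: `KeyedCountSketch` and `processLine` are hypothetical names, and an in-memory `HashMap` stands in for the per-key state that Flink would manage and include in Checkpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedCountSketch {
    // Hypothetical stand-in for keyed state: one running count per key (word).
    private final Map<String, Integer> countPerWord = new HashMap<>();

    /** Processes one input line and returns the updated (word, count) pairs in input order. */
    public List<String> processLine(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty line yields a single empty token
            }
            // Read the previous count for this key, add one, and store it back.
            int updated = countPerWord.merge(word, 1, Integer::sum);
            out.add("(" + word + ", " + updated + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        KeyedCountSketch counter = new KeyedCountSketch();
        System.out.println(counter.processLine("hello world")); // [(hello, 1), (world, 1)]
        System.out.println(counter.processLine("hello world")); // [(hello, 2), (world, 2)]
    }
}
```

The second call returns higher counts only because the map survives between calls; that retained per-key data is the role state plays in the word count example.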

A look at Flink's InternalTimeServiceManager

耗尽温柔 submitted on 2019-12-07 15:08:23
Preface: This article takes a look at Flink's InternalTimeServiceManager. InternalTimeServiceManager: flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/operators/InternalTimeServiceManager.java @Internal public class InternalTimeServiceManager<K> { @VisibleForTesting static final String TIMER_STATE_PREFIX = "_timer_state"; @VisibleForTesting static final String PROCESSING_TIMER_PREFIX = TIMER_STATE_PREFIX + "/processing_"; @VisibleForTesting static final String EVENT_TIMER_PREFIX = TIMER_STATE_PREFIX + "/event_"; private final KeyGroupRange localKeyGroupRange; private final KeyContext keyContext;