Apache Flink

A Look at Flink's TimeCharacteristic

一个人想着一个人 submitted on 2019-12-05 09:55:23
Preface: This article takes a look at Flink's TimeCharacteristic.

TimeCharacteristic

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/TimeCharacteristic.java

/**
 * The time characteristic defines how the system determines time for time-dependent
 * order and operations that depend on time (such as time windows).
 */
@PublicEvolving
public enum TimeCharacteristic {

    /**
     * Processing time for operators means that the operator uses the system clock of the machine
     * to determine the current time of the data stream. Processing-time windows trigger based
     * on wall-clock time and include whatever elements happen to have arrived at the operator at
     * that point in time.
     */
    ProcessingTime,

    // (the IngestionTime and EventTime constants and their Javadoc are elided in this excerpt)
    IngestionTime,

    EventTime
}
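As a usage sketch that is not part of the original excerpt, the characteristic is set once on the execution environment, before any time-based operators are defined:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TimeCharacteristicExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With event time, downstream time windows are driven by watermarks
        // rather than by the machine's wall clock.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // ... define sources, windowed transformations, and sinks here ...
        env.execute("time-characteristic-demo");
    }
}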

A Look at Flink's Triggers

老子叫甜甜 submitted on 2019-12-05 09:54:36
Preface: This article takes a look at Flink's Triggers.

Trigger

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/triggers/Trigger.java

@PublicEvolving
public abstract class Trigger<T, W extends Window> implements Serializable {

    private static final long serialVersionUID = -4104633972991191369L;

    public abstract TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx) throws Exception;

    public abstract TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception;

    public abstract TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception;

    // (the remaining methods, such as clear() and the TriggerContext interface, are elided in this excerpt)
}
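To make the contract concrete, here is a minimal custom trigger (a hypothetical sketch, not from the excerpt) that fires the window on every incoming element and lets timers pass through without effect:

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Fires the window for every element; timer callbacks never fire anything.
public class FireOnEveryElementTrigger extends Trigger<Object, TimeWindow> {

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE; // evaluate the window function, keep the window contents
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        // no state or timers to clean up in this sketch
    }
}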

A Look at Flink Table's Group Windows

社会主义新天地 submitted on 2019-12-05 09:54:23
Preface: This article takes a look at Group Windows in Flink's Table API.

Examples

Table table = input
    .window([Window w].as("w"))  // define window with alias w
    .groupBy("w")                // group the table by window w
    .select("b.sum");            // aggregate

Table table = input
    .window([Window w].as("w"))  // define window with alias w
    .groupBy("w, a")             // group the table by attribute a and window w
    .select("a, b.sum");         // aggregate

Table table = input
    .window([Window w].as("w"))  // define window with alias w
    .groupBy("w, a")             // group the table by attribute a and window w
    .select("a, w.start, w.end, w.rowtime, b.count"); // aggregate and add window start, end, and rowtime timestamps
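The [Window w] placeholder stands for a concrete window definition. As an illustration that is not part of the excerpt (the attribute names rowtime, a, and b are assumed from the examples above), a 10-minute tumbling event-time window could look like:

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;

// Tumbling 10-minute event-time window over the assumed "rowtime" attribute
Table result = input
    .window(Tumble.over("10.minutes").on("rowtime").as("w"))
    .groupBy("w, a")              // group by attribute a and window w
    .select("a, w.start, b.sum"); // aggregate per key and window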

A Look at Flink Table's Over Windows

大憨熊 submitted on 2019-12-05 09:54:08
Preface: This article takes a look at Over Windows in Flink's Table API.

Example

Table table = input
    .window([OverWindow w].as("w"))            // define over window with alias w
    .select("a, b.sum over w, c.min over w");  // aggregate over the over window w

Over windows are the Table API counterpart of SQL's OVER clause. They can be based on event time, processing time, or row count, and are constructed through the Over class, on which the orderBy, preceding, and as methods must be set. They come in two flavors: unbounded and bounded.

Unbounded Over Windows example

// Unbounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_range").as("w"));

// Unbounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("unbounded_range").as("w"));
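The excerpt is cut off before the bounded variants. For comparison, here is a hedged sketch of bounded over windows (same assumed attributes; the interval and row count are arbitrary example values):

// Bounded event-time over window: rows up to 10 minutes before the current row
.window(Over.partitionBy("a").orderBy("rowtime").preceding("10.minutes").as("w"));

// Bounded row-count over window: the 10 rows before the current row
.window(Over.partitionBy("a").orderBy("rowtime").preceding("10.rows").as("w"));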

A Look at Flink's RestartStrategies

不羁岁月 submitted on 2019-12-05 09:53:40
Preface: This article takes a look at Flink's RestartStrategies.

RestartStrategies

flink-core-1.7.1-sources.jar!/org/apache/flink/api/common/restartstrategy/RestartStrategies.java

@PublicEvolving
public class RestartStrategies {

    /**
     * Generates NoRestartStrategyConfiguration.
     *
     * @return NoRestartStrategyConfiguration
     */
    public static RestartStrategyConfiguration noRestart() {
        return new NoRestartStrategyConfiguration();
    }

    public static RestartStrategyConfiguration fallBackRestart() {
        return new FallbackRestartStrategyConfiguration();
    }

    /**
     * Generates a FixedDelayRestartStrategyConfiguration.
     *
     * @param restartAttempts number of restart attempts
     */
    // (the fixedDelayRestart and failureRateRestart factory methods are elided in this excerpt)
}
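A restart-strategy configuration is typically attached to the execution environment; a minimal sketch (the attempt count and delay are arbitrary example values):

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Retry the job up to 3 times, waiting 10 seconds between attempts.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));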

A Look at Flink's slot.request.timeout Configuration

限于喜欢 submitted on 2019-12-05 09:53:24
Preface: This article takes a look at Flink's slot.request.timeout configuration.

JobManagerOptions

flink-release-1.7.2/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java

@PublicEvolving
public class JobManagerOptions {
    //......

    /**
     * The timeout in milliseconds for requesting a slot from Slot Pool.
     */
    public static final ConfigOption<Long> SLOT_REQUEST_TIMEOUT =
        key("slot.request.timeout")
        .defaultValue(5L * 60L * 1000L)
        .withDescription("The timeout in milliseconds for requesting a slot from Slot Pool.");

    //......
}

slot.request.timeout defaults to 5 minutes (5 * 60 * 1000 ms).

SlotManagerConfiguration

flink-release-1.7.2/flink-runtime/src
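The option can be raised in flink-conf.yaml; for a local environment it can also be set programmatically. A minimal sketch (the 10-minute value is an arbitrary example):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration config = new Configuration();
// Raise the slot request timeout from the default 5 minutes to 10 minutes.
config.setLong(JobManagerOptions.SLOT_REQUEST_TIMEOUT, 10L * 60L * 1000L);
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.createLocalEnvironment(2, config);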

A Look at Flink's jobstore Configuration

本秂侑毒 submitted on 2019-12-05 09:53:07
Preface: This article takes a look at Flink's jobstore configuration.

JobManagerOptions

flink-1.7.2/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java

@PublicEvolving
public class JobManagerOptions {
    //......

    /**
     * The job store cache size in bytes which is used to keep completed
     * jobs in memory.
     */
    public static final ConfigOption<Long> JOB_STORE_CACHE_SIZE =
        key("jobstore.cache-size")
        .defaultValue(50L * 1024L * 1024L)
        .withDescription("The job store cache size in bytes which is used to keep completed jobs in memory.");

    /**
     * The time in seconds after which a completed job expires and is purged from the
     * job store.
     */
    // (the excerpt is cut off here; in the 1.7.2 sources this Javadoc belongs to
    // JOB_STORE_EXPIRATION_TIME, keyed "jobstore.expiration-time")
    //......
}
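Both jobstore options can likewise be overridden programmatically; a minimal sketch (the concrete values are arbitrary examples):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;

Configuration config = new Configuration();
// Keep up to 100 MB of completed-job metadata in memory...
config.setLong(JobManagerOptions.JOB_STORE_CACHE_SIZE, 100L * 1024L * 1024L);
// ...and expire completed jobs after 30 minutes (the option whose Javadoc
// is truncated in the excerpt above).
config.setLong(JobManagerOptions.JOB_STORE_EXPIRATION_TIME, 30L * 60L);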

A Look at Flink's InputFormatSourceFunction

时光毁灭记忆、已成空白 submitted on 2019-12-05 09:52:52
Preface: This article takes a look at Flink's InputFormatSourceFunction.

Example

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

IteratorInputFormat<String> iteratorInputFormat = new IteratorInputFormat<String>(new WordIterator());

env
    //TypeInformation.of(new TypeHint<String>() {})
    .createInput(iteratorInputFormat, TypeExtractor.createTypeInfo(String.class))
    .setParallelism(1)
    .print();

Here env.createInput is called with an IteratorInputFormat to create the SourceFunction.

StreamExecutionEnvironment.createInput

flink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/environment
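WordIterator is a helper from the original article whose body the excerpt omits; a hypothetical minimal stand-in only needs to be a serializable Iterator<String>:

import java.io.Serializable;
import java.util.Iterator;

// Hypothetical stand-in for the article's WordIterator: iterates a fixed word list.
// An index into an array keeps the class trivially serializable, which Flink
// requires when shipping the IteratorInputFormat to the cluster.
public class WordIterator implements Iterator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    private final String[] words = {"to", "be", "or", "not", "to", "be"};
    private int pos = 0;

    @Override
    public boolean hasNext() {
        return pos < words.length;
    }

    @Override
    public String next() {
        return words[pos++];
    }
}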

A Look at Flink's ParallelIteratorInputFormat

戏子无情 submitted on 2019-12-05 09:52:33
Preface: This article takes a look at Flink's ParallelIteratorInputFormat.

Example

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Long> dataSet = env.generateSequence(15, 106)
    .setParallelism(3);

dataSet.print();

Here ExecutionEnvironment.generateSequence creates a ParallelIteratorInputFormat backed by a NumberSequenceIterator.

ParallelIteratorInputFormat

flink-java-1.6.2-sources.jar!/org/apache/flink/api/java/io/ParallelIteratorInputFormat.java

/**
 * An input format that generates data in parallel through a {@link SplittableIterator}.
 */
@PublicEvolving
public class ParallelIteratorInputFormat<T> extends GenericInputFormat<T> {
    // (class body elided in this excerpt)
}
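generateSequence is essentially a convenience wrapper around this input format; an equivalent construction (a sketch under that assumption) passes the SplittableIterator explicitly:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.NumberSequenceIterator;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Same range as generateSequence(15, 106), built from the splittable iterator directly;
// the iterator is split across the 3 parallel instances.
DataSet<Long> dataSet = env
    .fromParallelCollection(new NumberSequenceIterator(15L, 106L), BasicTypeInfo.LONG_TYPE_INFO)
    .setParallelism(3);
dataSet.print();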

Understanding K8s Logging System Design and Practice in One Article

扶醉桌前 submitted on 2019-12-05 02:24:08
In the previous article we covered why a logging system is needed at all, why it matters so much in cloud-native environments, and where the difficulty lies in building one; DevOps, SRE, and operations engineers will recognize these pains firsthand. This article gets straight to the point: how to build a flexible, powerful, reliable, and scalable logging system for cloud-native scenarios.

Requirements-driven architecture design

A technical architecture is the process of turning product requirements into a technical implementation. For every architect, the ability to analyze product requirements thoroughly is both basic and essential. Many systems are torn down shortly after being built, and the root cause is usually that they never addressed the product's real requirements.

Our logging-service team has nearly ten years of experience with logging and serves almost every team inside Alibaba, across e-commerce, payments, logistics, cloud computing, gaming, instant messaging, IoT, and other domains; years of product improvements and iterations have been driven by the evolving logging needs of those teams.

In recent years we have been fortunate enough to productize the service on Alibaba Cloud, where it now serves tens of thousands of enterprise users, including top-ranked Internet customers in live streaming, short video, news media, gaming, and other industries. Going from serving one company to serving tens of thousands is a qualitative change, and moving to the cloud pushed us to think harder about which problems a logging platform must solve for its users, what the core demands on logging really are, and how to satisfy so many industries and business roles...

Requirements decomposition and functional design

The previous section analyzed what the different roles within a company need from logging. It boils down to the following points:

- Collection of logs in any format and from any data source, including non-K8s sources
- The ability to quickly search for and locate problem logs