Apache Flink

Apache Flink from Scratch (Part 10): Flink DataSet Programming

Submitted by 被刻印的时光 ゝ on 2019-12-05 02:21:55
DataSet programs in Flink are regular programs that implement transformations on data sets (e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain sources (e.g., by reading files or from local collections). Results are returned via sinks, which may for example write the data to (distributed) files or to standard output (such as the command-line terminal). Flink programs run in a variety of contexts: standalone, or embedded in other programs. Execution can happen in a local JVM or on clusters of many machines. In short, DataSet programming in Flink is quite conventional programming.
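To make the source → transformation → sink flow concrete, here is a minimal word-count sketch against the DataSet API; the class name and input values are illustrative, not taken from the article:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class DataSetWordCount {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Source: create a DataSet from a local collection
            DataSet<String> text = env.fromElements("to be", "or not to be");

            // Transformations: tokenize, group by word (field 0), sum counts (field 1)
            DataSet<Tuple2<String, Integer>> counts = text
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split("\\s+")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .groupBy(0)
                .sum(1);

            // Sink: print to standard output (print() also triggers execution)
            counts.print();
        }
    }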

Getting Started with Flink (Part 1): An Introduction to Apache Flink

Submitted by 一个人想着一个人 on 2019-12-04 19:59:45
What is Apache Flink? In today's era of rapidly growing data volumes, every kind of business scenario produces large amounts of data, and how to process this continuously generated data effectively has become a problem most companies now face. As Hadoop matured as an open-source project (with Yahoo among its major early contributors), more and more big-data processing technologies came into view; for example, the popular processing engine Apache Spark has largely replaced MapReduce as the current standard for big-data processing. But as data keeps growing and new technologies keep developing, people have gradually recognized the importance of real-time data processing. Compared with the traditional processing model, stream processing offers higher processing efficiency and better cost control. Flink, a technology that has steadily matured in the open-source community in recent years, is a distributed processing framework that delivers high throughput, low latency, and high performance at the same time.

The evolution of data architectures: as the figure shows, the defining characteristic of a traditional monolithic data architecture is centralized data storage, with the architecture usually divided into a compute layer and a storage layer. A monolithic architecture is very efficient early on, but as time passes and the business grows, the system gradually becomes large and ever harder to maintain and upgrade. The database is the single authoritative data source, every application must query it for the data it needs, and any change or failure in the database affects the entire business system.

Later, with the emergence of the microservice architecture, enterprises began to adopt microservices as the architecture for their business systems. The core idea of microservices is that an application consists of many small, mutually independent services, each running in its own process and developed and released without dependencies on the others. Different services can, according to different business needs…

A Look at Flink's CheckpointScheduler

Submitted by 一曲冷凌霜 on 2019-12-04 17:53:54
Preface: this article mainly examines Flink's CheckpointScheduler.

CheckpointCoordinatorDeActivator

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorDeActivator.java

    /**
     * This actor listens to changes in the JobStatus and activates or deactivates the periodic
     * checkpoint scheduler.
     */
    public class CheckpointCoordinatorDeActivator implements JobStatusListener {

        private final CheckpointCoordinator coordinator;

        public CheckpointCoordinatorDeActivator(CheckpointCoordinator coordinator) {
            this.coordinator = checkNotNull(coordinator);
        }

        @Override
        public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
            if (newJobStatus == JobStatus.RUNNING) {
                // start the checkpoint scheduler
                coordinator.startCheckpointScheduler();
            } else {
                // anything else should stop the trigger for now
                coordinator.stopCheckpointScheduler();
            }
        }
    }
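The scheduler this listener controls only runs once periodic checkpointing is enabled on the job. A minimal sketch of switching it on; the interval and mode are arbitrary choices, not from the article:

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EnableCheckpointing {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpoint every 10 seconds; once the job status becomes RUNNING,
            // CheckpointCoordinatorDeActivator starts the periodic scheduler.
            env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
        }
    }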

A Look at Flink's JDBCOutputFormat

Submitted by 守給你的承諾、 on 2019-12-04 17:53:40
Preface: this article mainly examines Flink's JDBCOutputFormat.

JDBCOutputFormat

flink-jdbc_2.11-1.7.0-sources.jar!/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java

    /**
     * OutputFormat to write Rows into a JDBC database.
     * The OutputFormat has to be configured using the supplied OutputFormatBuilder.
     *
     * @see Row
     * @see DriverManager
     */
    public class JDBCOutputFormat extends RichOutputFormat<Row> {
        private static final long serialVersionUID = 1L;

        static final int DEFAULT_BATCH_INTERVAL = 5000;

        private static final Logger LOG = LoggerFactory.getLogger(JDBCOutputFormat.class);

        private String username;
        private String password;
        private String drivername;
        private String dbURL;
        private String query;
        private int batchInterval = DEFAULT_BATCH_INTERVAL;

        // ...
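A minimal sketch of configuring the format through its builder; the driver, URL, credentials, and SQL below are placeholders:

    import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;

    public class BuildJdbcOutputFormat {
        public static void main(String[] args) {
            JDBCOutputFormat format = JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("com.mysql.jdbc.Driver")          // placeholder driver
                .setDBUrl("jdbc:mysql://localhost:3306/test")    // placeholder URL
                .setUsername("user")
                .setPassword("secret")
                .setQuery("INSERT INTO words (word, cnt) VALUES (?, ?)")
                .setBatchInterval(100)                           // flush every 100 rows
                .finish();
            // Hand the format to a DataSet<Row> via dataSet.output(format).
        }
    }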

A Look at Flink's ListCheckpointed

Submitted by 我的梦境 on 2019-12-04 17:53:27
Preface: this article mainly examines Flink's ListCheckpointed.

Example:

    public static class CounterSource extends RichParallelSourceFunction<Long>
            implements ListCheckpointed<Long> {

        /** current offset for exactly once semantics */
        private Long offset = 0L;

        /** flag for job cancellation */
        private volatile boolean isRunning = true;

        @Override
        public void run(SourceContext<Long> ctx) {
            final Object lock = ctx.getCheckpointLock();

            while (isRunning) {
                // output and state update are atomic
                synchronized (lock) {
                    ctx.collect(offset);
                    offset += 1;
                }
            }
        }

        @Override
        public void cancel() {
            isRunning = false;
        }

        @Override
        public List<Long> snapshotState(long checkpointId, long checkpointTimestamp) {
            return Collections.singletonList(offset);
        }

        @Override
        public void restoreState(List<Long> state) {
            for (Long s : state) {
                offset = s;
            }
        }
    }
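A short wiring sketch, not from the article: the snapshot/restore callbacks above only fire when checkpointing is enabled (the interval and job name are arbitrary):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CounterJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(5_000L); // snapshotState() runs on every checkpoint
            env.addSource(new CounterSource()).print();
            env.execute("counter-source-demo");
        }
    }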

A Look at the KeySelector of Flink's KeyedStream

Submitted by 限于喜欢 on 2019-12-04 17:53:17
Preface: this article mainly examines the KeySelector of Flink's KeyedStream.

KeyedStream

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/KeyedStream.java

    @Public
    public class KeyedStream<T, KEY> extends DataStream<T> {

        /**
         * The key selector that can get the key by which the stream is partitioned from the elements.
         */
        private final KeySelector<T, KEY> keySelector;

        /** The type of the key by which the stream is partitioned. */
        private final TypeInformation<KEY> keyType;

        /**
         * Creates a new {@link KeyedStream} using the given {@link KeySelector}
         * to partition operator state by key.
         *
         * @param dataStream Base stream of data
         * @param keySelector Function for determining state partitions
         */
        // ...
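For context, a KeyedStream is normally obtained through keyBy(KeySelector); a minimal sketch, with the class name and sample tuples illustrative:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KeySelectorDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // keyBy(KeySelector) is what constructs the KeyedStream shown above
            KeyedStream<Tuple2<String, Integer>, String> keyed = env
                .fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3))
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) {
                        return value.f0; // partition by the String field
                    }
                });

            keyed.sum(1).print();
            env.execute("key-selector-demo");
        }
    }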

Learning Flink from 0 to 1: Flink Data Transformation

Submitted by 半城伤御伤魂 on 2019-12-04 17:53:08
Foreword: The structure of a Flink program was already covered in the first article in this series, "Learning Flink from 0 to 1: An Introduction to Apache Flink". A Flink application is structured as shown in the figure above:

1. Source: the data source. Flink's sources for stream and batch processing fall into roughly four categories: collection-based sources, file-based sources, socket-based sources, and custom sources. Common custom sources include Apache Kafka, Amazon Kinesis Streams, RabbitMQ, the Twitter Streaming API, Apache NiFi, and so on; you can of course also define your own source.

2. Transformation: the various data transformation operations, including Map / FlatMap / Filter / KeyBy / Reduce / Fold / Aggregations / Window / WindowAll / Union / Window join / Split / Select / Project and more; with these you can compute the data into whatever shape you need (see the sketch after this list).

3. Sink: the receiver, i.e. where Flink sends the transformed data, which you may want to persist. Common Flink sinks fall into roughly these categories: writing to a file, printing to stdout, writing to a socket, and custom sinks. Common custom…
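Putting the three parts together in one runnable sketch; the operations and values are illustrative only:

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SourceTransformSink {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(1, 2, 3, 4, 5)                   // Source: local collection
                .map(new MapFunction<Integer, Integer>() {    // Transformation: Map
                    @Override
                    public Integer map(Integer n) { return n * 2; }
                })
                .filter(new FilterFunction<Integer>() {       // Transformation: Filter
                    @Override
                    public boolean filter(Integer n) { return n > 4; }
                })
                .print();                                     // Sink: standard output

            env.execute("source-transform-sink");
        }
    }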

A Look at Flink's FsStateBackend

Submitted by 岁酱吖の on 2019-12-04 17:52:55
Preface: this article mainly examines Flink's FsStateBackend.

StateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/StateBackend.java

    @PublicEvolving
    public interface StateBackend extends java.io.Serializable {

        // ------------------------------------------------------------------------
        //  Checkpoint storage - the durable persistence of checkpoint data
        // ------------------------------------------------------------------------

        CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException;

        CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException;

        // ...
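FsStateBackend is one implementation of this interface; a minimal sketch of selecting it on a job, assuming a file-system checkpoint directory (the HDFS URI is a placeholder):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class UseFsStateBackend {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Working state stays on the TaskManager heap; checkpoints are
            // persisted to the file system.
            env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));
            env.enableCheckpointing(60_000L);
        }
    }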

A Look at Flink's MemoryBackendCheckpointStorage

Submitted by 馋奶兔 on 2019-12-04 17:52:41
Preface: this article mainly examines Flink's MemoryBackendCheckpointStorage.

CheckpointStorage

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/CheckpointStorage.java

    /**
     * CheckpointStorage implements the durable storage of checkpoint data and metadata streams.
     * An individual checkpoint or savepoint is stored to a {@link CheckpointStorageLocation},
     * created by this class.
     */
    public interface CheckpointStorage {

        boolean supportsHighlyAvailableStorage();

        boolean hasDefaultSavepointLocation();

        CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException;

        // ...
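In Flink 1.7, MemoryBackendCheckpointStorage is the CheckpointStorage that MemoryStateBackend creates, so the usual way to reach it is simply to configure that backend; a minimal sketch, with the 5 MB cap and interval as arbitrary choices:

    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class UseMemoryStateBackend {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpoint metadata and state live on the JobManager heap;
            // each state object is capped at 5 MB here.
            env.setStateBackend(new MemoryStateBackend(5 * 1024 * 1024));
            env.enableCheckpointing(30_000L);
        }
    }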

A Look at Flink's MemCheckpointStreamFactory

Submitted by 与世无争的帅哥 on 2019-12-04 17:52:28
Preface: this article mainly examines Flink's MemCheckpointStreamFactory.

CheckpointStreamFactory

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/CheckpointStreamFactory.java

    /**
     * A factory for checkpoint output streams, which are used to persist data for checkpoints.
     *
     * <p>Stream factories can be created from the {@link CheckpointStorage} through
     * {@link CheckpointStorage#resolveCheckpointStorageLocation(long, CheckpointStorageLocationReference)}.
     */
    public interface CheckpointStreamFactory {

        CheckpointStateOutputStream createCheckpointStateOutputStream(CheckpointedStateScope scope) throws IOException;

        // ...
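A rough sketch of exercising the factory contract directly, assuming the flink-runtime 1.7 internals quoted above; treat the constructor argument and scope as illustrative:

    import org.apache.flink.runtime.state.CheckpointStreamFactory;
    import org.apache.flink.runtime.state.CheckpointedStateScope;
    import org.apache.flink.runtime.state.StreamStateHandle;
    import org.apache.flink.runtime.state.memory.MemCheckpointStreamFactory;

    public class MemStreamFactoryDemo {
        public static void main(String[] args) throws Exception {
            // Heap-backed factory; streams fail once state exceeds 1024 bytes
            CheckpointStreamFactory factory = new MemCheckpointStreamFactory(1024);

            CheckpointStreamFactory.CheckpointStateOutputStream out =
                factory.createCheckpointStateOutputStream(CheckpointedStateScope.EXCLUSIVE);
            out.write(new byte[]{1, 2, 3});  // persist some state bytes

            // closeAndGetHandle() returns a handle from which the bytes can be re-read
            StreamStateHandle handle = out.closeAndGetHandle();
            System.out.println(handle.getStateSize());
        }
    }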