Apache Flink

A look at Flink's ScheduledExecutor

我的未来我决定 · Posted on 2019-12-04 10:04:23
Preface: This article mainly looks at Flink's ScheduledExecutor.

Executor

java.base/java/util/concurrent/Executor.java

public interface Executor {

    /**
     * Executes the given command at some time in the future. The command
     * may execute in a new thread, in a pooled thread, or in the calling
     * thread, at the discretion of the {@code Executor} implementation.
     *
     * @param command the runnable task
     * @throws RejectedExecutionException if this task cannot be
     *         accepted for execution
     * @throws NullPointerException if command is null
     */
    void execute(Runnable command);
}

The JDK's Executor interface defines an execute method that takes a Runnable as its parameter.

ScheduledExecutor
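To make the execute contract concrete, here is a minimal JDK-only sketch (the class name ExecutorDemo and the printed messages are made up for illustration): a ScheduledExecutorService already satisfies the plain Executor contract while adding deferred execution, which is also the shape of Flink's ScheduledExecutor, an Executor extended with schedule-style methods.

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExecutorDemo {
    public static void main(String[] args) throws Exception {
        // A JDK ScheduledExecutorService is itself an Executor, so the same pool can
        // run a task immediately via execute(...) or defer it via schedule(...).
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();

        Executor plain = pool;  // viewed through the plain Executor contract
        plain.execute(() -> System.out.println("run now"));

        pool.schedule(() -> System.out.println("run after 100 ms"), 100, TimeUnit.MILLISECONDS);

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}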

A look at Flink's RichParallelSourceFunction

让人想犯罪 __ · Posted on 2019-12-04 10:04:11
Preface: This article mainly looks at Flink's RichParallelSourceFunction.

RichParallelSourceFunction

/**
 * Base class for implementing a parallel data source. Upon execution, the runtime will
 * execute as many parallel instances of this function as configured parallelism
 * of the source.
 *
 * <p>The data source has access to context information (such as the number of parallel
 * instances of the source, and which parallel instance the current instance is)
 * via {@link #getRuntimeContext()}. It also provides additional life-cycle methods
 * ({@link #open(org.apache.flink.configuration.Configuration)} and {@link #close(
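As a concrete companion to the javadoc above, here is a minimal hedged sketch of a parallel source (the class name SubtaskIndexSource, the emitted values, and the one-second pause are made up for illustration): each parallel instance asks getRuntimeContext() which subtask it is and keeps emitting that index until cancel() is called.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class SubtaskIndexSource extends RichParallelSourceFunction<Integer> {

    private volatile boolean running = true;
    private int subtaskIndex;

    @Override
    public void open(Configuration parameters) throws Exception {
        // one of the life-cycle methods mentioned in the javadoc above
        subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
    }

    @Override
    public void run(SourceContext<Integer> ctx) throws Exception {
        while (running) {
            // emit under the checkpoint lock so records and checkpoints do not interleave
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(subtaskIndex);
            }
            Thread.sleep(1000L);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}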

A look at Flink's Parallel Execution

大兔子大兔子 · Posted on 2019-12-04 10:03:55
Preface: This article mainly looks at Flink's Parallel Execution.

Example

Operator Level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = [...]
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new LineSplitter())
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .sum(1).setParallelism(5);

wordCounts.print();

env.execute("Word Count Example");

Operators, data sources, and data sinks can all set their parallelism by calling setParallelism().

Execution Environment Level

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);
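Putting the two levels together, here is a self-contained sketch (the LineSplitter and the input line are made up for illustration): the environment default is 3, and the operator-level setParallelism(5) on the sum operator overrides that default for that one operator only. Parallelism can additionally be supplied at client level (for example the -p option of the flink run command) and at system level via parallelism.default in flink-conf.yaml.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class ParallelismLevels {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3);                       // Execution Environment Level

        DataStream<String> text = env.fromElements("to be or not to be");

        DataStream<Tuple2<String, Integer>> wordCounts = text
            .flatMap(new LineSplitter())
            .keyBy(0)
            .timeWindow(Time.seconds(5))
            .sum(1).setParallelism(5);               // Operator Level, overrides the default of 3

        wordCounts.print();
        env.execute("Parallelism Levels");
    }

    // simple splitter used only for this sketch
    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}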

A look at the intervalJoin operation of Flink's KeyedStream

跟風遠走 · Posted on 2019-12-03 16:50:23
Preface: This article mainly looks at the intervalJoin operation of Flink's KeyedStream.

Example

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream
    .keyBy(<KeySelector>)
    .intervalJoin(greenStream.keyBy(<KeySelector>))
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessJoinFunction<Integer, Integer, String>() {
        @Override
        public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
            out.collect(left + "," + right);
        }
    });

KeyedStream.intervalJoin

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/KeyedStream.java
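The snippet below turns the example into a runnable sketch (the stream contents and the identity key selector are illustrative, and a real job would also assign timestamps and watermarks since intervalJoin works on event time): with between(Time.milliseconds(-2), Time.milliseconds(1)), a green element of the same key is joined with an orange element when its timestamp lies in [orange.ts - 2 ms, orange.ts + 1 ms], both bounds being inclusive by default.

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class IntervalJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<Integer> orangeStream = env.fromElements(1, 2, 3);
        DataStream<Integer> greenStream = env.fromElements(1, 2, 3);

        orangeStream
            .keyBy(v -> v)
            .intervalJoin(greenStream.keyBy(v -> v))
            // join green elements whose timestamp lies in [orange.ts - 2 ms, orange.ts + 1 ms]
            .between(Time.milliseconds(-2), Time.milliseconds(1))
            .process(new ProcessJoinFunction<Integer, Integer, String>() {
                @Override
                public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
                    out.collect(left + "," + right);
                }
            })
            .print();

        env.execute("interval join sketch");
    }
}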

A look at Flink's CheckpointedFunction

和自甴很熟 · Posted on 2019-12-03 15:24:00
Preface: This article mainly looks at Flink's CheckpointedFunction.

Example

public class BufferingSink implements SinkFunction<Tuple2<String, Integer>>, CheckpointedFunction {

    private final int threshold;

    private transient ListState<Tuple2<String, Integer>> checkpointedState;

    private List<Tuple2<String, Integer>> bufferedElements;

    public BufferingSink(int threshold) {
        this.threshold = threshold;
        this.bufferedElements = new ArrayList<>();
    }

    @Override
    public void invoke(Tuple2<String, Integer> value) throws Exception {
        bufferedElements.add(value);
        if (bufferedElements.size() == threshold) {
            for (Tuple2<String, Integer> element:
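The excerpt is cut off inside invoke(); to show what the CheckpointedFunction contract adds, here is a hedged sketch of the two callbacks such a sink typically implements, following the common pattern from the Flink documentation (the state name "buffered-elements" is illustrative, and the methods belong inside the BufferingSink class above, with imports for ListStateDescriptor, TypeInformation/TypeHint, FunctionSnapshotContext and FunctionInitializationContext). snapshotState copies the in-memory buffer into the ListState on every checkpoint; initializeState creates or restores that state when the function is (re)started.

@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
    // replace the previously checkpointed buffer with the current one
    checkpointedState.clear();
    for (Tuple2<String, Integer> element : bufferedElements) {
        checkpointedState.add(element);
    }
}

@Override
public void initializeState(FunctionInitializationContext context) throws Exception {
    ListStateDescriptor<Tuple2<String, Integer>> descriptor =
        new ListStateDescriptor<>(
            "buffered-elements",
            TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

    // operator (non-keyed) list state
    checkpointedState = context.getOperatorStateStore().getListState(descriptor);

    if (context.isRestored()) {
        for (Tuple2<String, Integer> element : checkpointedState.get()) {
            bufferedElements.add(element);
        }
    }
}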

A look at the reduce operation of Flink's KeyedStream

*爱你&永不变心* · Posted on 2019-12-03 15:23:48
Preface: This article mainly looks at the reduce operation of Flink's KeyedStream.

Example

@Test
public void testWordCount() throws Exception {
    // Checking input parameters
    // final ParameterTool params = ParameterTool.fromArgs(args);

    // set up the execution environment
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // make parameters available in the web interface
    // env.getConfig().setGlobalJobParameters(params);

    // get input data
    DataStream<String> text = env.fromElements(WORDS);

    DataStream<Tuple2<String, Integer>> counts =
        // split up the lines in pairs (2-tuples) containing: (word,1)
        text
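Since the test excerpt stops before it reaches the reduce call, here is a separate minimal sketch of KeyedStream#reduce itself (the input tuples are made up): reduce keeps a rolling aggregate per key and emits the updated aggregate for every incoming element, so "hello" is emitted first with count 1 and then with count 2.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReduceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> words = env.fromElements(
            Tuple2.of("hello", 1), Tuple2.of("world", 1), Tuple2.of("hello", 1));

        words
            .keyBy(0)
            // sum the counts per word; emits (hello,1), (world,1), (hello,2)
            .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
            .print();

        env.execute("reduce sketch");
    }
}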

A look at Flink's RestClusterClientConfiguration

放肆的年华 · Posted on 2019-12-03 15:23:33
Preface: This article mainly looks at Flink's RestClusterClientConfiguration.

RestClusterClientConfiguration

flink-release-1.7.2/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClientConfiguration.java

public final class RestClusterClientConfiguration {

    private final RestClientConfiguration restClientConfiguration;

    private final long awaitLeaderTimeout;

    private final int retryMaxAttempts;

    private final long retryDelay;

    private RestClusterClientConfiguration(
            final RestClientConfiguration endpointConfiguration,
            final long awaitLeaderTimeout,
            final int retryMaxAttempts,
            final long retryDelay) {
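As a hedged sketch of how these fields are typically populated (assuming the static factory RestClusterClientConfiguration.fromConfiguration(Configuration), the RestOptions constants named below, and a getRetryMaxAttempts() getter; the values are arbitrary examples), the object is derived from a regular Flink Configuration:

import org.apache.flink.client.program.rest.RestClusterClientConfiguration;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

public class RestClusterClientConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong(RestOptions.AWAIT_LEADER_TIMEOUT, 30_000L); // how long to wait for the leading REST endpoint (ms)
        conf.setInteger(RestOptions.RETRY_MAX_ATTEMPTS, 20);     // maps to retryMaxAttempts
        conf.setLong(RestOptions.RETRY_DELAY, 3_000L);           // maps to retryDelay (ms)

        RestClusterClientConfiguration restConf =
            RestClusterClientConfiguration.fromConfiguration(conf);
        System.out.println(restConf.getRetryMaxAttempts());
    }
}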

A look at Flink's MemoryStateBackend

与世无争的帅哥 · Posted on 2019-12-03 15:23:17
Preface: This article mainly looks at Flink's MemoryStateBackend.

StateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/StateBackend.java

@PublicEvolving
public interface StateBackend extends java.io.Serializable {

    // ------------------------------------------------------------------------
    //  Checkpoint storage - the durable persistence of checkpoint data
    // ------------------------------------------------------------------------

    /**
     * Resolves the given pointer to a checkpoint/savepoint into a checkpoint location. The location
     * supports reading the checkpoint metadata, or disposing the
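Before digging into the StateBackend interface, here is a minimal sketch of explicitly selecting the MemoryStateBackend for a job (the 5 MB value mirrors the backend's default maximum size per state and is illustrative; the boolean flag enables asynchronous snapshots):

import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryStateBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // keep checkpointed state in JobManager memory, up to 5 MB per state, snapshot asynchronously
        env.setStateBackend(new MemoryStateBackend(5 * 1024 * 1024, true));

        // ... define and execute the job as usual
    }
}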

A look at the Flink TaskManager's data.port and rpc.port

心已入冬 · Posted on 2019-12-03 15:23:04
Preface: This article mainly looks at the Flink TaskManager's data.port and rpc.port.

TaskManagerServices

flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerServices.java

public class TaskManagerServices {
    //......

    public static TaskManagerServices fromConfiguration(
            TaskManagerServicesConfiguration taskManagerServicesConfiguration,
            ResourceID resourceID,
            Executor taskIOExecutor,
            long freeHeapMemoryWithDefrag,
            long maxJvmHeapMemory) throws Exception {

        // pre-start checks
        checkTempDirs(taskManagerServicesConfiguration.getTmpDirPaths());

        final NetworkEnvironment network =
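For orientation, the two settings being traced here are ordinary configuration keys (normally set in flink-conf.yaml); the sketch below just shows them programmatically on a Flink Configuration object with arbitrary example ports. A value of 0, the default for both, lets the TaskManager pick a free port.

import org.apache.flink.configuration.Configuration;

public class TaskManagerPortsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInteger("taskmanager.data.port", 6121);   // port used for the data exchange between TaskManagers
        conf.setString("taskmanager.rpc.port", "6122");   // port used by the TaskManager's RPC service

        System.out.println(conf.getInteger("taskmanager.data.port", 0));
        System.out.println(conf.getString("taskmanager.rpc.port", "0"));
    }
}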

A look at the Flink TaskManager's managed memory

亡梦爱人 · Posted on 2019-12-03 15:22:51
Preface: This article mainly looks at the Flink TaskManager's managed memory.

TaskManagerOptions

flink-core-1.7.2-sources.jar!/org/apache/flink/configuration/TaskManagerOptions.java

@PublicEvolving
public class TaskManagerOptions {
    //......

    /**
     * JVM heap size for the TaskManagers with memory size.
     */
    @Documentation.CommonOption(position = Documentation.CommonOption.POSITION_MEMORY)
    public static final ConfigOption<String> TASK_MANAGER_HEAP_MEMORY =
        key("taskmanager.heap.size")
            .defaultValue("1024m")
            .withDescription("JVM heap size for the TaskManagers, which are the parallel workers of" +
                " the system. On YARN setups, this
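Alongside taskmanager.heap.size, the managed memory is controlled by a handful of keys; here is a hedged sketch with arbitrary example values (assuming the keys taskmanager.memory.size, taskmanager.memory.fraction and taskmanager.memory.off-heap; when the explicit size is left unset, managed memory is instead taken as a fraction of the remaining free memory):

import org.apache.flink.configuration.Configuration;

public class ManagedMemorySketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.heap.size", "1024m");      // total TaskManager heap, as shown above
        conf.setString("taskmanager.memory.size", "256m");     // explicit managed memory size
        conf.setFloat("taskmanager.memory.fraction", 0.7f);    // fallback fraction when no explicit size is set
        conf.setBoolean("taskmanager.memory.off-heap", false); // keep managed memory on-heap

        System.out.println(conf.getString("taskmanager.memory.size", "0"));
    }
}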