Apache Storm

A Look at Storm's tickTuple

Submitted by 不羁岁月 on 2019-12-04 04:58:21
Preface

This post takes a look at Storm's tickTuple.

Example: TickWordCountBolt

public class TickWordCountBolt extends BaseBasicBolt {

    private static final Logger LOGGER = LoggerFactory.getLogger(TickWordCountBolt.class);

    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        // ask Storm to deliver a tick tuple to this bolt every 10 seconds
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        if (TupleUtils.isTick(input)) {
            //execute tick logic
            LOGGER.info("execute tick tuple, emit and clear counts");
            counts.forEach((word, count) -> collector.emit(new Values(word, count)));
            counts.clear();
        } else {
            // ordinary tuple: accumulate the word count
            counts.merge(input.getString(0), 1, Integer::sum);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
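For reference, a minimal sketch of wiring this bolt into a local topology; RandomWordSpout, the component names, and the timings are illustrative assumptions, not code from the post:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new RandomWordSpout(), 1); // hypothetical spout emitting single words
builder.setBolt("tickCount", new TickWordCountBolt(), 2).shuffleGrouping("words");
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("tickDemo", new Config(), builder.createTopology());
Utils.sleep(35_000); // long enough for a few 10-second ticks to fire
cluster.shutdown();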

Comparing Spark Streaming, Kafka Streams, Storm, and Flink, and the Problems Solved by Blink, Alibaba's Flink-Based Engine

Submitted by 浪子不回头ぞ on 2019-12-03 14:38:31
1. Problems with Spark Streaming, Kafka Streams, Storm, and the like

When you set out to design a low-latency, exactly-once engine that unifies streaming and batch and can sustain very large, complex computations, the weaknesses of Spark Streaming and its peers become apparent. Spark Streaming is, at heart, a micro-batch engine, and such engines carry an inherent drawback: each micro-batch pays a non-trivial scheduling overhead, and the lower the latency you demand, the larger that overhead looms. This makes Spark Streaming a poor fit for second-level, let alone sub-second, computation.

Kafka Streams grew out of a log system, and its design goal is to be lightweight, lean, and easy to use. That goal is hard to reconcile with large-scale, complex computation.

Storm is a stream processor with no batch capability, and beyond that it exposes only very low-level APIs, leaving users to implement a great deal of complex logic themselves.

2. Flink's Advantages

(1) Unlike Spark, Flink is a streaming engine in the true sense; like Storm, it achieves low-latency stream processing through pipelined data transfer.

(2) Flink uses the classic Chandy-Lamport algorithm, which lets it hit the exactly-once target while keeping both latency and failover overhead low.

(3) If a single engine is to unify stream and batch processing, it must be built on a streaming core. Flink also provides the SQL and Table APIs
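Regarding point (2): a minimal sketch of what Flink's Chandy-Lamport style snapshotting looks like at the API surface (the 10-second interval is an arbitrary illustration):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// take a barrier-based distributed snapshot every 10s; on failure, operator
// state is restored from the last snapshot, giving exactly-once state semantics
env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);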

A Comparative Analysis of the Apache Streaming Frameworks Flink, Spark Streaming, and Storm

Submitted by 只谈情不闲聊 on 2019-12-03 14:38:18
1. Analysis of Flink's Architecture and Features

Flink is a rather old project, started in 2008, but it has only recently attracted attention. Flink is a native stream processing system that offers high-level APIs. Like Spark, it also provides APIs for batch processing, but the foundations of the two are completely different: Flink treats batch processing as a special case of stream processing. In Flink all data is viewed as a stream, which is a good abstraction because it is closer to the real world.

1.1 Basic Architecture

Let us look at Flink's basic architecture. Like Spark's, it is a master-slave style design.

When a Flink cluster starts, it launches one JobManager and one or more TaskManagers. A Client submits a job to the JobManager, the JobManager dispatches the job's tasks to the various TaskManagers for execution, and the TaskManagers report heartbeats and statistics back to the JobManager. TaskManagers transfer data between one another as streams. All three of the above run as independent JVM processes.

The Client is the process that submits the job; it can run on any machine that can reach the JobManager's environment. After submitting a (streaming) job, the Client may exit, or it may stay alive to wait for results. The JobManager is chiefly responsible for scheduling the job and coordinating checkpointing across the Tasks
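To make the Client → JobManager → TaskManager flow concrete, a minimal job sketch (host, port, and job name are illustrative):

// Building this program makes the process a Client; execute() ships the job
// graph to the JobManager, which schedules the tasks onto TaskManagers.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.socketTextStream("localhost", 9999)
   .map(String::toUpperCase)
   .print();
env.execute("client-to-jobmanager-demo");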

A Look at Storm's WindowedBolt

Submitted by 纵然是瞬间 on 2019-12-03 13:41:37
Preface

This post takes a look at Storm's WindowedBolt.

Example

@Test
public void testSlidingTupleTsTopology() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("integer", new RandomIntegerSpout(), 1);
    BaseWindowedBolt baseWindowedBolt = new SlidingWindowSumBolt()
            // windowLength, slidingInterval
            .withWindow(new BaseWindowedBolt.Duration(5, TimeUnit.SECONDS),
                    new BaseWindowedBolt.Duration(3, TimeUnit.SECONDS))
            // withTimestampField designates a tuple field to use as that tuple's timestamp
            .withTimestampField("timestamp")
            // watermark = (minimum over the input streams of the latest tuple timestamps) - lag
            …
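The excerpt cuts off at the watermark comment, which describes withLag. A sketch of how the chain and the test presumably continue (the lag value, wiring, and timings are assumptions for illustration):

            .withLag(new BaseWindowedBolt.Duration(1, TimeUnit.SECONDS)); // assumed lag
    builder.setBolt("slidingSum", baseWindowedBolt, 1).shuffleGrouping("integer");
    Config conf = new Config();
    // Storm validates that windowLength + slidingInterval fit within the message timeout
    conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 30);
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("slidingTupleTs", conf, builder.createTopology());
    Utils.sleep(60_000);
    cluster.shutdown();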

A Look at the execute and finishBatch Methods of Storm's AggregateProcessor

Submitted by 二次信任 on 2019-12-03 13:41:02
Preface

This post looks at the execute and finishBatch methods of Storm's AggregateProcessor.

Example

TridentTopology topology = new TridentTopology();
topology.newStream("spout1", spout)
        .groupBy(new Fields("user"))
        .aggregate(new Fields("user", "score"), new UserCountAggregator(), new Fields("val"))
        .toStream()
        .parallelismHint(1)
        .each(new Fields("val"), new PrintEachFunc(), new Fields());

TridentBoltExecutor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java

private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) {
    if (tracked.failed) {
        failBatch(tracked);
        _collector.fail(tuple);
        return;
    }
    …
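The excerpt references UserCountAggregator without showing it; a plausible sketch under Trident's Aggregator contract (the class body is my assumption, not the post's code). AggregateProcessor's execute routes each tuple to aggregate(), and its finishBatch eventually drives complete():

public class UserCountAggregator extends BaseAggregator<Map<String, Integer>> {

    @Override
    public Map<String, Integer> init(Object batchId, TridentCollector collector) {
        return new HashMap<>(); // fresh per-batch state
    }

    @Override
    public void aggregate(Map<String, Integer> state, TridentTuple tuple, TridentCollector collector) {
        state.merge(tuple.getStringByField("user"), 1, Integer::sum);
    }

    @Override
    public void complete(Map<String, Integer> state, TridentCollector collector) {
        collector.emit(new Values(state)); // emitted as the "val" field
    }
}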

A Look at Storm's AssignmentDistributionService

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-03 13:40:39
Preface

This post looks at Storm's AssignmentDistributionService.

AssignmentDistributionService

storm-2.0.0/storm-server/src/main/java/org/apache/storm/nimbus/AssignmentDistributionService.java

/**
 * A service for distributing master assignments to supervisors, this service makes the assignments notification
 * asynchronous.
 *
 * <p>We support multiple working threads to distribute assignment, every thread has a queue buffer.
 *
 * <p>Master will shuffle its node request to the queues, if the target queue is full, we just discard the request,
 * let the supervisors sync instead.
 *
 * <p>Caution: this class is not
 …
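The javadoc's "discard when the queue is full" behavior is the familiar bounded-queue pattern; a minimal illustration of it (the types, names, and capacity are mine, not Storm's actual fields):

BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // per-thread buffer
boolean accepted = queue.offer(nodeId); // non-blocking enqueue
if (!accepted) {
    // queue full: discard the push notification; the supervisor will
    // recover the assignment on its next periodic sync instead
}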

A Look at Storm's CustomStreamGrouping

Submitted by 房东的猫 on 2019-12-03 13:40:21
Preface

This post looks at Storm's CustomStreamGrouping.

CustomStreamGrouping

storm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/CustomStreamGrouping.java

public interface CustomStreamGrouping extends Serializable {

    /**
     * Tells the stream grouping at runtime the tasks in the target bolt. This information should be used in chooseTasks to determine the
     * target tasks.
     *
     * It also tells the grouping the metadata on the stream this grouping will be used on.
     */
    void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks);

    /**
     * This function implements a custom stream grouping.
     * …
     */
    List<Integer> chooseTasks(int taskId, List<Object> values);
}
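For illustration, a simple implementation that routes each tuple by the hash of its first field (my own example, not from the post):

public class ModHashGrouping implements CustomStreamGrouping {

    private List<Integer> targetTasks;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        this.targetTasks = targetTasks; // remember the candidate task ids
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        int idx = Math.abs(values.get(0).hashCode() % targetTasks.size());
        return Collections.singletonList(targetTasks.get(idx));
    }
}

It is wired in when declaring the bolt's inputs, e.g. builder.setBolt("bolt", new MyBolt()).customGrouping("spout", new ModHashGrouping());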

A Look at Storm's IWaitStrategy

Submitted by ↘锁芯ラ on 2019-12-03 13:32:59
Preface

This post looks at Storm's IWaitStrategy.

IWaitStrategy

storm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/IWaitStrategy.java

public interface IWaitStrategy {

    static IWaitStrategy createBackPressureWaitStrategy(Map<String, Object> topologyConf) {
        IWaitStrategy producerWaitStrategy =
                ReflectionUtils.newInstance((String) topologyConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY));
        producerWaitStrategy.prepare(topologyConf, WAIT_SITUATION.BACK_PRESSURE_WAIT);
        return producerWaitStrategy;
    }

    void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation);

    /**
     * Implementations of this method
     …
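The javadoc is cut off here; the method it documents is the strategy's idle hook. A minimal sketch of a custom strategy against this interface (the idle(int) signature and the sleep policy are assumptions; Storm ships its own implementations such as WaitStrategyPark and WaitStrategyProgressive):

public class SleepWaitStrategy implements IWaitStrategy {

    private long sleepMillis;

    @Override
    public void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation) {
        sleepMillis = 1; // a situation-specific value could be read from conf here
    }

    @Override
    public int idle(int idleCounter) throws InterruptedException {
        Thread.sleep(sleepMillis); // back off, then report one more idle cycle
        return idleCounter + 1;
    }
}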

A Look at How Flink Supports StormTopology Compatibility

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-03 08:06:45
Preface

This post looks at how Flink can run a Storm-built StormTopology for compatibility.

Example

@Test
public void testStormWordCount() throws Exception {
    //NOTE 1 build Topology the Storm way
    final TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomWordSpout(), 1);
    builder.setBolt("count", new WordCountBolt(), 5)
            .fieldsGrouping("spout", new Fields("word"));
    builder.setBolt("print", new PrintBolt(), 1)
            .shuffleGrouping("count");

    //NOTE 2 convert StormTopology to FlinkTopology
    FlinkTopology flinkTopology = FlinkTopology.createTopology(builder);

    //NOTE 3 execute program locally using FlinkLocalCluster
    Config conf = new Config();
    …
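The excerpt stops just after NOTE 3. Local execution presumably continues along these lines with flink-storm's FlinkLocalCluster (the topology name is an assumption):

    FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
    cluster.submitTopology("stormWordCount", conf, flinkTopology);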

A Look at the executor and task of a Storm Worker

Submitted by 老子叫甜甜 on 2019-12-03 01:58:20
Preface

This post looks at the executors and tasks of a Storm worker.

Worker

storm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java

public static void main(String[] args) throws Exception {
    Preconditions.checkArgument(args.length == 5,
            "Illegal number of arguments. Expected: 5, Actual: " + args.length);
    String stormId = args[0];
    String assignmentId = args[1];
    String supervisorPort = args[2];
    String portStr = args[3];
    String workerId = args[4];
    Map<String, Object> conf = ConfigUtils.readStormConfig();
    Utils.setupDefaultUncaughtExceptionHandler();
    StormCommon.validateDistributedMode(conf);
    Worker worker =
    …
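For the executor/task distinction the post studies: an executor is a thread inside the worker JVM, and a task is an instance of a spout or bolt run by an executor. A minimal illustration of how the two are set at topology-definition time (my own example, not the post's code):

TopologyBuilder builder = new TopologyBuilder();
// the parallelism hint sets the number of executors (threads), while
// setNumTasks sets the number of task instances spread across them;
// here each of the 2 executors runs 2 of the 4 tasks
builder.setBolt("count", new WordCountBolt(), 2)
       .setNumTasks(4)
       .shuffleGrouping("spout");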