Apache Storm

A brief look at the differences between Spark Streaming, Flink, and Storm

Submitted by 我的未来我决定 on 2019-12-02 07:00:48
1. Introduction
These three computing frameworks are often compared. From my point of view, the comparison falls into two camps (mini-batches vs. streaming). Spark Streaming is a micro-batch, pseudo-streaming, near-real-time framework (Spark itself is a batch-processing framework), whereas Flink and Storm are typical real-time stream-processing frameworks.

2. Spark vs. Flink
Although the two are close in many design and implementation ideas and have borrowed from each other, the main difference is still the choice between mini-batch and streaming; weigh throughput against latency according to the actual scenario.

3. Flink vs. Storm

Name  | Batch processing | Processing guarantee                                                                                     | API level | Fault-tolerance mechanism
Storm | not supported    | at least once (implemented with record-level acknowledgments); with Trident, Storm can provide exactly-once semantics | low       | record-level acknowledgments
Flink | supported        | exactly once (implemented with the Chandy-Lamport algorithm, i.e. marker checkpoints)                   | high      | marker checkpoint

4. Other material
5. Summary
This write-up is rather brief. I strongly recommend the references listed below; they are all well written.
References: What is/are the main difference(s) between

A look at Storm's JoinBolt

Submitted by 久未见 on 2019-12-02 06:22:50
Preface: This post mainly looks into Storm's JoinBolt.

Example

    @Test
    public void testJoinBolt() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("uuid-spout", new RandomWordSpout(new String[]{"uuid", "timestamp"}), 1);
        builder.setSpout("word-spout", new RandomWordSpout(new String[]{"word", "timestamp"}), 1);
        JoinBolt joinBolt = new JoinBolt("uuid-spout", "timestamp")
                // from priorStream inner join newStream on newStream.field = priorStream.field1
                .join("word-spout", "timestamp", "uuid-spout")
                .select("uuid,word,timestamp")
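The excerpt above is cut off after select(). For context, a JoinBolt also needs a window, and both input streams must be fields-grouped on the join key before it can run. The sketch below is not the post's actual code; it is a hypothetical completion following the documented JoinBolt pattern, with the bolt id "join-bolt" and the 10-second tumbling window chosen by me (assumed imports: org.apache.storm.topology.base.BaseWindowedBolt, org.apache.storm.tuple.Fields, java.util.concurrent.TimeUnit).

    // Hypothetical completion of the wiring above, not the post's code.
    JoinBolt joinBolt = new JoinBolt("uuid-spout", "timestamp")
            .join("word-spout", "timestamp", "uuid-spout")
            .select("uuid,word,timestamp")
            .withTumblingWindow(new BaseWindowedBolt.Duration(10, TimeUnit.SECONDS));
    builder.setBolt("join-bolt", joinBolt, 1)
           .fieldsGrouping("uuid-spout", new Fields("timestamp"))
           .fieldsGrouping("word-spout", new Fields("timestamp"));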

A look at the FreshCollector of Storm's WindowTridentProcessor

Submitted by 断了今生、忘了曾经 on 2019-12-02 06:22:34
Preface: This post mainly looks into the FreshCollector of Storm's WindowTridentProcessor.

Example

    TridentTopology topology = new TridentTopology();
    topology.newStream("spout1", spout)
            .partitionBy(new Fields("user"))
            .window(windowConfig, windowsStoreFactory, new Fields("user", "score"), new UserCountAggregator(), new Fields("aggData"))
            .parallelismHint(1)
            .each(new Fields("aggData"), new PrintEachFunc(), new Fields());

This example follows the window operation with an each operation.

WindowTridentProcessor
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.java

    public class WindowTridentProcessor implements TridentProcessor {
        private
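The excerpt does not show how windowConfig and windowsStoreFactory are built. As an assumption about a typical setup (not from the post), Trident's built-in SlidingCountWindow and InMemoryWindowsStoreFactory could be used; UserCountAggregator and PrintEachFunc remain the author's own classes and are not defined here.

    // Hypothetical setup for the two window arguments used above.
    // Assumed imports: org.apache.storm.trident.windowing.InMemoryWindowsStoreFactory,
    // org.apache.storm.trident.windowing.WindowsStoreFactory,
    // org.apache.storm.trident.windowing.config.SlidingCountWindow,
    // org.apache.storm.trident.windowing.config.WindowConfig.
    WindowConfig windowConfig = SlidingCountWindow.of(100, 10);                   // window of 100 tuples, sliding every 10
    WindowsStoreFactory windowsStoreFactory = new InMemoryWindowsStoreFactory();  // keeps window tuples in memory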

A look at Storm's reportError

Submitted by 戏子无情 on 2019-12-02 06:22:18
Preface: This post mainly looks into Storm's reportError.

IErrorReporter
storm-2.0.0/storm-client/src/jvm/org/apache/storm/task/IErrorReporter.java

    public interface IErrorReporter {
        void reportError(Throwable error);
    }

The ISpoutOutputCollector, IOutputCollector, and IBasicOutputCollector interfaces all extend the IErrorReporter interface.

ISpoutOutputCollector
storm-core/1.2.2/storm-core-1.2.2-sources.jar!/org/apache/storm/spout/ISpoutOutputCollector.java

    public interface ISpoutOutputCollector extends IErrorReporter {
        /** Returns the task ids that received the tuples. */
        List<Integer> emit(String streamId, List<Object> tuple, Object messageId);
        void emitDirect
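Because OutputCollector implements IOutputCollector, a bolt can surface exceptions to Storm (and its UI) through reportError. Below is a minimal, hypothetical bolt of my own (assuming the Storm 2.x prepare signature) that reports and fails a tuple when processing throws; it is a sketch, not code from the post.

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class ReportingBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            try {
                process(input);               // hypothetical business logic
                collector.ack(input);
            } catch (Exception e) {
                collector.reportError(e);     // reported via the IErrorReporter contract shown above
                collector.fail(input);
            }
        }

        private void process(Tuple input) { /* ... */ }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams in this sketch
        }
    }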

A look at the pendingTriggers of Storm's TridentWindowManager

Submitted by 我怕爱的太早我们不能终老 on 2019-12-02 06:22:03
Preface: This post mainly looks into the pendingTriggers of Storm's TridentWindowManager.

TridentBoltExecutor.finishBatch
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java

    private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) {
        boolean success = true;
        try {
            _bolt.finishBatch(tracked.info);
            String stream = COORD_STREAM(tracked.info.batchGroup);
            for(Integer task: tracked.condition.targetTasks) {
                _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0)));
            }
            if(tracked.delayedAck!=null) {
                _collector.ack(tracked

A look at Storm's WindowedBoltExecutor

Submitted by 旧街凉风 on 2019-12-02 06:21:49
Preface: This post mainly looks into Storm's WindowedBoltExecutor.

WindowedBoltExecutor
storm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java

    /**
     * An {@link IWindowedBolt} wrapper that does the windowing of tuples.
     */
    public class WindowedBoltExecutor implements IRichBolt {
        public static final String LATE_TUPLE_FIELD = "late_tuple";
        private static final Logger LOG = LoggerFactory.getLogger(WindowedBoltExecutor.class);
        private static final int DEFAULT_WATERMARK_EVENT_INTERVAL_MS = 1000; // 1s
        private static final int DEFAULT_MAX_LAG_MS = 0; // no lag
        private final IWindowedBolt bolt;
        //
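For context, users normally never construct a WindowedBoltExecutor themselves: TopologyBuilder.setBolt(String, IWindowedBolt, Number) wraps the supplied IWindowedBolt in one. The sketch below is a minimal, hypothetical windowed bolt and its wiring, assuming the Storm 2.x API; the class and component ids are my own, not from the post.

    import java.util.concurrent.TimeUnit;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseWindowedBolt;
    import org.apache.storm.windowing.TupleWindow;

    public class CountWindowBolt extends BaseWindowedBolt {
        @Override
        public void execute(TupleWindow window) {
            // WindowedBoltExecutor hands the wrapped bolt one TupleWindow per window trigger.
            System.out.println("tuples in window: " + window.get().size());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams in this sketch
        }
    }

    // Wiring (inside topology-building code): setBolt wraps the bolt in a WindowedBoltExecutor;
    // "some-spout" is a placeholder upstream component.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setBolt("count-window",
            new CountWindowBolt().withTumblingWindow(new BaseWindowedBolt.Duration(10, TimeUnit.SECONDS)), 1)
           .shuffleGrouping("some-spout");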

A look at Storm's OpaquePartitionedTridentSpoutExecutor

Submitted by 感情迁移 on 2019-12-02 06:21:30
Preface: This post mainly looks into Storm's OpaquePartitionedTridentSpoutExecutor.

TridentTopology.newStream
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java

    public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) {
        return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout));
    }

For a spout of type IOpaquePartitionedTridentSpout, the TridentTopology.newStream method wraps it in an OpaquePartitionedTridentSpoutExecutor; KafkaTridentSpoutOpaque is one implementation of the IOpaquePartitionedTridentSpout interface.

TridentTopologyBuilder.buildTopology
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology

A look at Storm's ICommitterTridentSpout

Submitted by 戏子无情 on 2019-12-02 06:21:07
Preface: This post mainly looks into Storm's ICommitterTridentSpout.

ICommitterTridentSpout
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/ICommitterTridentSpout.java

    public interface ICommitterTridentSpout<X> extends ITridentSpout<X> {
        public interface Emitter extends ITridentSpout.Emitter {
            void commit(TransactionAttempt attempt);
        }

        @Override
        public Emitter getEmitter(String txStateId, Map conf, TopologyContext context);
    }

ICommitterTridentSpout extends ITridentSpout; it mainly overrides the getEmitter method to return an extended Emitter, which extends ITridentSpout.Emitter and additionally defines a commit method.

TridentTopologyBuilder.buildTopology
storm-core-1.2.2-sources.jar!/org

A look at the finishBatch method of Storm's TridentBoltExecutor

Submitted by 冷暖自知 on 2019-12-02 06:20:51
Preface: This post mainly looks into the finishBatch method of Storm's TridentBoltExecutor.

MasterBatchCoordinator.nextTuple
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/MasterBatchCoordinator.java

    public void nextTuple() {
        sync();
    }

    private void sync() {
        // note that sometimes the tuples active may be less than max_spout_pending, e.g.
        // max_spout_pending = 3
        // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
        // and there won't be a batch for tx 4 because there's max_spout_pending tx active
        TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
        if
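The sync() comment treats max_spout_pending as a cap on the number of Trident transactions (batches) in flight. As an assumption about typical usage, not shown in the post, that cap comes from the topology configuration set before submission:

    // Hypothetical submission snippet: with max spout pending set to 3,
    // MasterBatchCoordinator keeps at most 3 transactions active at once,
    // which is the situation described in the sync() comment above.
    // Assumed imports: org.apache.storm.Config, org.apache.storm.StormSubmitter;
    // tridentTopology is an already-built TridentTopology.
    Config conf = new Config();
    conf.setMaxSpoutPending(3);
    StormSubmitter.submitTopology("trident-demo", conf, tridentTopology.build());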

A look at Storm's maxSpoutPending

Submitted by 你。 on 2019-12-02 06:20:38
Preface: This post mainly looks into Storm's maxSpoutPending.

TOPOLOGY_MAX_SPOUT_PENDING
storm-2.0.0/storm-client/src/jvm/org/apache/storm/Config.java

    /**
     * The maximum number of tuples that can be pending on a spout task at any given time. This config applies to individual tasks, not to
     * spouts or topologies as a whole.
     *
     * A pending tuple is one that has been emitted from a spout but has not been acked or failed yet. Note that this config parameter has
     * no effect for unreliable spouts that don't tag their tuples with a message id.
     */
    @isInteger
    @isPositiveNumber
    public static final String TOPOLOGY_MAX_SPOUT_PENDING =
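As the Javadoc notes, the setting only takes effect for reliable spouts that tag their tuples with a message id. A minimal sketch of setting it (the value 1000 and the msgId variable are illustrative, not from the post):

    // Two equivalent ways of capping pending tuples per spout task.
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1000);   // or: conf.setMaxSpoutPending(1000);

    // The cap only applies to reliable emits, i.e. emits tagged with a message id:
    // collector.emit(new Values(word), msgId);   // counted against max spout pending
    // collector.emit(new Values(word));          // unreliable emit: the setting has no effect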