storm

A Log Collection Architecture Based on Flume + Log4j + Kafka (Part 1)

老子叫甜甜, posted 2019-12-03 11:48:49
Flume is a mature, powerful log-collection tool. Plenty of ready-made examples and configuration references exist online, so only a brief overview is given here. Flume is built around three basic concepts:

Source: where logs come from, including Avro Source, Thrift Source, Exec Source, JMS Source, Spooling Directory Source, Kafka Source, NetCat Source, Sequence Generator Source, Syslog Source, HTTP Source, Stress Source, Legacy Source, Custom Source, Scribe Source, and the Twitter 1% firehose Source.
Channel: the log pipeline; all data arriving from a Source is queued here. Options include Memory Channel, JDBC Channel, Kafka Channel, File Channel, Spillable Memory Channel, Pseudo Transaction Channel, and Custom Channel.
Sink: where logs exit; data is emitted outward through the Sink. Options include HDFS Sink, Hive Sink, Logger Sink, Avro Sink, Thrift
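As a concrete illustration of how the three concepts wire together, a minimal agent definition might look like the following. This is a hedged sketch, not from the original article; the agent and component names (a1, r1, c1, k1) and the log path are placeholders:

```properties
# One agent: Exec source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory queue between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to the Flume log (swap for a Kafka or HDFS sink in practice)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

In a Flume + Kafka pipeline as described in the title, the logger sink above would typically be replaced by a Kafka sink so downstream consumers can read the events.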

Storm On Yarn: Installation and Deployment

纵然是瞬间, posted 2019-12-03 11:27:56
1. Install JDK 7 and Maven.
2. Deploy a Hadoop 2 cluster and start YARN: http://my.oschina.net/zc741520/blog/362824
3. Download Storm on YARN:
[grid@hadoop4 ~]$ wget https://github.com/yahoo/storm-yarn/archive/master.zip
4. Build:
[grid@hadoop4 ~]$ unzip master.zip
[grid@hadoop4 ~]$ cd storm-yarn-master
## Edit pom.xml and change the Hadoop version to the one you are actually running
[grid@hadoop4 storm-yarn-master]$ vim pom.xml
<properties>
  <storm.version>0.9.0-wip21</storm.version>
  <hadoop.version>2.5.2</hadoop.version>
  <!--hadoop.version>2.1.0.2.0.5.0-67</hadoop.version-->
</properties>
## Build
[grid@hadoop4 storm-yarn-master]$ mvn package -DskipTests
5. storm-yarn-master/lib

Apache Storm Installation without ZeroMQ/JZMQ

Anonymous (unverified), posted 2019-12-03 10:24:21
Question: I am trying to set up a multi-cluster Storm system. I have found several third-party step-by-step guides on this. They all list Java, Python, ZeroMQ 2.1.7, and JZMQ as requirements for the Nimbus and Supervisor/Slave nodes. But on the official Apache Storm website, the only requirements for the Nimbus and Supervisor nodes are Java 6 and Python 2.6.6 ( https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html ). Does anyone know if ZeroMQ and JZMQ are required for Storm cluster configuration? And is there an advantage to

An Introduction to Storm Components

坚强是说给别人听的谎言, posted 2019-12-03 09:11:57
(1) Topologies
Explanation: a topology is like a shipping container: all the cargo is stored inside the container and hauled away together. In Storm, all of your code and files are ultimately packaged into a topology and submitted to the Storm cluster to run, similar to a MapReduce job in Hadoop. The biggest difference is that a MapReduce job eventually stops on its own, while a Storm topology never stops unless you forcibly kill it.
Related notes: TopologyBuilder is the Java helper class for constructing a Topology.
Production mode:
Config conf = new Config();
conf.setNumWorkers(20);
conf.setMaxSpoutPending(5000);
StormSubmitter.submitTopology("mytopology", conf, topology);
Local mode:
import org.apache.storm.LocalCluster;
LocalCluster cluster = new LocalCluster();
(2) Streams
A Stream is the core abstraction in Storm. In a distributed environment, a stream is an unbounded sequence of tuples, created continuously and in parallel from data sources. A stream's schema is identified by field names, and the field values can be of type integer, long, short, bytes

Twitter Storm: Stream Grouping Strategies and fieldsGrouping

南楼画角, posted 2019-12-03 09:11:44
## Storm Grouping
shuffleGrouping: defines the stream grouping as a shuffle. Input coming from the Spout is shuffled, i.e. distributed randomly across this Bolt's tasks, so tuples are spread fairly evenly over the tasks.
fieldsGrouping: this mechanism guarantees that tuples with the same field value go to the same task. This is critical for WordCount: if occurrences of the same word did not go to the same task, the word counts would be wrong.
All grouping: broadcast; every tuple is replicated to every task of the bolt.
Global grouping: all tuples in the stream are sent to a single bolt task, namely the task with the smallest task_id.
None grouping: used when you do not care about the load-balancing strategy. It is currently equivalent to shuffle grouping, except that Storm will place the bolt task in the same thread as the upstream task that feeds it.
Direct grouping: the component emitting the tuple decides directly which task receives it; in general, the bolt receiving the tuple decides whose emitted tuples it accepts. This is a rather special grouping: using it means the sender of a message specifies which task of the receiver will process it. Only streams declared as Direct
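The fieldsGrouping guarantee described above can be illustrated with a small, self-contained sketch. This is a simplified model, not Storm's real partitioner; the class and method names are invented for illustration:

```java
import java.util.Objects;

public class FieldsGroupingSketch {

    // Hypothetical model of fieldsGrouping-style routing: hash the grouping
    // field and take the result modulo the task count. Because the hash of a
    // given value never changes, the same field value always maps to the
    // same task index.
    static int chooseTask(String fieldValue, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Same word -> same task, so per-task word counts stay consistent,
        // which is exactly the WordCount requirement mentioned above.
        System.out.println(
            chooseTask("storm", numTasks) == chooseTask("storm", numTasks)); // prints "true"
    }
}
```

Different words may still collide on one task; the guarantee is only that a given value never splits across tasks.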

Apache Storm Nimbus Error: Could not find or load main class

Anonymous (unverified), posted 2019-12-03 08:59:04
Question: I'm getting the following error trying to start Storm Nimbus on a local dev Windows 7 workstation:

Error: Could not find or load main class Files\Java\jdk1.8.0_92\bin;C:\Program

storm.yaml:

storm.zookeeper.servers:
    - "127.0.0.1"
nimbus.seeds: ["127.0.0.1"]
storm.local.dir: "C:\\Users\\userX\\Apps\\ApacheStorm\\apache-storm-1.0.1\\data"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

I successfully started ZooKeeper, but when I try to run storm nimbus from the Storm bin folder I get the error. Any ideas?

Answer 1: Use PROGRA~1 instead of
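The answer is cut off above, but the error message itself shows the Java path being split at the space in "C:\Program Files". A plausible workaround in that spirit (a hypothetical, unverified sketch; the JDK path is assumed from the error text) is to point the Java environment variable at the DOS 8.3 short name, which contains no space:

```
:: Hypothetical workaround: the 8.3 short name PROGRA~1 avoids the space
:: that splits the classpath in "C:\Program Files"
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_92
```

Quoting the path in the Storm launcher scripts is an alternative some users report; the short-name form sidesteps the quoting problem entirely.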

java.lang.ClassNotFoundException: kafka.api.OffsetRequest

Anonymous (unverified), posted 2019-12-03 08:48:34
Question: I am getting java.lang.ClassNotFoundException: kafka.api.OffsetRequest while trying to integrate Kafka into our Storm topology. Which versions are you running, and is it working for you? My pom.xml:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.2-incubating</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.2-incubating</version>
</dependency>

Answer 1: Finally I solved it by implementing my own Kafka SPOUT
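For context: kafka.api.OffsetRequest lives in the Kafka broker artifact itself, which storm-kafka does not pull in transitively, so the pom above is missing a dependency. A hedged sketch of the usual fix (the Scala suffix and version number here are assumptions and must match your broker) is to declare the Kafka artifact explicitly and exclude the pieces that clash with Storm's own classpath:

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1.1</version>
    <exclusions>
        <!-- Storm brings its own ZooKeeper and logging; excluding these
             avoids duplicate-class conflicts at runtime -->
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```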

storm-starter with intellij idea,maven project could not find class

Anonymous (unverified), posted 2019-12-03 03:10:03
Question: I'm a beginner with Storm and IntelliJ IDEA. When I import storm-starter (apache-storm-0.9.5.zip) into IntelliJ IDEA (14 CE OS), everything is OK, but when I run ExclamationTopology the following problem appears:

Exception in thread "main" java.lang.NoClassDefFoundError: backtype/storm/topology/IRichSpout
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:122)
Caused by: java.lang.ClassNotFoundException: backtype.storm.topology.IRichSpout
    at java

How to run WordCountTopology from storm-starter in Intellij

Anonymous (unverified), posted 2019-12-03 02:26:02
Question: I have worked with Storm for a while already, but now want to get started with development. As suggested, I am using IntelliJ (up to now I was using Eclipse and only wrote topologies against the Java API). I was also looking at https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea . This documentation is not complete, and I was not able to run anything in IntelliJ at first. I could figure out that I need to remove the scope of the storm-core dependency (in the storm-starter pom.xml). (Found here: storm-starter with intellij idea,maven
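The scope change mentioned above can be sketched as follows. This is a hypothetical fragment of the storm-starter pom.xml, not the exact upstream file; the version placeholder is an assumption:

```xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>${storm.version}</version>
    <!-- 'provided' keeps storm-core out of the shaded jar for cluster
         submits, but it also hides the classes from IDE run
         configurations; comment it out to run topologies locally -->
    <!-- <scope>provided</scope> -->
</dependency>
```

The trade-off: with the scope removed, remember to restore it (or use a separate Maven profile) before building a jar for cluster submission, or the jar will bundle a second copy of Storm.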

Streamparse wordcount example

Anonymous (unverified), posted 2019-12-03 01:48:02
Question: I have been wanting to use Apache Storm to stream from Kafka. I am more comfortable with Python, so I decided to use streamparse ( https://github.com/Parsely/streamparse ); the word-count example is its introductory example, and I have been trying to get it working on my local machine. I have the following versions of the JDK, lein, and Storm installed:

Leiningen 2.6.1 on Java 1.8.0_73 Java HotSpot(TM) 64-Bit Server VM

Following the streamparse docs, I run:

sparse quickstart wordcount
cd wordcount
sparse run

I get the following error: