storm

A Log Collection Architecture Based on Flume + Log4j + Kafka (Part 1)

老子叫甜甜, posted 2019-12-03 11:48:49
Flume is a mature, powerful log-collection tool. Plenty of ready-made examples and configuration references exist online, so only a brief overview is given here. Flume is built around three basic concepts:

Source: where logs come from, including Avro Source, Thrift Source, Exec Source, JMS Source, Spooling Directory Source, Kafka Source, NetCat Source, Sequence Generator Source, Syslog Source, HTTP Source, Stress Source, Legacy Source, Custom Source, Scribe Source, and the Twitter 1% firehose Source.
Channel: the log pipeline; all data arriving from a Source is queued here. Options include Memory Channel, JDBC Channel, Kafka Channel, File Channel, Spillable Memory Channel, Pseudo Transaction Channel, and Custom Channel.
Sink: where logs exit; data is emitted outward through the Sink. Options include HDFS Sink, Hive Sink, Logger Sink, Avro Sink, Thrift
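As a concrete illustration of how the three concepts wire together, a minimal agent definition might look like the following. This is a hedged sketch, not from the original article; the agent and component names (a1, r1, c1, k1) and the log path are placeholders:

```properties
# One agent: Exec source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory queue between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to the Flume log (swap for a Kafka or HDFS sink in practice)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

In a Flume + Kafka pipeline as described in the title, the logger sink above would typically be replaced by a Kafka sink so downstream consumers can read the events.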

Storm On Yarn: Installation and Deployment

纵然是瞬间, posted 2019-12-03 11:27:56
1. Install JDK 7 and Maven.
2. Deploy a Hadoop 2 cluster and start YARN: http://my.oschina.net/zc741520/blog/362824
3. Download Storm on YARN:
[grid@hadoop4 ~]$ wget https://github.com/yahoo/storm-yarn/archive/master.zip
4. Build:
[grid@hadoop4 ~]$ unzip master.zip
[grid@hadoop4 ~]$ cd storm-yarn-master
## Edit pom.xml and change the Hadoop version to the one you are actually running
[grid@hadoop4 storm-yarn-master]$ vim pom.xml
<properties>
  <storm.version>0.9.0-wip21</storm.version>
  <hadoop.version>2.5.2</hadoop.version>
  <!--hadoop.version>2.1.0.2.0.5.0-67</hadoop.version-->
</properties>
## Build
[grid@hadoop4 storm-yarn-master]$ mvn package -DskipTests
5. storm-yarn-master/lib

Apache Storm Installation without ZeroMQ/JZMQ

Anonymous (unverified), posted 2019-12-03 10:24:21
Question: I am trying to set up a multi-cluster Storm system. I have found several third-party step-by-step guides on this. They all list Java, Python, ZeroMQ 2.1.7, and JZMQ as requirements for the Nimbus and Supervisor/Slave nodes. But on the official Apache Storm website, the only requirements for the Nimbus and Supervisor nodes are Java 6 and Python 2.6.6 ( https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html ). Does anyone know if ZeroMQ and JZMQ are required for Storm cluster configuration? And is there an advantage to

An Introduction to Storm Components

坚强是说给别人听的谎言, posted 2019-12-03 09:11:57
(1) Topologies
Explanation: a topology is like a shipping container: all the cargo is stored inside the container and hauled away together. In Storm, all of your code and files are ultimately packaged into a topology and submitted to the Storm cluster to run, similar to a MapReduce job in Hadoop. The biggest difference is that a MapReduce job eventually stops on its own, while a Storm topology never stops unless you forcibly kill it.
Related notes: TopologyBuilder is the Java helper class for constructing a Topology.
Production mode:
Config conf = new Config();
conf.setNumWorkers(20);
conf.setMaxSpoutPending(5000);
StormSubmitter.submitTopology("mytopology", conf, topology);
Local mode:
import org.apache.storm.LocalCluster;
LocalCluster cluster = new LocalCluster();
(2) Streams
A Stream is the core abstraction in Storm. In a distributed environment, a stream is an unbounded sequence of tuples, created continuously and in parallel from data sources. A stream's schema is identified by field names, and the field values can be of type integer, long, short, bytes

Twitter Storm: Stream Grouping Strategies and fieldsGrouping

南楼画角, posted 2019-12-03 09:11:44
## Storm Grouping
shuffleGrouping: defines the stream grouping as a shuffle. Input coming from the Spout is shuffled, i.e. distributed randomly across this Bolt's tasks, so tuples are spread fairly evenly over the tasks.
fieldsGrouping: this mechanism guarantees that tuples with the same field value go to the same task. This is critical for WordCount: if occurrences of the same word did not go to the same task, the word counts would be wrong.
All grouping: broadcast; every tuple is replicated to every task of the bolt.
Global grouping: all tuples in the stream are sent to a single bolt task, namely the task with the smallest task_id.
None grouping: used when you do not care about the load-balancing strategy. It is currently equivalent to shuffle grouping, except that Storm will place the bolt task in the same thread as the upstream task that feeds it.
Direct grouping: the component emitting the tuple decides directly which task receives it; in general, the bolt receiving the tuple decides whose emitted tuples it accepts. This is a rather special grouping: using it means the sender of a message specifies which task of the receiver will process it. Only streams declared as Direct
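The fieldsGrouping guarantee described above can be illustrated with a small, self-contained sketch. This is a simplified model, not Storm's real partitioner; the class and method names are invented for illustration:

```java
import java.util.Objects;

public class FieldsGroupingSketch {

    // Hypothetical model of fieldsGrouping-style routing: hash the grouping
    // field and take the result modulo the task count. Because the hash of a
    // given value never changes, the same field value always maps to the
    // same task index.
    static int chooseTask(String fieldValue, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Same word -> same task, so per-task word counts stay consistent,
        // which is exactly the WordCount requirement mentioned above.
        System.out.println(
            chooseTask("storm", numTasks) == chooseTask("storm", numTasks)); // prints "true"
    }
}
```

Different words may still collide on one task; the guarantee is only that a given value never splits across tasks.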

Apache Storm Nimbus Error: Could not find or load main class

Anonymous (unverified), posted 2019-12-03 08:59:04
Question: I'm getting the following error trying to start Storm Nimbus on a local dev Windows 7 workstation:

Error: Could not find or load main class Files\Java\jdk1.8.0_92\bin;C:\Program

storm.yaml:

storm.zookeeper.servers:
    - "127.0.0.1"
nimbus.seeds: ["127.0.0.1"]
storm.local.dir: "C:\\Users\\userX\\Apps\\ApacheStorm\\apache-storm-1.0.1\\data"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

I successfully started ZooKeeper, but when I try to run storm nimbus from the Storm bin folder I get the error. Any ideas?

Answer 1: Use PROGRA~1 instead of
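The answer is cut off above, but the error message itself shows the Java path being split at the space in "C:\Program Files". A plausible workaround in that spirit (a hypothetical, unverified sketch; the JDK path is assumed from the error text) is to point the Java environment variable at the DOS 8.3 short name, which contains no space:

```
:: Hypothetical workaround: the 8.3 short name PROGRA~1 avoids the space
:: that splits the classpath in "C:\Program Files"
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_92
```

Quoting the path in the Storm launcher scripts is an alternative some users report; the short-name form sidesteps the quoting problem entirely.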

java.lang.ClassNotFoundException: kafka.api.OffsetRequest

Anonymous (unverified), posted 2019-12-03 08:48:34
Question: I am getting java.lang.ClassNotFoundException: kafka.api.OffsetRequest while trying to integrate Kafka into our Storm topology. Which versions are you running, and is it working for you? My pom.xml:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.2-incubating</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.2-incubating</version>
</dependency>

Answer 1: Finally I solved it by implementing my own Kafka SPOUT
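For context: kafka.api.OffsetRequest lives in the Kafka broker artifact itself, which storm-kafka does not pull in transitively, so the pom above is missing a dependency. A hedged sketch of the usual fix (the Scala suffix and version number here are assumptions and must match your broker) is to declare the Kafka artifact explicitly and exclude the pieces that clash with Storm's own classpath:

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1.1</version>
    <exclusions>
        <!-- Storm brings its own ZooKeeper and logging; excluding these
             avoids duplicate-class conflicts at runtime -->
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```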

storm-starter with intellij idea,maven project could not find class

Anonymous (unverified), posted 2019-12-03 03:10:03
Question: I'm a beginner with Storm and IntelliJ IDEA. When I import storm-starter (apache-storm-0.9.5.zip) into IntelliJ IDEA (14 CE OS), everything is OK, but when I run ExclamationTopology the following problem appears:

Exception in thread "main" java.lang.NoClassDefFoundError: backtype/storm/topology/IRichSpout
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:122)
Caused by: java.lang.ClassNotFoundException: backtype.storm.topology.IRichSpout
    at java

How to run WordCountTopology from storm-starter in Intellij

Anonymous (unverified), posted 2019-12-03 02:26:02
Question: I have worked with Storm for a while already, but now want to get started with development. As suggested, I am using IntelliJ (up to now I was using Eclipse and only wrote topologies against the Java API). I was also looking at https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea . This documentation is not complete, and I was not able to run anything in IntelliJ at first. I could figure out that I need to remove the scope of the storm-core dependency (in the storm-starter pom.xml). (Found here: storm-starter with intellij idea,maven
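The scope change mentioned above can be sketched as follows. This is a hypothetical fragment of the storm-starter pom.xml, not the exact upstream file; the version placeholder is an assumption:

```xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>${storm.version}</version>
    <!-- 'provided' keeps storm-core out of the shaded jar for cluster
         submits, but it also hides the classes from IDE run
         configurations; comment it out to run topologies locally -->
    <!-- <scope>provided</scope> -->
</dependency>
```

The trade-off: with the scope removed, remember to restore it (or use a separate Maven profile) before building a jar for cluster submission, or the jar will bundle a second copy of Storm.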

Streamparse wordcount example

Anonymous (unverified), posted 2019-12-03 01:48:02
Question: I have been wanting to use Apache Storm to stream from Kafka. I am more comfortable with Python, so I decided to use streamparse ( https://github.com/Parsely/streamparse ); the word-count example is its introductory example, and I have been trying to get it working on my local machine. I have the following versions of the JDK, lein, and Storm installed:

Leiningen 2.6.1 on Java 1.8.0_73 Java HotSpot(TM) 64-Bit Server VM

Following the streamparse docs, I run:

sparse quickstart wordcount
cd wordcount
sparse run

I get the following error: