ssc | 易学教程 (Yixue Tutorials)

194 Real-Time WordCount with Spark Streaming

[Architecture diagram]

1. Install and start the producer

First, install the nc tool with YUM on a Linux machine (IP: 192.168.10.101):

    yum install -y nc

Then start a server that listens on port 9999:

    nc -lk 9999

2. Write the Spark Streaming program

    package cn.itcast.spark.streaming

    import cn.itcast.spark.util.LoggerLevel
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object NetworkWordCount {
      def main(args: Array[String]): Unit = {
        // Set the log level
        LoggerLevel.setStreamingLogLevels()
        // Create a SparkConf and run in local mode
        // Note: local[2] starts two threads (one receives data from the socket, the other processes it)
        val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
        // Set the DStream batch interval to 2 seconds
        val ssc = new StreamingContext(conf, Seconds(2))
        /
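The excerpt stops before the transformations, so the following is only a plain-Scala sketch (no Spark dependency) of what one 2-second batch would compute, assuming the usual word-count chain of splitting lines on spaces, pairing each word with 1, and summing by key. The object name `BatchWordCount` and the sample input are hypothetical; `groupBy` plus `sum` stands in for Spark's `reduceByKey`.

```scala
// Plain-Scala sketch of the per-batch word count (hypothetical names, no Spark).
object BatchWordCount {
  // Mirrors lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))        // split every line into words
      .map(word => (word, 1))       // pair each word with a count of 1
      .groupBy(_._1)                // group the pairs by word, like a shuffle by key
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    // Hypothetical lines typed into `nc -lk 9999` during a single batch
    val batch = Seq("hello spark", "hello streaming")
    println(countWords(batch))
  }
}
```

In the streaming job, this computation would run once per batch interval on whatever text arrived during that interval.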

SparkStreaming Basic Examples

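WordCount example

Example 1:

    import org.apache.spark.streaming._

    // sc is the SparkContext already available in spark-shell
    val ssc = new StreamingContext(sc, Seconds(5))
    // Watch a directory; files newly added to it become the input stream
    val lines = ssc.textFileStream("file:///home/software/stream")
    //val lines = ssc.textFileStream("hdfs://hadoop01:9000/wordcount")
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()

Basic concepts

1. StreamingContext

StreamingContext is the most fundamental environment object in Spark Streaming programming, just as SparkContext is in Spark programming. It is the main entry point for functionality, including creating DStreams, the most basic objects (analogous to RDDs in Spark), from various sources. Creating a StreamingContext is simple: create a SparkConf instance, set the application name, and specify the batch interval (5 seconds in the example). That is all it takes:

    val conf =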
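To make the role of the batch interval concrete, here is a simplified plain-Scala model (not Spark's actual scheduler): records are assigned to a batch by dividing their arrival time by the interval, and each batch is word-counted independently, much as each interval of a DStream becomes its own RDD. The `Record` type, the millisecond timestamps, and `countPerBatch` are hypothetical.

```scala
// Simplified model of micro-batching (hypothetical names, no Spark dependency).
object MicroBatchModel {
  // A line of text plus its arrival time in milliseconds
  final case class Record(arrivalMs: Long, line: String)

  // Groups records into fixed-size time windows, then counts words in each window separately
  def countPerBatch(records: Seq[Record], intervalMs: Long): Map[Long, Map[String, Int]] =
    records
      .groupBy(_.arrivalMs / intervalMs) // batch index = arrival time / interval
      .map { case (batchIndex, recs) =>
        val counts = recs
          .flatMap(_.line.split(" "))
          .groupBy(identity)
          .map { case (word, occurrences) => word -> occurrences.size }
        batchIndex -> counts
      }

  def main(args: Array[String]): Unit = {
    val records = Seq(Record(1000L, "a b"), Record(4000L, "a"), Record(6000L, "b b"))
    // A 5000 ms interval corresponds to Seconds(5) in the example above
    countPerBatch(records, intervalMs = 5000L).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Counts do not carry over between batches here, which matches the behavior of a plain `reduceByKey` on a DStream.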

Integrating SparkStreaming with Kafka

Code example:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.kafka.KafkaUtils

    object Driver {
      def main(args: Array[String]): Unit = {
        //-- Number of threads to start: at least 2, one to listen to the data source and the rest to consume or print the data
        val conf = new SparkConf().setMaster("local[5]").setAppName("kafkainput")
        val sc = new SparkContext(conf)
        val ssc = new StreamingContext(sc, Seconds(5))
        ssc.checkpoint("d://check1801")
        //-- Connect to Kafka and consume data
        val zkHosts = "192.168.150.137:2181,192.168.150.138:2181,192.168.150.139:2181
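The excerpt ends at the ZooKeeper address list. Assuming the article goes on to use the receiver-based `KafkaUtils.createStream` from the spark-streaming-kafka 0.8 integration (which the `zkHosts` variable suggests), that call takes the ZooKeeper quorum, a consumer group id, and a `Map[String, Int]` from topic name to the number of consumer threads. The sketch below only builds that map in plain Scala; the topic names, the group id, and `buildTopicMap` are hypothetical, and the Spark call is shown as a comment because it is not run here.

```scala
// Builds the topic -> consumer-thread-count map (hypothetical helper, no Spark dependency).
object KafkaTopicMap {
  // Turns "t1,t2" plus a per-topic thread count into Map("t1" -> n, "t2" -> n)
  def buildTopicMap(topics: String, threadsPerTopic: Int): Map[String, Int] =
    topics
      .split(",")
      .map(_.trim)                        // tolerate spaces after commas
      .filter(_.nonEmpty)                 // skip empty entries such as "a,,b"
      .map(topic => topic -> threadsPerTopic)
      .toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical topic names; the article's actual topics are not shown
    val topicMap = buildTopicMap("wordcount, logs", threadsPerTopic = 1)
    println(topicMap)
    // With Spark on the classpath, the map would be passed roughly like this (not executed):
    // KafkaUtils.createStream(ssc, zkHosts, "hypothetical-group", topicMap)
  }
}
```

The receiver that runs these consumer threads occupies one of the local threads, which is why the comment above asks for at least two.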