Spark Streaming Accumulated Word Count

臣服心动 2021-02-14 13:58

This is a Spark Streaming program written in Scala. It counts the words received from a socket every 1 second. The result would be the word count, for example, the word coun

2 Answers
  •  天命终不由人
    2021-02-14 14:05

    I have a very simple answer, and it's just a few lines of code. You can find this in most Spark books. Note that I have used localhost and port 9999.

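    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    ssc = StreamingContext(sc, 1)  # batch interval of 1 second
    # Read newline-delimited text from a TCP socket.
    lines = ssc.socketTextStream("localhost", 9999)
    # Split each line into words, pair each word with 1, then sum per word.
    # The counts cover one batch only; they are not carried across batches.
    counts = lines.flatMap(lambda line: line.split(" "))\
                  .map(lambda word: (word, 1))\
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()         # print the first results of each batch
    ssc.start()             # start receiving and processing
    ssc.awaitTermination()  # block until the stream is stopped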
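
To see what the `flatMap`/`map`/`reduceByKey` chain computes for a single one-second batch, here is a minimal plain-Python sketch that runs without Spark. The helper name `count_batch` and the sample lines are made up for illustration; Spark performs the same steps in a distributed way, which this sketch does not try to model.

```python
from functools import reduce


def count_batch(lines):
    """Mimic flatMap -> map -> reduceByKey on one batch of lines (hypothetical helper)."""
    # flatMap: split every line on single spaces and flatten into one word list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the count 1
    pairs = [(word, 1) for word in words]

    # reduceByKey: add up the 1s that share the same word
    def merge(counts, pair):
        word, n = pair
        counts[word] = counts.get(word, 0) + n
        return counts

    return reduce(merge, pairs, {})


batch = ["hello world", "hello spark"]
print(count_batch(batch))  # {'hello': 2, 'world': 1, 'spark': 1}
```

One detail worth knowing: `split(" ")` treats consecutive spaces as separators around an empty string, so a line like `"a  b"` yields an empty-string "word". Using `line.split()` with no argument avoids that.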
    

    To stop the stream, you can simply call

    ssc.stop()

    This is very basic code, but it is helpful for a basic understanding of Spark Streaming, and of DStreams in particular.
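
The code above counts each batch on its own, so the numbers start over every second. If an accumulated count across batches is wanted, as the title suggests, PySpark's `DStream.updateStateByKey` is the usual tool: it takes an update function that receives the new values for a key plus the previous state, and it needs a checkpoint directory set through `ssc.checkpoint(...)`. The sketch below only simulates that idea in plain Python so it runs without Spark; `update_count`, `run_batches`, and the sample batches are hypothetical names and data.

```python
def update_count(new_values, running_count):
    """Same shape as an updateStateByKey update function: (new values, old state) -> new state."""
    # running_count is None the first time a key is seen
    return sum(new_values) + (running_count or 0)


def run_batches(batches):
    """Feed batches of lines through update_count, yielding the running totals after each batch."""
    state = {}
    for lines in batches:
        # Group this batch's 1s by word, like (word, 1) pairs grouped per key.
        new_values = {}
        for line in lines:
            for word in line.split(" "):
                new_values.setdefault(word, []).append(1)
        # Keys absent from this batch keep their old total (empty list of new values).
        for word in set(state) | set(new_values):
            state[word] = update_count(new_values.get(word, []), state.get(word))
        yield dict(state)


for totals in run_batches([["hello world"], ["hello spark"]]):
    print(sorted(totals.items()))
# [('hello', 1), ('world', 1)]
# [('hello', 2), ('spark', 1), ('world', 1)]
```

With Spark itself, the same `update_count` function could presumably be passed to `updateStateByKey` in place of the `reduceByKey` step, after calling `ssc.checkpoint` with a directory of your choice.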

    To send input to localhost, type the following in your terminal (Mac terminal):

    nc -l 9999

    This listens on port 9999, so everything you type after that is sent to the streaming job and the words are counted.
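
If typing into `nc` by hand is inconvenient, a small Python server can play the same role and send prepared lines to whoever connects. This is only a sketch using the standard `socket` module; the function name `serve_lines` and its parameters are hypothetical, and it serves a single client and then closes.

```python
import socket
import threading


def serve_lines(lines, host="localhost", port=9999):
    """Listen on host:port, send each line to the first client, then close (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port=0 lets the OS pick a free port
    server.listen(1)
    actual_port = server.getsockname()[1]

    def handle():
        conn, _ = server.accept()  # wait for one client, e.g. the streaming job
        with conn:
            for line in lines:
                # socketTextStream expects newline-delimited UTF-8 text
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=handle, daemon=True)
    thread.start()
    return actual_port, thread
```

Calling `serve_lines(["hello world", "hello spark"])` before starting the streaming job would stand in for the `nc` window. Because the server closes after sending, all lines will likely arrive within a single batch.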

    Hope this is helpful.
