This is a Spark Streaming program written in Scala. It counts the words received from a socket every 1 second. The result would be the word count, for example, the word coun
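I have a very simple answer, and it's just a few lines of code (in Python, using PySpark). It is essentially the network word count example from the Spark Streaming Programming Guide, and you can find it in most Spark books. Remember that I have used localhost and port 9999.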
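from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Run with at least two local cores (e.g. spark-submit --master "local[2]"):
# the socket receiver occupies one core, so "local[1]" never processes data.
sc = SparkContext(appName="PythonStreamingNetworkWordCount")
# Batch interval of 1 second: the stream is cut into 1-second micro-batches.
ssc = StreamingContext(sc, 1)
# DStream of text lines read from a TCP socket on localhost:9999.
lines = ssc.socketTextStream("localhost", 9999)
# Split lines into words, pair each word with 1, then sum the 1s per word (per batch).
counts = lines.flatMap(lambda line: line.split(" "))\
.map(lambda word: (word, 1))\
.reduceByKey(lambda a, b: a+b)
# Print the first 10 (word, count) pairs of every batch to the driver's stdout.
counts.pprint()
ssc.start()             # start receiving and processing
ssc.awaitTermination()  # block until the streaming job is stopped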
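If you want to see what the transformation chain computes for a single batch without running Spark, the plain-Python sketch below mirrors flatMap, map and reduceByKey on an in-memory list of lines. `count_words_in_batch` is a hypothetical helper, not part of PySpark, and it only models one batch on one machine (Spark performs the reduceByKey step across partitions).

```python
from functools import reduce
from itertools import chain


def count_words_in_batch(lines):
    """Hypothetical plain-Python mirror of flatMap -> map -> reduceByKey for one batch."""
    # flatMap: split every line into words and flatten them into one sequence
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair each word with the count 1
    pairs = ((word, 1) for word in words)

    # reduceByKey: add up the 1s for each distinct word
    def add_pair(acc, pair):
        word, one = pair
        acc[word] = acc.get(word, 0) + one
        return acc

    return reduce(add_pair, pairs, {})


batch = ["hello world", "hello spark"]
print(sorted(count_words_in_batch(batch).items()))
# [('hello', 2), ('spark', 1), ('world', 1)]
```

Because `line.split(" ")` splits on single spaces, two spaces in a row produce an empty-string "word" (`count_words_in_batch(["a  b"])` counts `""` once); `line.split()` with no argument avoids that, both here and in the Spark job.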
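To stop it, you can use a simple
ssc.stop()
Note that awaitTermination() blocks, so ssc.stop() has to run from another thread, or after replacing awaitTermination() with ssc.awaitTerminationOrTimeout(60), which waits at most 60 seconds. Pressing Ctrl+C in the terminal running the script also interrupts it.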
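This is very basic code, but it is helpful for a basic understanding of Spark Streaming, DStreams to be more specific: a DStream is a continuous sequence of RDDs, one per batch interval (here, 1 second), and each transformation is applied to every batch.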
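The batching idea can be sketched in plain Python. Assume each incoming line carries a hypothetical arrival timestamp in seconds since ssc.start(); lines are grouped into consecutive 1-second batches, and words are counted independently in each batch. Counts are therefore not accumulated across batches (that would need a stateful operation such as updateStateByKey). Both function names are hypothetical helpers, not Spark APIs.

```python
from collections import Counter


def split_into_batches(events, batch_interval=1.0):
    """Group hypothetical (timestamp, line) pairs into consecutive batches.

    Assumes timestamps are seconds since ssc.start() (>= 0). Batch k holds the
    lines with k * batch_interval <= timestamp < (k + 1) * batch_interval.
    Intervals without data become empty batches, since a DStream also has one
    (possibly empty) RDD per interval.
    """
    if not events:
        return []
    last = max(int(t // batch_interval) for t, _ in events)
    batches = [[] for _ in range(last + 1)]
    for timestamp, line in events:
        batches[int(timestamp // batch_interval)].append(line)
    return batches


def word_counts_per_batch(events, batch_interval=1.0):
    """Count words separately in each batch (no state carried between batches)."""
    return [
        dict(Counter(word for line in batch for word in line.split(" ")))
        for batch in split_into_batches(events, batch_interval)
    ]


events = [(0.2, "hello world"), (0.7, "hello"), (2.4, "world")]
print(word_counts_per_batch(events))
# [{'hello': 2, 'world': 1}, {}, {'world': 1}]
```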
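To give input on localhost:9999, open a terminal (for example, the Mac Terminal) and type
nc -l 9999
Netcat then listens on port 9999, and everything you type after that is sent to the streaming job, which prints the word counts of each 1-second batch. Start nc before the Spark job, because socketTextStream connects to that port as a client. The Spark Streaming guide uses nc -lk 9999; the -k flag keeps nc listening for a new connection after the current one closes.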
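If nc is not available (for example on Windows), a few lines of Python can play the same role. This is a minimal sketch: `serve_lines` and its `on_listening` parameter are hypothetical names. It listens on localhost:9999 by default, accepts a single connection (the one socketTextStream opens), and sends each line UTF-8 encoded with a trailing newline, since the receiver reads newline-delimited text.

```python
import socket


def serve_lines(lines, host="localhost", port=9999, on_listening=None):
    """Hypothetical stand-in for `nc -l 9999`: accept one client, send it lines."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Allow a quick restart on the same port.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        if on_listening is not None:
            # Report the bound port (useful when port=0 picks a free one).
            on_listening(server.getsockname()[1])
        # Block until Spark's receiver (or any client) connects.
        conn, _addr = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
    # Leaving the with-blocks closes both sockets, like nc exiting.
```

To use it like nc, you could call `serve_lines(line.rstrip("\n") for line in sys.stdin)` from a small script, type your lines, and end the input with Ctrl+D. Like `nc -l` without `-k`, it serves one connection and then exits.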
Hope this is helpful.