I have one CSV file in a folder that keeps updating continuously. I need to take inputs from this CSV file and produce some transactions. How can I take data from the CSV file as it keeps updating, say every 5 minutes?
Firstly, I'm not sure how you arrived at this setup, because a CSV file is normally written sequentially, which already gives good I/O. So my recommendation is to treat it as an append-only file and consume the newly appended data as a stream, much like reading from a binlog.
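To illustrate the append-only idea outside of Spark, here is a minimal sketch of tail-style polling: remember the byte offset you last read, and on each poll read only the bytes appended since then. The function name readNewLines and the polling approach are my own illustration, not part of any library; real code would also need to handle a partially written last line.

```scala
import java.io.RandomAccessFile

// Sketch: read only the lines appended to `path` since `lastOffset`,
// returning them together with the new offset to poll from next time.
def readNewLines(path: String, lastOffset: Long): (Seq[String], Long) = {
  val raf = new RandomAccessFile(path, "r")
  try {
    raf.seek(lastOffset)
    // Read everything appended since the last poll
    val bytes = new Array[Byte](math.max(0L, raf.length() - lastOffset).toInt)
    raf.readFully(bytes)
    val text = new String(bytes, "UTF-8")
    // NOTE: a production version must buffer a trailing partial line
    (text.split("\n").filter(_.nonEmpty).toSeq, raf.length())
  } finally raf.close()
}
```

Calling this every 5 minutes (from a scheduler or a simple loop) gives you exactly the "new rows since last time" semantics the question asks for, without re-reading the whole file.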
However, if you have to do this, Spark Streaming's StreamingContext may help you:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}

// Batch interval matching the desired 5-minute cadence
val ssc = new StreamingContext(new SparkConf(), Durations.minutes(5))
// newFilesOnly = false also picks up files already present in /tmp at start;
// note that each file is still read only once, not re-read when appended to
val fileStream = ssc
  .fileStream[LongWritable, Text, TextInputFormat]("/tmp", (x: Path) => true, newFilesOnly = false)
  .map(_._2.toString)
tl;dr It won't work.
Spark Structured Streaming's file source monitors a directory and triggers a computation for every new file. Once a file has been processed, it will never be processed again, so appends to an existing file are ignored. That's the default implementation.
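For reference, this is roughly what the default file source looks like; it is a sketch assuming Spark is on the classpath, and the schema (id, amount) and the /tmp directory are placeholders for your actual CSV layout. It will process each complete file dropped into the directory exactly once.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-stream").getOrCreate()

// Streaming file sources require an explicit schema (placeholder columns here)
val schema = new StructType().add("id", IntegerType).add("amount", DoubleType)

val txns = spark.readStream
  .schema(schema)
  .option("header", "true")
  .csv("/tmp")          // directory to monitor; each new file is read once

val query = txns.writeStream.format("console").start()
```

So to use this as-is, the producer would have to write each batch of rows as a new file in the directory rather than appending to one file.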
You could write your own streaming source that monitors a single file for changes, but that means custom source development (doable, though in most cases not worth the effort).