Spark streaming textFileStream and fileStream can monitor a directory and process the new files in a Dstream RDD.
How to get the file names
Alternatively, by modifying FileInputDStream so that rather than loading the contents of the files into the RDD, it simply creates an RDD from the filenames.
This gives a performance boost if you don't actually want to read the data itself into the RDD, or want to pass filenames to an external command as one of your steps.
Simply change filesToRDD(..) so that it makes an RDD of the filenames, rather than loading the data into the RDD.
See: https://github.com/HASTE-project/bin-packing-paper/blob/master/spark/spark-scala-cellprofiler/src/main/scala/FileInputDStream2.scala#L278