We have an internal tool that runs on the EMR cluster. Its code can be written in Scala, SQL, etc., to process large amounts of data and publish the results to AWS S3 or an