I have a Hadoop streaming project, which I run with Amazon's AWS CLI, i.e., "aws emr create-cluster ..."
My input file contains 60,000 S3 file names, so the mapper r