I am using PySpark on AWS EMR to read 100k small JSON files, which are published from a MySQL database by the Kafka S3 sink connector. I am using the following snippet:
ug