I get an OutOfMemoryError when processing tar.gz files larger than 1 GB in Spark.
To get past this error, I have tried splitting the tar.gz into multiple