I have a Spark job running on EMR. One operation loads the data and computes some statistics before passing the data on to subsequent steps. I call a .persis