I have been trying to build a PySpark plagiarism detector, following this lab: https://github.com/goldshtn/spark-workshop/blob/master/python/lab7-plagiarism.md/
I am