I'm doing a project where I have to work with PySpark. My input file is about 32 GB, and because I am pretty new to PySpark, I've run into some problems with processing a