I have to read a binary file into PySpark, split its contents on "~", and filter the results, keeping only the lines that start with '\x10' and have a length greater