I have folders containing a very large number of files (e.g. over 100k). Some files are small (less than 1 KB) and some are large (e.g. several MB).
I would like to use PySpark and scan all