I have found a number of questions about shuffling a dataset, but none of them explain why shuffling just the file names does not suffice. The following is the code for loading the dataset: