I am working on a large audio dataset consisting of thousands of .wav files, each 3 seconds long, stored in separate folders according to their label. The first stage of my