I am trying to generate molecular descriptors using RDKit and then perform machine learning on them all using Spark. I have managed to generate the descriptors and I have fo