As part of my own implementation of a Bag-of-Words model, I am trying to compute text similarity. I have successfully taken a corpus of documents, vectorized them, and stored