I'm writing some code where I take a bunch of documents that have subject codes attached to them and then run a CountVectorizer over them to generate similarities with a