How to speed up spaCy lemmatization?
Question: I'm using spaCy (version 2.0.11) for lemmatization in the first step of my NLP pipeline, but unfortunately it's taking a very long time. It is clearly the slowest part of my processing pipeline, and I want to know whether there are improvements I could make. I call the pipeline as follows: nlp.pipe(docs_generator, batch_size=200, n_threads=6, disable=['ner']) on an 8-core machine, and I have verified that the machine is using all the cores. On a corpus of about 3 million short texts totaling almost