Why do CoreNLP ner tagger and ner tagger join the separated numbers together?
Question

Here is the code snippet:

    In [390]: t
    Out[390]: ['my', 'phone', 'number', 'is', '1111', '1111', '1111']

    In [391]: ner_tagger.tag(t)
    Out[391]: [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111\xa01111\xa01111', 'NUMBER')]

What I expect is:

    Out[391]: [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'NUMBER'), ('1111', 'NUMBER'), ('1111', 'NUMBER')]

As you can see, the three tokens of the artificial phone number are joined into one by \xa0, which is the non-breaking space character (U+00A0). Can I
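
For reference, one possible workaround is to post-process the tagger output rather than change the tagger itself. The sketch below is only an illustration under stated assumptions: `split_joined_tokens` is a hypothetical helper (not part of NLTK or CoreNLP), it assumes `\xa0` appears only where tokens were joined, and it assumes every piece should inherit the tag of the joined token.

```python
NBSP = "\xa0"  # U+00A0, the non-breaking space seen in the joined token


def split_joined_tokens(tagged, sep=NBSP):
    """Split tokens that contain `sep` and copy the tag onto each piece.

    Hypothetical helper for illustration; not an NLTK or CoreNLP API.
    """
    result = []
    for token, tag in tagged:
        # str.split returns [token] unchanged when sep is absent
        for piece in token.split(sep):
            if piece:  # drop empty strings from leading, trailing, or doubled separators
                result.append((piece, tag))
    return result


# The tagger output from the question, copied verbatim
tagged = [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'),
          ('1111\xa01111\xa01111', 'NUMBER')]

print(split_joined_tokens(tagged))
```

This restores one `(token, tag)` pair per original number, but it does not explain or prevent the joining, which happens before the tags are returned.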