NLTK word_tokenize on French text is not working properly
Question

I'm trying to use NLTK's word_tokenize on a text in French:

```python
from nltk.tokenize import word_tokenize

txt = "Le télétravail n'aura pas d'effet sur ma vie."
print(word_tokenize(txt, language='french'))
```

It should print:

```python
['Le', 'télétravail', "n'", 'aura', 'pas', "d'", 'effet', 'sur', 'ma', 'vie', '.']
```

But I get:

```python
['Le', 'télétravail', "n'aura", 'pas', "d'effet", 'sur', 'ma', 'vie', '.']
```

Does anyone know why it isn't splitting these tokens properly for French, and how to overcome this (and other potential issues) when doing NLP in French?

Answer 1
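A minimal sketch of one possible workaround: use NLTK's RegexpTokenizer to peel elided clitics (n', d', l', and similar) off the following word. The character class and pattern below are illustrative assumptions, not an exhaustive set of French elision rules:

```python
from nltk.tokenize import RegexpTokenizer

# Match a letter run ending in an apostrophe (an elision such as n', d', l')
# as its own token; otherwise take whole letter runs, digit runs, or any
# single non-space character (which covers punctuation like the final '.').
tokenizer = RegexpTokenizer(r"[A-Za-zÀ-ÿ]+['’]|[A-Za-zÀ-ÿ]+|\d+|\S")

txt = "Le télétravail n'aura pas d'effet sur ma vie."
print(tokenizer.tokenize(txt))
# ['Le', 'télétravail', "n'", 'aura', 'pas', "d'", 'effet', 'sur', 'ma', 'vie', '.']
```

For broader French coverage (other clitics such as qu', typographic apostrophes, multi-word expressions), a tokenizer built for French, such as the one in spaCy's fr_core_news_sm pipeline, is generally more robust than a hand-rolled regex.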