Tokenization of Arabic words using NLTK

Asked by 栀梦 on 2020-12-28 16:36

I'm using NLTK's word_tokenize to split a sentence into words.

I want to tokenize this sentence:

في_بيتنا كل شي لما تحت
2 Answers
  •  [愿得一人]
    2020-12-28 16:43

    I always recommend using nltk.tokenize.wordpunct_tokenize. You can try out many of the NLTK tokenizers at http://text-processing.com/demo/tokenize/ and see for yourself.
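    A minimal sketch of that suggestion applied to the sentence from the question. `wordpunct_tokenize` is a pure regex tokenizer (pattern `\w+|[^\w\s]+`), so it needs no extra NLTK data downloads and works on Arabic script directly:

```python
from nltk.tokenize import wordpunct_tokenize

sentence = "في_بيتنا كل شي لما تحت"
tokens = wordpunct_tokenize(sentence)
print(tokens)
# → ['في_بيتنا', 'كل', 'شي', 'لما', 'تحت']
```

    Note that the underscore counts as a word character in the regex, so "في_بيتنا" stays a single token; replace "_" with a space first if you want those two parts separated.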
