How do I tokenize a string sentence in NLTK?

鱼传尺愫 2020-11-29 06:58

I am using nltk, and I want to create my own custom texts just like the default ones in nltk.book. However, I've only got as far as a method like

my_text = [\         


        
2 answers
  •  孤独总比滥情好
    2020-11-29 07:30

    As @PavelAnossov answered, the canonical approach is to use the word_tokenize function in nltk:

    from nltk import word_tokenize
    sent = "This is my text, this is a nice way to input text."
    word_tokenize(sent)
    

    If your sentence is simple enough, you can instead remove punctuation using the string.punctuation set and then split on the whitespace delimiter:

    import string
    x = "This is my text, this is a nice way to input text."
    # Keep only the characters that are not punctuation, then split on spaces
    y = "".join([i for i in x if i not in string.punctuation]).split(" ")
    print(y)
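
The punctuation-stripping idea above can also be sketched with str.translate from the standard library, which avoids building the string character by character. This is only a rough sketch: the helper name simple_tokenize is hypothetical and not part of nltk, and it only handles the ASCII punctuation listed in string.punctuation.

```python
import string

# Translation table that deletes every character in string.punctuation.
# (_STRIP_PUNCT and simple_tokenize are hypothetical names for illustration.)
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def simple_tokenize(sentence):
    """Remove ASCII punctuation, then split on runs of whitespace."""
    return sentence.translate(_STRIP_PUNCT).split()


tokens = simple_tokenize("This is my text, this is a nice way to input text.")
print(tokens)
```

Note that this approach also strips apostrophes, so "don't" becomes "dont". For anything beyond very simple sentences, word_tokenize is probably the safer choice.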
    
