Tokenize words in a list of sentences Python

广开言路 2021-02-04 06:41

I currently have a file that contains a list that looks like this:

example = ['Mary had a little lamb' ,
           'Jack went up the hill' ,
           'Jill followed suit' ,
           'i woke up suddenly' ,
           'it was a really bad dream...']
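Since the list lives in a file, here is a minimal sketch for reading it back. It assumes the file holds nothing but a Python list literal like the one above. The file name `sentences.txt` and the helper `load_sentence_list` are hypothetical. `ast.literal_eval` parses literals only, so unlike `eval` it will not execute code found in the file.

```python
import ast
import tempfile
from pathlib import Path


def load_sentence_list(path):
    """Return the list of strings stored as a Python literal in `path`.

    Hypothetical helper: assumes the file contains only a list literal,
    e.g. ['Mary had a little lamb', 'Jack went up the hill'].
    """
    text = Path(path).read_text(encoding="utf-8")
    # literal_eval accepts only literals (lists, strings, numbers, ...),
    # so it will not execute arbitrary code the way eval() would.
    data = ast.literal_eval(text)
    if not isinstance(data, list) or not all(isinstance(s, str) for s in data):
        raise ValueError("expected a list of strings")
    return data


# Demo: write a small sample file to a temporary directory, then load it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sentences.txt"  # hypothetical file name
    sample.write_text("['Mary had a little lamb' ,\n 'Jack went up the hill']\n",
                      encoding="utf-8")
    sentences = load_sentence_list(sample)

print(sentences)  # ['Mary had a little lamb', 'Jack went up the hill']
```

If the file is instead a Python module containing `example = [...]`, importing `example` from that module is simpler than parsing it.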


        
  • 2021-02-04 07:29

    You could use NLTK's word tokenizer (http://nltk.org/api/nltk.tokenize.html) together with a list comprehension (see http://docs.python.org/2/tutorial/datastructures.html#list-comprehensions):

    >>> from nltk.tokenize import word_tokenize
    >>> example = ['Mary had a little lamb' , 
    ...            'Jack went up the hill' , 
    ...            'Jill followed suit' ,    
    ...            'i woke up suddenly' ,
    ...            'it was a really bad dream...']
    >>> tokenized_sents = [word_tokenize(i) for i in example]
    >>> for i in tokenized_sents:
    ...     print(i)
    ... 
    ['Mary', 'had', 'a', 'little', 'lamb']
    ['Jack', 'went', 'up', 'the', 'hill']
    ['Jill', 'followed', 'suit']
    ['i', 'woke', 'up', 'suddenly']
    ['it', 'was', 'a', 'really', 'bad', 'dream', '...']
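If NLTK is not available, a rough dependency-free approximation is a regular expression. This is a swapped-in technique, not NLTK's algorithm. The pattern and the `simple_tokenize` name are assumptions for illustration. A token is either a run of word characters or a run of punctuation, so `...` stays one token. For the five example sentences this happens to give the same lists as `word_tokenize`, but it does not follow NLTK's Treebank rules for contractions and similar cases.

```python
import re

# Hypothetical stand-in for nltk's word_tokenize: a token is a run of word
# characters or a run of punctuation (so "..." stays a single token).
# Unlike NLTK, it splits "don't" into 'don', "'", 't' instead of 'do', "n't".
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def simple_tokenize(sentence):
    return TOKEN_RE.findall(sentence)


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

# One token list per sentence, built with a list comprehension.
tokenized_sents = [simple_tokenize(s) for s in example]
for tokens in tokenized_sents:
    print(tokens)

# If a single flat list of words is wanted instead, nest the comprehension.
all_words = [word for tokens in tokenized_sents for word in tokens]
```

The nested comprehension at the end works the same way with the output of `word_tokenize`.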
    