What does “word for word” syntax mean in Python?

萝らか妹 posted on 2019-12-04 05:12:21

Question


I see the following script snippet from the gensim tutorial page.

What does the "word for word" syntax mean in the Python script below?

texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

Answer 1:


This is a list comprehension. The code you posted loops through every element of document.lower().split() and builds a new list containing only the elements that satisfy the if condition. It does this for each document in documents, so the result is a list of lists.

Try it out...

elems = [1, 2, 3, 4]
squares = [e*e for e in elems]  # square each element
big = [e for e in elems if e > 2]  # keep elements bigger than 2

As you can see from your example, list comprehensions can be nested.
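
To make the nested case concrete, here is a small runnable sketch of the snippet from the question. The `documents` and `stoplist` values are made up for illustration and are not taken from the gensim tutorial.

```python
# Hypothetical sample data, invented for this example.
documents = ["The quick brown fox", "A fox and the hound"]
stoplist = {"the", "a", "and"}

# Outer comprehension: one inner list per document.
# Inner comprehension: lowercase the document, split on whitespace,
# and keep only the words that are not in the stoplist.
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

print(texts)  # [['quick', 'brown', 'fox'], ['fox', 'hound']]
```

Note that the `for document in documents` part on the second line is the outer loop, even though it is written last.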




Answer 2:


That is a list comprehension. An easier example might be:

evens = [num for num in range(100) if num % 2 == 0]
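
The general shape is `[expression for item in iterable if condition]`, where the leading expression can transform each item and the trailing `if` is optional. A minimal sketch that filters and transforms in one step (the variable name `even_squares` is just an illustration):

```python
# Keep only the even numbers below 10, then square each one.
even_squares = [num * num for num in range(10) if num % 2 == 0]

print(even_squares)  # [0, 4, 16, 36, 64]
```

In the question's snippet the leading expression is simply `word`, which is why it reads as "word for word": each word is kept unchanged.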



Answer 3:


I'm quite sure I've seen that line in some NLP applications.

This list comprehension:

[[word for word in document.lower().split() if word not in stoplist] for document in documents]

is the same as

ending_list = []  # often known as the document stream in NLP
for document in documents:  # loop through the list of documents
  internal_list = []  # often known as a list of tokens
  for word in document.lower().split():
    if word not in stoplist:
      internal_list.append(word)  # the inner [word for word in ... if ...] part
  ending_list.append(internal_list)  # the outer [... for document in documents] part
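
One way to convince yourself that the two forms are equivalent is to run both on the same input and compare the results. This is only a sketch: the function names and the sample `documents` and `stoplist` are hypothetical, chosen for the demonstration.

```python
def tokenize_with_loops(documents, stoplist):
    """Explicit nested loops, one append per kept word."""
    ending_list = []
    for document in documents:
        internal_list = []
        for word in document.lower().split():
            if word not in stoplist:
                internal_list.append(word)
        ending_list.append(internal_list)
    return ending_list


def tokenize_with_comprehension(documents, stoplist):
    """The same logic written as a nested list comprehension."""
    return [[word for word in document.lower().split() if word not in stoplist]
            for document in documents]


# Hypothetical sample data, invented for this example.
documents = ["Human machine interface for lab applications",
             "A survey of user opinion"]
stoplist = {"for", "a", "of", "the", "and", "to", "in"}

loop_result = tokenize_with_loops(documents, stoplist)
comprehension_result = tokenize_with_comprehension(documents, stoplist)

print(loop_result == comprehension_result)  # True
```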

Basically you want a list of documents, where each document is itself a list of tokens. So by looping through the documents,

for document in documents:

you then split each document into tokens

  list_of_tokens = []
  for word in document.lower().split():

and then make a list of these tokens:

    list_of_tokens.append(word)    

For example:

>>> doc = "This is a foo bar sentence ."
>>> [word for word in doc.lower().split()]
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.']

It's the same as:

>>> doc = "This is a foo bar sentence ."
>>> list_of_tokens = []
>>> for word in doc.lower().split():
...   list_of_tokens.append(word)
... 
>>> list_of_tokens
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.']


Source: https://stackoverflow.com/questions/20953143/what-does-word-for-word-syntax-mean-in-python
