Problem with indexes in enumerate() - Python [closed]

徘徊边缘 submitted on 2020-01-07 09:24:21

Question


I have a dataset (a_list_of_sentences) in the form of a list of lists of lists, where each innermost list consists of a word and its syntactic dependency, and these lists are grouped into sentences, like this:

[[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
 [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
 [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']]]

I want to find the sentences in which there is a quantifier (e.g. 'every', 'all') followed by a word whose syntactic dependency is subject ('nsubj') or object ('dobj') and distinguish between these two cases. For my purposes, the subject or the object could be either the first word after a quantifier or the second word after a quantifier. I tried to do that using enumerate(), in this way:

for sentence in a_list_of_sentences:
    for i, j in enumerate(sentence):
        if "dobj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "dobj")
        elif "nsubj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "nsubj")

However, this code reports sentences whose quantifier is in object position as having a quantifier in both subject and object position. For example, [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] shows up in both print outputs:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] nsubj
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] dobj

Do you know what I am doing wrong and how I can fix it?

Thank you very much!!!


Answer 1:


The problem is that list indexes can be negative (if they couldn't, you'd get an IndexError). A negative index counts from the end of the list, so sentence[i-1] and sentence[i-2] wrap around to the end of the sentence when i is 0 or 1.
Check [SO]: Understanding slice notation for more details.
Below is a cleaner variant.

code00.py:

#!/usr/bin/env python3

import sys


def main(*argv):
    sentences = [
        [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
        [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
        [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
    ]
    quantifiers = ["all", "every"]
    syntactic_roles = ["nsubj", "dobj"]

    for sentence in sentences:
        #print(sentence)
        quantifier_idx = -1
        for idx, (word, syntactic_role) in enumerate(sentence):
            if quantifier_idx > -1 and idx - quantifier_idx in [1, 2] and syntactic_role in syntactic_roles:
                print(" ".join(item[0] for item in sentence) + " - " + syntactic_role)
                break
            if word in quantifiers:
                quantifier_idx = idx


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")

Output:

e:\Work\Dev\StackOverflow\q059500488>"c:\Install\pc064\Python\Python\03.08.01\python.exe" code00.py
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] 64bit on win32

mary loves every man - dobj
mary loves all men - dobj
all students love mary - nsubj

Done.
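
If you prefer to keep looking backward, as in the question, one possible fix is to clamp the start of the look-back window so that it can never become negative. This is only a minimal sketch, not part of code00.py: the function name find_quantified_roles and the constants QUANTIFIERS and ROLES are made up for illustration.

```python
QUANTIFIERS = {"all", "every"}
ROLES = {"nsubj", "dobj"}


def find_quantified_roles(sentence):
    """Return the roles (nsubj/dobj) of words at most two positions after a quantifier."""
    found = []
    for i, (word, role) in enumerate(sentence):
        if role not in ROLES:
            continue
        # max(0, i - 2) clamps the start of the slice, so it never wraps
        # around to the end of the list the way sentence[i - 2] does.
        previous_words = [w for w, _ in sentence[max(0, i - 2):i]]
        if any(w in QUANTIFIERS for w in previous_words):
            found.append(role)
    return found


sentences = [
    [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
    [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
    [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
]

for sentence in sentences:
    print(" ".join(word for word, _ in sentence), "-", find_quantified_roles(sentence))
```

With the three sample sentences this reports ['dobj'], ['dobj'] and ['nsubj'] respectively. Unlike code00.py it does not stop at the first match, so a sentence with two quantified words would yield two roles.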



Answer 2:


You can use negative indexes in lists; they count from the end of the list. The example below prints 'c'.

mylist = ['a', 'b', 'c']
print(mylist[-1])

So if we take your first sentence:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

With your code, the elif branch prints at the first word of the sentence, because:

  • mary is the nsubj (i is 0)
  • and sentence[i-2] is sentence[-2], i.e. ['every', 'det'], which contains 'every'

The if branch then also prints at the last word of the sentence, because:

  • man is the dobj (i is 3)
  • and sentence[i-1] is sentence[2], i.e. ['every', 'det'], which contains 'every'
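
The two cases above can be checked directly in an interpreter. The snippet below is just a small illustration that uses the first sentence from the question and evaluates the same look-back indexes by hand:

```python
sentence = [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]]

# i = 0: 'mary' is the nsubj, and both look-back indexes are negative,
# so they count from the end of the list instead of raising IndexError.
i = 0
print(i - 1, sentence[i - 1])      # -1 ['man', 'dobj']
print(i - 2, sentence[i - 2])      # -2 ['every', 'det']
print("every" in sentence[i - 2])  # True, hence the unwanted "nsubj" line

# i = 3: 'man' is the dobj, and the look-back indexes are ordinary ones.
i = 3
print(i - 1, sentence[i - 1])      # 2 ['every', 'det']
print("every" in sentence[i - 1])  # True, hence the expected "dobj" line
```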

I suggest that you look forward instead of backward, for instance with the following code:

quantifiers = ['every', 'all']
for sentence in a_list_of_sentences:
    max_index = len(sentence) - 1
    for word_index, word in enumerate(sentence):
        if word[0] in quantifiers:
            # Check the first word after the quantifier, if there is one.
            if max_index > word_index:
                # Compare with ==; "in" on a string would be a substring test.
                if sentence[word_index+1][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+1][1] == 'dobj':
                    print(sentence, "dobj")
            # Check the second word after the quantifier, if there is one.
            if max_index > word_index + 1:
                if sentence[word_index+2][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+2][1] == 'dobj':
                    print(sentence, "dobj")

Finally, a small remark about how you use the index.

In your code, instead of:

for i, j in enumerate(sentence):
    if "dobj" in sentence[i]:

You could do:

for i, j in enumerate(sentence):
    if "dobj" in j:
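
To illustrate why the two forms are equivalent, here is a short sketch showing that enumerate() yields (index, item) pairs in which the item is the same object as sentence[i]. It also unpacks each [word, dependency] pair in the loop header; the names word, dependency and dobj_positions are only chosen for this example.

```python
sentence = [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]]

for i, j in enumerate(sentence):
    # j is the very same object as sentence[i], so indexing again is redundant.
    assert j is sentence[i]

# Each item is a two-element list, so it can be unpacked in the loop header.
for i, (word, dependency) in enumerate(sentence):
    if dependency == "dobj":
        print(i, word)  # 3 man

# The same idea as a list comprehension: positions of all direct objects.
dobj_positions = [i for i, (_, dependency) in enumerate(sentence) if dependency == "dobj"]
print(dobj_positions)  # [3]
```

Comparing the dependency with == also avoids accidentally matching the word itself, which "dobj" in j would do if a word happened to be spelled 'dobj'.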


Source: https://stackoverflow.com/questions/59500488/problem-with-indexes-in-enumerate-python
