Lowercase first element of tuple in list of tuples

Submitted by 你。 on 2019-12-13 14:19:06

Question


I have a list of documents, labeled with their appropriate categories:

documents = [(list(corpus.words(fileid)), category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]

which gives me the following list of tuples, where the first element of the tuple is a list of words (tokens of a sentence). For instance:

[([u'A', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', 
u'quality', u'of', u'life', u'intervention', u'for', u'men', u'with', 
u'biochemical', u'recurrence', u'of', u'prostate', u'cancer', u'.'], 
'cancer'), 
([u'A', u'Systematic', u'Review', u'of', u'the', u'Effectiveness', 
u'of', u'Medical', u'Cannabis', u'for', u'Psychiatric', u',', 
u'Movement', u'and', u'Neurodegenerative', u'Disorders', u'.'], 'hd')]

I want to apply some text-processing techniques, but I wish to maintain the list of tuples format.

I know that if I had only a list of words, this would do:

[w.lower() for w in words]

But in this case, I want to apply .lower() to the first element (list of strings) of every tuple in the tuples list, and after trying various options like:

[[x.lower() for x in element] for element in documents],
[(x.lower(), y) for x,y in documents], or
[x[0].lower() for x in documents]

I always get this error:

AttributeError: 'list' object has no attribute 'lower'

I have also tried applying what I need before creating the list, but .categories() and .fileids() are methods of corpus that return lists as well, so I get the same error.

Any help would be deeply appreciated.

SOLVED:

Both @Adam Smith's and @vasia's answers were right:

[([s.lower() for s in item[0]], item[1]) for item in documents]

@Adam's answer above maintains the tuple structure; @vasia does the trick right from the creation of the list of tuples:

documents = [([word.lower() for word in corpus.words(fileid)], category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]

Thank you all :)


Answer 1:


Your data structure is [([str], str)]: a list of tuples where each tuple is (list of strings, string). It's important to understand what that means before you try to pull data out of it.

That means that for item in documents iterates over the list of tuples, where item is each tuple in turn.

That means that item[0] is the list in each tuple.

That means that for item in documents: for s in item[0]: will iterate through each string inside that list. Let's try that!

[s.lower() for item in documents for s in item[0]]

This should give, from your example data:

[u'a', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', ...]

If you're trying to keep the tuple format, you could do:

[([s.lower() for s in item[0]], item[1]) for item in documents]

# or perhaps more readably
[([s.lower() for s in lst], val) for lst, val in documents]

Both these statements give:

[([u'a', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', ...], 'cancer'), ... ]
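
The flat and tuple-preserving comprehensions can be compared side by side in a minimal sketch. The shortened sample data here is an assumption made for illustration, not the asker's actual corpus.

```python
# Sample data shaped like the question's: [([str], str)]
documents = [
    (["A", "pilot", "investigation", "."], "cancer"),
    (["A", "Systematic", "Review", "."], "hd"),
]

# Flat: one list of every lowercased token; the categories are lost.
flat = [s.lower() for item in documents for s in item[0]]

# Nested: rebuild each tuple around a lowercased copy of its word list.
lowered = [([s.lower() for s in words], category) for words, category in documents]

print(flat)
print(lowered)
```

Both comprehensions build new lists, so the original documents list keeps its capitalization.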



Answer 2:


You are close. You are looking for a construction like this:

[([s.lower() for s in ls], cat) for ls, cat in documents]

This essentially combines two of your attempts:

[[x.lower() for x in element] for element in documents],
[(x.lower(), y) for x,y in documents]
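
As a rough illustration of why each attempt fails on its own and why the combination works, here is a sketch using made-up sample data (the variable names error_1, error_2 and combined are only for this example):

```python
documents = [
    (["A", "pilot", "investigation"], "cancer"),
    (["A", "Systematic", "Review"], "hd"),
]

# Attempt 1 loops over each tuple, so x is first the word list, then the category.
try:
    [[x.lower() for x in element] for element in documents]
except AttributeError as exc:
    error_1 = str(exc)

# Attempt 2 unpacks the tuple correctly, but x is still the whole word list.
try:
    [(x.lower(), y) for x, y in documents]
except AttributeError as exc:
    error_2 = str(exc)

# Combined: unpack the tuple (as in attempt 2), then loop over the words (as in attempt 1).
combined = [([s.lower() for s in ls], cat) for ls, cat in documents]

print(error_1)
print(error_2)
print(combined)
```

In both failing attempts .lower() ends up being called on a list, which is what the AttributeError reports.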



Answer 3:


Try this:

documents = [([word.lower() for word in corpus.words(fileid)], category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]
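
To try this without the real corpus, one option is a small stand-in object. FakeCorpus below is entirely hypothetical: it only mimics the three methods used in the question (categories, fileids, words) and is not the API of any particular library.

```python
class FakeCorpus:
    """Hypothetical stand-in exposing only the three methods used above."""

    _files = {
        "cancer": {"c1.txt": ["A", "pilot", "investigation", "."]},
        "hd": {"h1.txt": ["A", "Systematic", "Review", "."]},
    }

    def categories(self):
        return list(self._files)

    def fileids(self, category):
        return list(self._files[category])

    def words(self, fileid):
        # Look the file up in whichever category contains it.
        for files in self._files.values():
            if fileid in files:
                return files[fileid]
        raise KeyError(fileid)


corpus = FakeCorpus()

# Lowercase each word while the (words, category) tuples are being built.
documents = [([word.lower() for word in corpus.words(fileid)], category)
             for category in corpus.categories()
             for fileid in corpus.fileids(category)]

print(documents)
```

Lowercasing at construction time avoids a second pass over the data and never stores the mixed-case tokens.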



Answer 4:


Tuples are immutable. However, since the first element of each tuple is a list, and lists are mutable, you can modify the list's contents without changing which list the tuple holds:

documents = [(...what you originally posted...) ... etc. ...]

for d in documents:
    # to lowercase all strings in the list
    # the '[:]' is important: slice assignment modifies the list in place,
    # whereas 'd[0] = ...' would raise TypeError because the tuple is immutable
    d[0][:] = [w.lower() for w in d[0]]

    # or to just lower-case the first element of the list (which is what you asked for)
    d[0][0] = d[0][0].lower()

You can't just call lower() on a string and have it get updated - lower() returns a new string. So to modify the string to be the lowercased version, you have to assign over it. This would not be possible if the string were itself a tuple member, but since the string you are modifying is in a list in the tuple, you can modify the list contents without modifying the tuple's ownership of the list.
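
A small sketch of the difference between mutating the list and rebinding the tuple slot, again with made-up sample data; the names first_list, same_object and rebinding_failed are only for this example:

```python
documents = [
    (["A", "pilot", "investigation"], "cancer"),
    (["A", "Systematic", "Review"], "hd"),
]

first_list = documents[0][0]  # keep a reference to check identity later

for d in documents:
    # Slice assignment replaces the list's contents, not the list object.
    d[0][:] = [w.lower() for w in d[0]]

same_object = documents[0][0] is first_list  # the tuple still holds the same list

# Rebinding a slot of the tuple is what fails.
try:
    documents[0][0] = ["replaced"]
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

print(documents, same_object, rebinding_failed)
```

Unlike the comprehension-based answers, this changes the existing lists, so any other variable referring to them sees the lowercased words too.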



Source: https://stackoverflow.com/questions/47042431/lowercase-first-element-of-tuple-in-list-of-tuples
