Question
Problem Statement: Filter those words from the complete set of text6, having first letter in upper case and all other letters in lower case. Store the result in variable title_words. print the number of words present in title_words.
I have tried every approach I can think of, but I can't tell where I am going wrong.
import nltk
from nltk.book import text6
title_words = 0
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)
I have also tried it this way:
title_words = 0
for item in text6:
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)
I am not sure what count is required; whatever count my code produces does not let me pass the challenge. Please let me know if I am doing anything wrong in this code.
Answer 1:
I think the problem is with set(text6). I suggest you iterate over text6.tokens instead.
Update, explanation
The code you've provided is correct.
The issue is that the text can contain the same word multiple times. Calling set(words) reduces the total number of available words, so you start with an incomplete data set.
The other responses are not necessarily wrong in how they check the validity of a word, but they iterate over the same wrong data set.
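To make the difference concrete, here is a minimal sketch that uses a small made-up token list in place of text6. The `sample_tokens` values and the `is_title_case` helper are illustrative assumptions, not part of NLTK. It only shows how the count changes depending on whether duplicates are collapsed with set() first.

```python
# Hypothetical stand-in for text6; the real corpus is loaded via nltk.book.
sample_tokens = ["Arthur", "rode", "to", "Camelot", "and",
                 "Arthur", "saw", "KING", "Camelot"]


def is_title_case(word):
    """True when the first letter is upper case and the rest are lower case."""
    return word[0].isupper() and word[1:].islower()


# Every occurrence is checked, so repeated words are counted repeatedly.
all_matches = [w for w in sample_tokens if is_title_case(w)]

# Duplicates are removed first, so each distinct word is counted once.
unique_matches = [w for w in set(sample_tokens) if is_title_case(w)]

print(len(all_matches))     # 4
print(len(unique_matches))  # 2
```

Which of the two numbers the exercise expects depends on how "the complete set of text6" is read, so it may be worth trying both.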
Answer 2:
One of the suggestions above worked for me. Sample code below.
title_words = [word for word in text6 if (len(word)==1 and word[0].isupper()) or (word[0].isupper() and word[1:].islower()) ]
print(len(title_words))
Answer 3:
In the question, "Store the result in variable title_words. print the number of words present in title_words."
The result of filtering a list of elements is a list of the same type of elements. In your case, filtering the list text6 (assuming it's a list of strings) would result in a (smaller) list of strings. Your title_words variable should be this filtered list, not the number of strings; the number of strings would just be the length of the list.
It's also ambiguous from the question whether capitalized words should be filtered out (i.e. removed from the smaller list) or filtered in (i.e. kept in the list), so try both to see whether you're interpreting it incorrectly.
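As a rough sketch of both readings, the snippet below filters a small made-up token list. `sample_tokens`, `is_title_case`, and `other_words` are hypothetical names used only for illustration; with the real data you would iterate over text6 instead.

```python
# Hypothetical stand-in for text6.
sample_tokens = ["Sir", "Robin", "ran", "AWAY", "bravely", "Sir"]


def is_title_case(word):
    """True when the first letter is upper case and the rest are lower case."""
    return word[0].isupper() and word[1:].islower()


# Reading 1: keep the capitalized words.
title_words = [w for w in sample_tokens if is_title_case(w)]

# Reading 2: drop the capitalized words and keep everything else.
other_words = [w for w in sample_tokens if not is_title_case(w)]

print(title_words)       # the filtered list itself
print(len(title_words))  # the number the exercise asks you to print
print(len(other_words))
```

In both cases title_words stays a list, and the count comes from len() rather than from a running integer.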
Answer 4:
Give regular expressions a try:
>>> import re
>>> from nltk.book import text6
>>>
>>> text = ' '.join(set(text6))
>>> title_words = re.findall(r'([A-Z]{1}[a-z]+)', text)
>>> len(title_words)
461
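One thing to keep in mind with this approach: re.findall on the joined text can return pieces of a longer token, whereas re.fullmatch applied per token requires the whole word to fit the pattern. The sketch below illustrates this on a made-up token list (`sample_tokens` is an assumption, not data from text6). Note that `[a-z]+` also excludes single-letter words and only covers ASCII letters.

```python
import re

# Hypothetical stand-in for text6; the real corpus comes from nltk.book.
sample_tokens = ["Arthur", "KING", "McDonald", "I", "grail", "Arthur"]
unique_tokens = sorted(set(sample_tokens))

pattern = re.compile(r"[A-Z][a-z]+")

# findall scans the joined string, so it can match inside a longer token.
fragments = pattern.findall(" ".join(unique_tokens))

# fullmatch only succeeds when the pattern covers the entire token.
title_words = [w for w in unique_tokens if pattern.fullmatch(w)]

print(fragments)    # ['Arthur', 'Mc', 'Donald']
print(title_words)  # ['Arthur']
```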
Answer 5:
There are 50 singleton elements (elements of length one) in text6; however, your code would not count any of them as a match, for example 'I' or 'W'. Is that intended, or do you require words of minimum length 2?
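The reason is that slicing a one-character string with [1:] gives an empty string, and str.islower() returns False when there are no cased characters. Here is a small sketch of that behavior, plus a hypothetical helper (`is_title_case` and its `allow_single` flag are names made up for this example) that lets you choose either interpretation.

```python
word = "I"
print(repr(word[1:]))                            # ''
print(word[1:].islower())                        # False
print(word[0].isupper() and word[1:].islower())  # False


def is_title_case(word, allow_single=True):
    """Check for an upper-case first letter followed by lower-case letters.

    allow_single controls whether a lone upper-case letter such as 'I' counts.
    """
    if not word:
        return False
    if len(word) == 1:
        return allow_single and word.isupper()
    return word[0].isupper() and word[1:].islower()


print(is_title_case("I"))                      # True
print(is_title_case("I", allow_single=False))  # False
print(is_title_case("Arthur"))                 # True
```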
Answer 6:
Just a few changes, following what the question asks.
from nltk.book import text6
title_words = []
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words.append(item)
print(len(title_words))
Source: https://stackoverflow.com/questions/55438634/how-to-find-a-word-first-letter-will-be-capital-other-will-be-lower