Question
The purpose of this program is to read in an array of tokens, remove the punctuation, turn all the letters to lowercase, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program, I receive this error:
the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found
My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.
This is my program:
import sys
from scanner import *

arr=[]

def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print(" argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result

main()
Answer 1:
Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.
You can do this with explicit loop statements, like this:
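words = []
for token in cleanTokens:
    word = ''
    for char in token:
        # decapitalizeTokens handles one character at a time
        word += decapitalizeTokens(char)
    # append the finished word; += would add its characters one by one
    words.append(word)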
… or by using comprehensions:
words = [''.join(decapitalizeTokens(char) for char in token)
for token in cleanTokens]
However, I think it would make far more sense to move the loops into the decapitalizeTokens function—both based on its plural name, and on the fact that you have exactly the same loops in the similarly-named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:
words = decapitalizeTokens(cleanTokens)
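Moving the loops inside the function could look like the sketch below. It assumes ASCII tokens and mirrors the structure of depunctuateTokens. The explicit 'A' to 'Z' range check is an addition that is not in the question's code, so that digits and other characters pass through unchanged.

```python
def decapitalizeTokens(tokens):
    """Return a new list with every ASCII uppercase letter lowercased."""
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            # Only shift characters that really are uppercase letters.
            if 'A' <= ch <= 'Z':
                lowered += chr(ord(ch) + ord('a') - ord('A'))
            else:
                lowered += ch
        result.append(lowered)
    return result


print(decapitalizeTokens(['Hello', 'HELLO1', 'hello2']))
# prints ['hello', 'hello1', 'hello2']
```

With this shape, the function takes the list returned by depunctuateTokens as its only parameter and returns a new list of the same length.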
As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:
words = [token.lower() for token in cleanTokens]
… which would also fix a nasty bug in your attempt. Consider what, say, decapitalizeTokens would do with a digit or a space.
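To make that bug concrete, here is the question's per-character test applied to a digit and a space. The helper name original_decapitalize is hypothetical; it only copies the condition from the question.

```python
def original_decapitalize(ch):
    # Same test as in the question: anything at or below 'Z' gets shifted,
    # which includes digits, spaces, and most punctuation.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch


print(repr(original_decapitalize('A')))  # 'a'
print(repr(original_decapitalize('1')))  # 'Q'
print(repr(original_decapitalize(' ')))  # '@'
print('Hello 1'.lower())                 # hello 1
```

The lower method leaves non-letters untouched, which is why it avoids the problem.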
Likewise, depunctuateTokens can be replaced by a call to the translate method. For example, in Python 3 (Python 2.x is slightly different, but you can read the docs and figure it out):
punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]
Answer 2:
cleanTokens = depunctuateTokens(...) #returns an array into cleantokens.
words = decapitalizeTokens(cleanTokens) #takes an array and returns... whatever.
The fact is that in
def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result
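result is a list (cleanTokens), and ord(result) fails because ord() expects a single-character string, not a list.
Doing words = map(decapitalizeTokens, cleanTokens) gets you closer, because map calls the function once per token instead of once on the whole list. Each token is still a multi-character string, though, so decapitalizeTokens would also have to loop over the characters of the token it receives.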
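A sketch of mapping at both levels, over the tokens and over the characters within each token, is below. The helpers lower_char and lower_token are hypothetical names, not part of the original program, and the sample list stands in for the output of depunctuateTokens.

```python
def lower_char(ch):
    # Shift only ASCII uppercase letters; leave everything else alone.
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch


def lower_token(token):
    # map() applies lower_char to each character; join rebuilds the string.
    return ''.join(map(lower_char, token))


clean_tokens = ['Hello', 'HELLO1', 'hello2']
# In Python 3, map() returns an iterator, so wrap it in list() to get a list.
words = list(map(lower_token, clean_tokens))
print(words)  # ['hello', 'hello1', 'hello2']
```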
Answer 3:
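import scanner
import string
import sys

def read_tokens(fname):
    res = []
    # Assumes this Scanner class can be used in a with statement.
    with scanner.Scanner(fname) as sc:
        tok = sc.readtoken()
        while tok:
            res.append(tok)
            tok = sc.readtoken()
    return res

def depunctuate(s):
    # Python 3: build a table that deletes every punctuation character.
    # (In Python 2 this was s.translate(None, string.punctuation).)
    return s.translate(str.maketrans('', '', string.punctuation))

def decapitalize(s):
    return s.lower()

def main():
    print("The name of the program is {}.".format(sys.argv[0]))
    # enumerate yields (index, value) pairs, so unpack both.
    for i, arg in enumerate(sys.argv[1:], 1):
        print(" Argument {} is {}".format(i, arg))
    tokens = read_tokens("text.txt")
    clean_tokens = [depunctuate(decapitalize(tok)) for tok in tokens]
    print(clean_tokens)

if __name__=="__main__":
    main()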
Source: https://stackoverflow.com/questions/21842427/mapping-a-function-over-all-the-letters-of-a-token-in-python