Mapping a function over all the letters of a token in Python

Question

The purpose of this program is to read in an array of tokens, remove the punctuation, convert all the letters to lowercase, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program, I receive this error:

the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()    
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found

My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.

This is my program:

import sys
from scanner import *
arr=[]
def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print("   argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result


main()

Answer 1:

Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.

You can do this with explicit loop statements, like this:

words = []
for token in cleanTokens:
    word = ''
    for char in token:
        word += decapitalizeTokens(char)
    words.append(word)

… or by using comprehensions:

words = [''.join(decapitalizeTokens(char) for char in token) 
         for token in cleanTokens]

However, I think it would make far more sense to move the loops into the decapitalizeTokens function—both based on its plural name, and on the fact that you have exactly the same loops in the similarly-named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:

words = decapitalizeTokens(cleanTokens)
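
A minimal sketch of that restructuring, assuming the tokens contain only ASCII text. The helper name decapitalizeChar is hypothetical (it is not in the original program), and it also checks the lower bound 'A', which the original test omits:

```python
def decapitalizeChar(ch):
    # Hypothetical helper: lowercase one ASCII letter, leave anything else alone.
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch


def decapitalizeTokens(tokens):
    # Same shape as depunctuateTokens: outer loop over tokens,
    # inner loop over the characters of each token.
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            lowered += decapitalizeChar(ch)
        result.append(lowered)
    return result


# Made-up sample data standing in for the output of depunctuateTokens.
words = decapitalizeTokens(['Hello', 'HELLO1', 'heLLo2'])
print(words)
```

With this shape, the call in main() can stay exactly as it is.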

As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:

words = [token.lower() for token in cleanTokens]

… which would also fix a nasty bug in your attempt. Consider what decapitalizeTokens would do with a digit or a space: both have character codes below ord('Z'), so they get shifted as if they were uppercase letters.
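
To make that bug concrete, here is a small sketch that reuses the original comparison on single characters. The name decapitalize_char is hypothetical, and the print call from the original else branch is left out:

```python
def decapitalize_char(ch):
    # Same test as the original: only an upper bound, no lower bound.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch


shifted_digit = decapitalize_char('1')  # ord('1') is 49; 49 + 32 = 81
shifted_space = decapitalize_char(' ')  # ord(' ') is 32; 32 + 32 = 64
print(repr(shifted_digit), repr(shifted_space))  # prints 'Q' '@'

# str.lower leaves digits and spaces untouched.
print('Hello 1'.lower())  # prints hello 1
```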

And, likewise, depunctuateTokens can be similarly replaced by a call to the translate method. For example (slightly different for Python 2.x, but you can read the docs and figure it out):

punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]
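
For a version that runs on its own in Python 3, here is a sketch using str.maketrans and string.punctuation from the standard library, which should cover the same characters as the literal above. The sample tokens are made up for illustration:

```python
import string

# With three arguments, str.maketrans maps every character in the third
# argument to None, so translate deletes it.
punctmap = str.maketrans('', '', string.punctuation)

sample_tokens = ['Hello,', '"hello1"', 'hello2!']  # made-up sample data
cleanTokens = [token.translate(punctmap) for token in sample_tokens]
print(cleanTokens)
```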

Answer 2:

cleanTokens = depunctuateTokens(...) #returns an array into cleantokens.
words = decapitalizeTokens(cleanTokens) #takes an array and returns... whatever.

The fact is that in

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result

result is an array (cleanTokens), and ord(result) fails because it expects a single-character string, not an array.

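Perhaps words = map(decapitalizeTokens, cleanTokens) can help, provided decapitalizeTokens is rewritten to handle one whole token (a string) at a time, since map passes it each element of cleanTokens in turn.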
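
A sketch of the map approach, assuming Python 3, where map returns a lazy iterator and needs list() to produce a list. Because map hands each whole token to the function, the built-in str.lower is used here as a stand-in for a token-level decapitalizer. The sample data is made up:

```python
cleanTokens = ['Hello', 'HELLO1', 'heLLo2']  # made-up sample data

# map calls str.lower once per token; list() materializes the iterator.
words = list(map(str.lower, cleanTokens))
print(words)
```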

Answer 3:

import scanner
import string
import sys

def read_tokens(fname):
    res = []
    with scanner.Scanner(fname) as sc:
        tok = sc.readtoken()
        while tok:
            res.append(tok)
            tok = sc.readtoken()
    return res

def depunctuate(s):
    # Python 3 form; s.translate(None, string.punctuation) only works in Python 2.
    return s.translate(str.maketrans('', '', string.punctuation))

def decapitalize(s):
    return s.lower()

def main():
    print("The name of the program is {}.".format(sys.argv[0]))
    for i, arg in enumerate(sys.argv[1:], 1):
        print("  Argument {} is {}".format(i, arg))

    tokens = read_tokens("text.txt")
    clean_tokens = [depunctuate(decapitalize(tok)) for tok in tokens]

if __name__=="__main__":
    main()

Source: https://stackoverflow.com/questions/21842427/mapping-a-function-over-all-the-letters-of-a-token-in-python

Tags

python

arrays

token