Soundex algorithm in Python (homework help request)

前端 未结 3 484
不知归路
不知归路 2020-12-11 14:12

The US census bureau uses a special encoding called “soundex” to locate information about a person. The soundex is an encoding of surnames (last names) based on the way a su

3条回答
  •  夕颜
    夕颜 (楼主)
    2020-12-11 14:37

    This is hardly perfect (for instance, it produces the wrong result if the input doesn't start with a letter), and it doesn't implement the rules as independently-testable functions, so it's not really going to serve as an answer to the homework question. But this is how I'd implement it:

    >>> def soundex_prepare(s):
            """Prepare string for Soundex encoding.
    
            Remove non-alpha characters (and the not-of-interest W/H/Y), 
            convert to upper case, and remove all runs of repeated letters."""
            p = re.compile("[^a-gi-vxz]", re.IGNORECASE)
            s = re.sub(p, "", s).upper()
            for c in set(s):
                s = re.sub(c + "{2,}", c, s)
            return s
    
    >>> def soundex_encode(s):
            """Encode a name string using the Soundex algorithm."""
            result = s[0].upper()
            s = soundex_prepare(s[1:])
            letters = 'ABCDEFGIJKLMNOPQRSTUVXZ'
            codes   = '.123.12.22455.12623.122'
            d = dict(zip(letters, codes))
            prev_code=""
            for c in s:
                code = d[c]
                if code != "." and code != prev_code:
                    result += code
             if len(result) >= 4: break
                prev_code = code
            return (result + "0000")[:4]
    

提交回复
热议问题