Python: How to prepend the string 'ub' to every pronounced vowel in a string?

-上瘾入骨i 2020-12-06 12:52

Example: Speak -> Spubeak, more info here

Don't give me a solution, but point me in the right direction or tell me which Python library I could use.

3 answers
  •  广开言路
    2020-12-06 13:48

    It is more complex than a simple regex can handle, e.g.,

    "Hi, how are you?" → "Hubi, hubow ubare yubou?"
    

    A simple regex won't catch that the final 'e' in 'are' is not pronounced.
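To make the limitation concrete, here is a minimal sketch of the spelling-based approach. The helper name `naive_ub` and the vowel pattern are assumptions made for illustration; they are not part of the answer's solution.

```python
import re

# Hypothetical naive approach: treat every run of vowel letters as one
# pronounced vowel, looking only at the spelling.
VOWEL_RUN = re.compile(r"[aeiou]+", re.IGNORECASE)

def naive_ub(text):
    """Prepend 'ub' to each run of vowel letters in text."""
    return VOWEL_RUN.sub(lambda m: "ub" + m.group(0), text)

print(naive_ub("Speak"))             # Spubeak
print(naive_ub("Hi, how are you?"))  # Hubi, hubow ubarube yubou?
```

The first word comes out as wanted, but 'are' becomes 'ubarube' because the silent 'e' is counted as a vowel. Spelling alone does not carry that information.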

    You need a library that provides a pronunciation dictionary such as nltk.corpus.cmudict:

    from nltk.corpus import cmudict # $ pip install nltk
    # $ python -c "import nltk; nltk.download('cmudict')"
    
    def spubeak(word, pronunciations=cmudict.dict()):
        istitle = word.istitle()  # remember, to preserve titlecase
        w = word.lower()  # note: ignores Unicode case-folding
        # only the first pronunciation listed for the word is used
        for phonemes in pronunciations.get(w, []):
            parts = []
            for phoneme in phonemes:
                if phoneme[:1] == phoneme[1:2]:
                    phoneme = phoneme[1:]  # collapse a doubled letter: 'HH' -> 'H', 'AA1' -> 'A1'
                # vowel phonemes end with a stress digit (0, 1 or 2)
                isvowel = phoneme[-1].isdigit()
                # prepend 'ub' to vowels and drop the stress digit
                parts.append('ub' + phoneme[:-1] if isvowel else phoneme)
            result = ''.join(map(str.lower, parts))
            return result.title() if istitle else result
        return word  # word not found in the dictionary
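The function above relies on the shape of the dictionary entries: each lowercase word maps to a list of pronunciations, each pronunciation is a list of ARPAbet phonemes, and vowel phonemes end in a stress digit. The sketch below shows that check on its own. `SAMPLE_PRONUNCIATIONS` is a small hand-written stand-in assumed to mirror that format, so the snippet runs without downloading the corpus, and `vowel_phonemes` is a hypothetical helper name.

```python
# Hand-written stand-in for cmudict.dict(); the entries are illustrative
# and assumed to follow the same word -> list of phoneme lists layout.
SAMPLE_PRONUNCIATIONS = {
    "speak": [["S", "P", "IY1", "K"]],
    "are": [["AA1", "R"]],
    "hi": [["HH", "AY1"]],
}

def vowel_phonemes(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return the vowel phonemes of the first listed pronunciation."""
    for phonemes in pronunciations.get(word.lower(), []):
        # a trailing stress digit marks a vowel phoneme
        return [p for p in phonemes if p[-1].isdigit()]
    return []  # word not found in the dictionary

print(vowel_phonemes("Speak"))  # ['IY1']
print(vowel_phonemes("are"))    # ['AA1']
```

In this stand-in, 'are' has a single vowel phoneme, which is why the dictionary-based approach does not put a second 'ub' before the silent 'e'.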
    

    Example:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import re
    
    sent = "Hi, how are you?"
    subent = " ".join(["".join(map(spubeak, re.split(r"(\W+)", nonblank)))
                       for nonblank in sent.split()])
    print('"{}" → "{}"'.format(sent, subent))
    

    Output

    "Hi, how are you?" → "Hubay, hubaw ubar yubuw?"

    Note: this differs from the first example: each word is replaced with its phonemes from the pronouncing dictionary (with 'ub' before each vowel sound), not with its original spelling.
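The tokenizing step in the example works because `re.split` keeps the separators when the pattern contains a capturing group, so punctuation survives the round trip. The following sketch isolates that step. `transform_words` is a hypothetical helper, and `str.upper` stands in for the word transformation so the snippet needs no dictionary.

```python
import re

def transform_words(sentence, transform):
    """Apply transform to each word, keeping punctuation in place."""
    out = []
    for chunk in sentence.split():
        # with a capturing group, the separators are returned too:
        # "Hi," -> ['Hi', ',', '']
        pieces = re.split(r"(\W+)", chunk)
        # even positions hold words (possibly empty), odd ones separators
        out.append("".join(transform(p) if i % 2 == 0 else p
                           for i, p in enumerate(pieces)))
    return " ".join(out)

print(re.split(r"(\W+)", "Hi,"))                        # ['Hi', ',', '']
print(transform_words("Hi, how are you?", str.upper))   # HI, HOW ARE YOU?
```

Like the original example, this joins the chunks with single spaces, so runs of whitespace in the input are collapsed.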
