Python: How to prepend the string 'ub' to every pronounced vowel in a string?

Posted by 北战南征 on 2019-12-17 16:37:13

Question


Example: Speak -> Spubeak, more info here

Don't give me a solution, but point me in the right direction or tell me which Python library I could use. I am thinking of regex since I have to find a vowel, but then which method could I use to insert 'ub' in front of a vowel?


Answer 1:


It is more complex than a simple regex, e.g.,

"Hi, how are you?" → "Hubi, hubow ubare yubou?"

A simple regex won't catch that the e in "are" is not pronounced.

You need a library that provides a pronunciation dictionary such as nltk.corpus.cmudict:

from nltk.corpus import cmudict # $ pip install nltk
# $ python -c "import nltk; nltk.download('cmudict')"

def spubeak(word, pronunciations=cmudict.dict()):
    istitle = word.istitle() # remember, to preserve titlecase
    w = word.lower() #note: ignore Unicode case-folding
    for syllables in pronunciations.get(w, []):
        parts = []
        for syl in syllables:
            if syl[:1] == syl[1:2]:
                syl = syl[1:] # remove duplicate
            isvowel = syl[-1].isdigit()
            # pronounce the word
            parts.append('ub'+syl[:-1] if isvowel else syl)
        result = ''.join(map(str.lower, parts))
        return result.title() if istitle else result
    return word # word not found in the dictionary

Example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

sent = "Hi, how are you?"
# raw string avoids the invalid-escape warning for \W on newer Pythons
subent = " ".join(["".join(map(spubeak, re.split(r"(\W+)", nonblank)))
                   for nonblank in sent.split()])
print('"{}" → "{}"'.format(sent, subent))

Output

"Hi, how are you?" → "Hubay, hubaw ubar yubuw?"

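Note: This output differs from the first example: each word is replaced with its phonemes as spelled in the dictionary (e.g. "hi" becomes "hay"), so the original spelling is not preserved.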
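
To see why the dictionary lookup finds only pronounced vowels, here is a minimal sketch that does not need nltk. It assumes a tiny hand-written table in the same shape as cmudict.dict() (word -> list of phoneme lists); SAMPLE_PRONUNCIATIONS and pronounced_vowels are hypothetical names used only for illustration. Vowel phonemes are the ones that end in a stress digit, which is the same test the function above uses.

```python
# Hand-written stand-in for cmudict.dict(); an assumption for illustration,
# not the real corpus. Each word maps to a list of pronunciations.
SAMPLE_PRONUNCIATIONS = {
    "speak": [["S", "P", "IY1", "K"]],
    "are": [["AA1", "R"]],
    "hi": [["HH", "AY1"]],
}

def pronounced_vowels(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return the vowel phonemes of the first listed pronunciation."""
    entries = pronunciations.get(word.lower(), [])
    if not entries:
        return []  # word not in the table
    # Vowel phonemes carry a trailing stress digit (0, 1 or 2).
    return [ph for ph in entries[0] if ph[-1].isdigit()]

print(pronounced_vowels("are"))    # one vowel sound, although "are" has two vowel letters
print(pronounced_vowels("speak"))  # "ea" is a single vowel sound
```

With this table, "are" yields a single vowel phoneme, which is why only one 'ub' is inserted for it.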




Answer 2:


You can use regular expressions for substitutions. See re.sub.

Example:

>>> import re
>>> re.sub(r'(e)', r'ub\1', 'speak')
'spubeak'

You will need to read the documentation for regex groups and so on. You will also need to figure out how to match different vowels instead of just the one in the example.
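
As a rough sketch of that next step, a character class can match any run of vowel letters instead of a single 'e'. The name naive_ub is hypothetical, and the approach is purely spelling-based: it treats 'y' as a consonant and cannot tell silent vowels from pronounced ones.

```python
import re

# One or more consecutive vowel letters, captured as group 1.
VOWELS = re.compile(r'([aeiou]+)', re.IGNORECASE)

def naive_ub(text):
    """Prepend 'ub' to every run of vowel letters (spelling-based only)."""
    return VOWELS.sub(r'ub\1', text)

print(naive_ub('speak'))             # spubeak
print(naive_ub('Hi, how are you?'))  # Hubi, hubow ubarube yubou?
```

Note that "are" comes out as "ubarube", because the regex has no way of knowing that the final e is silent.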

For some great ideas (and code) for using regular expressions in Python for a pronunciation dictionary, take a look at this link, which is one of the design pages for the Cainteoir project: http://rhdunn.github.com/cainteoir/rules.html

Cainteoir's text-to-speech rule engine design (which is not fully implemented yet) uses regular expressions. See also Pronunciation Dictionaries and Regexes, another article by the Cainteoir author.




Answer 3:


Regular expressions are really the best route. If you are unsure of how to proceed, check how capturing groups work, and how you can include them in your substitutions.
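
One way to go beyond a plain backreference is to pass a function as the replacement, since re.sub accepts a callable that receives each match object. The sketch below is only an illustration under that assumption; add_ub and ubbify are hypothetical names, and the capitalisation rule is a design choice, not something the question specifies.

```python
import re

VOWEL_RUN = re.compile(r'[aeiou]+', re.IGNORECASE)

def add_ub(match):
    """Build the replacement for one run of vowel letters."""
    run = match.group(0)
    if run[0].isupper():
        # Move the capital onto the inserted 'ub' so "Apple" stays capitalised.
        return 'Ub' + run.lower()
    return 'ub' + run

def ubbify(text):
    return VOWEL_RUN.sub(add_ub, text)

print(ubbify('Apple'))  # Ubapplube
print(ubbify('speak'))  # spubeak
```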



Source: https://stackoverflow.com/questions/9505714/python-how-to-prepend-the-string-ub-to-every-pronounced-vowel-in-a-string
