Question
I want to create a program that reads text from a file and points out where "a" and "an" are used incorrectly. The general rule, as far as I know, is that "an" is used when the next word starts with a vowel. The program should also take into account exceptions, which should likewise be read from a file.
Could someone give me some tips and tricks on how to get started with this, such as functions that could help?
I would be very glad :-)
I'm quite new to Python.
Answer 1:
Here's a solution where correctness is defined as follows: "an" comes before a word that starts with a vowel sound; otherwise "a" is used:
#!/usr/bin/env python
import itertools
import re
import sys
try:
    from future_builtins import map, zip
except ImportError: # Python 3 (or old Python versions)
    map, zip = map, zip
from operator import methodcaller
import nltk  # $ pip install nltk
from nltk.corpus import cmudict  # >>> nltk.download('cmudict')
def starts_with_vowel_sound(word, pronunciations=cmudict.dict()):
    for syllables in pronunciations.get(word, []):
        return syllables[0][-1].isdigit()  # use only the first one
def check_a_an_usage(words):
    # iterate over words pairwise (recipe from itertools)
    #note: ignore Unicode case-folding (`.casefold()`)
    a, b = itertools.tee(map(methodcaller('lower'), words)) 
    next(b, None)
    for a, w in zip(a, b):
        if (a == 'a' or a == 'an') and re.match(r'\w+$', w):  # raw string avoids an invalid-escape warning
            valid = (a == 'an') if starts_with_vowel_sound(w) else (a == 'a')
            yield valid, a, w
#note: you could use nltk to split text in paragraphs,sentences, words
pairs = ((a, w)
         for sentence in sys.stdin.readlines() if sentence.strip() 
         for valid, a, w in check_a_an_usage(nltk.wordpunct_tokenize(sentence))
         if not valid)
print("Invalid indefinite article usage:")
print('\n'.join(map(" ".join, pairs)))
Example input (one sentence per line)
Validity is defined as `an` comes before a word that starts with a vowel sound, otherwise `a` may be used.
Like "a house", but "an hour" or "a European" (from @Hyperboreus's comment http://stackoverflow.com/questions/20336524/gramatically-correct-an-english-text-python#comment30353583_20336524 ).
A AcRe, an AcRe, a rhYthM, an rhYthM, a yEarlY, an yEarlY (words from @tchrist's comment http://stackoverflow.com/questions/9505714/python-how-to-prepend-the-string-ub-to-every-pronounced-vowel-in-a-string#comment12037821_9505868 )
We have found a (obviously not optimal) solution." vs. "We have found an obvious solution (from @Hyperboreus answer)
Wait, I will give you an... -- he shouted, but dropped dead before he could utter the last word. (ditto)
Output
Invalid indefinite article usage:
a acre
an rhythm
an yearly
It is not obvious why the last pair is invalid; see the question titled "Why is it “an yearly”?".
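To see why the stress-digit test in starts_with_vowel_sound works: CMUdict writes pronunciations in ARPAbet, where vowel phonemes carry a trailing stress digit (0, 1 or 2) and consonant phonemes do not. Here is a minimal sketch that runs without nltk, assuming a few hand-copied entries. SAMPLE_PRONUNCIATIONS is a hypothetical stand-in for cmudict.dict(), expected_article is a made-up helper, and the entries are reproduced for illustration only, so check them against the real dictionary:

```python
# Hypothetical stand-in for cmudict.dict(): word -> list of pronunciations,
# each pronunciation being a list of ARPAbet phonemes.
# Entries are hand-copied for illustration only.
SAMPLE_PRONUNCIATIONS = {
    "hour": [["AW1", "ER0"]],
    "house": [["HH", "AW1", "S"]],
    "european": [["Y", "UH2", "R", "AH0", "P", "IY1", "AH0", "N"]],
    "yearly": [["Y", "IH1", "R", "L", "IY0"]],
    "acre": [["EY1", "K", "ER0"]],
}


def starts_with_vowel_sound(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return True/False from the first listed pronunciation, None if unknown."""
    for phonemes in pronunciations.get(word.lower(), []):
        # Vowel phonemes end in a stress digit; consonants do not.
        return phonemes[0][-1].isdigit()
    return None  # word is not in the dictionary


def expected_article(word):
    """Return 'an', 'a', or None when the word is not in the dictionary."""
    vowel = starts_with_vowel_sound(word)
    if vowel is None:
        return None
    return "an" if vowel else "a"


for w in ["hour", "house", "European", "yearly", "acre", "xyzzy"]:
    print(w, "->", expected_article(w))
```

Unlike the script above, which falls back to "a" for words missing from the dictionary, this sketch returns None for them so that unknown words can be reported separately.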
Answer 2:
Maybe this can give you a rough guideline:
- You need to parse the input text into prosodic units, as I doubt that the rules for "a/an" apply across prosodic boundaries (e.g. "We have found a (obviously not optimal) solution." vs. "We have found an obvious solution").
- Next you need to parse each prosodic unit into phonological words.
- Now you somehow need to identify the words that represent the indefinite article ("a house" vs. "grade A product").
- Once you have identified the articles, look at the next word in your prosodic unit and determine (here be dragons) the syllabic feature of the first phoneme of this word.
- If it has [+syll], the article should be "an". If it has [-syll], the article should be "a". If the article is at the end of the prosodic unit, it should probably be "a" (but what about ellipses: "Wait, I will give you an... -- he shouted, but dropped dead before he could utter the last word."?). On top of that come historical exceptions as mentioned by abanert, dialectal variance, etc.
- If the article found doesn't match the expected one, mark it as "incorrect".
Here is some pseudocode:
def parseProsodicUnits(text): ...  # here be dragons
def parsePhonologicalWords(unit): ...  # here be dragons
def isUndefinedArticle(word): ...  # here be dragons
def parsePhonemes(word): ...  # here be dragons
def getFeatures(phoneme): ...  # here be dragons
for unit in parseProsodicUnits(text):
    words = parsePhonologicalWords(unit)
    for idx, word in enumerate(words[:-1]):
        if not isUndefinedArticle(word): continue
        # look at the first phoneme of the word that follows the article
        syllabic = '+syll' in getFeatures(parsePhonemes(words[idx+1])[0])
        if (word == 'a' and syllabic) or (word == 'an' and not syllabic):
            print('incorrect')
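The outline above can be approximated very roughly without any linguistic tooling. Here is a minimal sketch, assuming that punctuation marks are an acceptable stand-in for prosodic boundaries and that the first letter is an acceptable stand-in for the [+syll] feature. Both are simplifications, not what the answer actually proposes, and split_units, article_pairs and check are hypothetical names:

```python
import re

# Assumption: any run of these punctuation characters ends a "unit".
# Real prosodic parsing is much harder than this.
BOUNDARY = re.compile(r"[.,;:!?()\[\]\"]+|--")
WORD = re.compile(r"[A-Za-z']+")


def split_units(text):
    """Split text into crude units at punctuation boundaries."""
    return [u for u in BOUNDARY.split(text) if u.strip()]


def article_pairs(unit):
    """Yield (article, next_word) pairs that occur inside one unit."""
    words = WORD.findall(unit)
    for article, nxt in zip(words, words[1:]):
        if article.lower() in ("a", "an"):
            yield article, nxt


def check(text):
    """Return the (article, word) pairs that look wrong under the letter rule."""
    wrong = []
    for unit in split_units(text):
        for article, nxt in article_pairs(unit):
            # Assumption: the first letter approximates the first phoneme.
            syllabic = nxt[0].lower() in "aeiou"
            if (article.lower() == "an") != syllabic:
                wrong.append((article, nxt))
    return wrong


sample = ("We have found a (obviously not optimal) solution. "
          "We have found an obvious solution. "
          "Wait, I will give you an... -- he shouted.")
print(check(sample))
print(check("this is an wonderful life"))
```

Because pairs are only formed inside a unit, an article directly before a parenthesis or an ellipsis is never compared with the word on the other side. The letter rule still misjudges cases such as "an hour", and the sketch makes no attempt to tell the article apart from the letter in "grade A product".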
Answer 3:
all_words = "this is an wonderful life".split()
# stop one short of the end so that all_words[i+1] always exists
for i in range(len(all_words) - 1):
    if all_words[i].lower() in ["a", "an"]:
        if all_words[i+1][0].lower() in "aeiou":
            all_words[i] = all_words[i][0] + "n"
        else:
            all_words[i] = all_words[i][0]
print(" ".join(all_words))
That should get you started; however, it is not a complete solution.
Answer 4:
I'd probably start with an approach like:
exceptions = set()  # placeholder: a whole bunch of exceptions goes here
article = None
for word in text.split():
    if article:
        vowel = word[0].lower() in "aeiou"
        if word.lower() in exceptions:
            vowel = not vowel
        if (article.lower() == "an" and not vowel) or (article.lower() == "a" and vowel):
            print("Misused article '%s %s'" % (article, word))
        article = None
    if word.lower() in ('a', 'an'):
        article = word
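The question also asks for the exceptions to be read from a file. Here is a minimal sketch of how the exceptions set above could be filled, assuming a plain-text file with one exception word per line. The file name exceptions.txt, its format, and the helper names load_exceptions and find_misused are assumptions for illustration:

```python
import os
import tempfile


def load_exceptions(path):
    """Read one exception word per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def find_misused(text, exceptions):
    """Return (article, word) pairs that break the letter rule,
    after flipping the rule for words listed as exceptions."""
    misused = []
    article = None
    for word in text.split():
        if article:
            vowel = word[0].lower() in "aeiou"
            if word.lower() in exceptions:
                vowel = not vowel  # the exception flips the letter-based guess
            if (article.lower() == "an") != vowel:
                misused.append((article, word))
            article = None
        if word.lower() in ("a", "an"):
            article = word
    return misused


# Demo with a temporary file standing in for exceptions.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "exceptions.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hour\nEuropean\nuniversity\n")
    exceptions = load_exceptions(path)

print(find_misused("an hour in a European city is an wonderful thing", exceptions))
```

Since the text is split on whitespace only, a word with attached punctuation (for example "hour,") will not match its entry in the exceptions file; stripping punctuation first would be a natural next step.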
Source: https://stackoverflow.com/questions/20336524/verify-correct-use-of-a-and-an-in-english-texts-python