Simple spell checking algorithm

星月不相逢 2021-02-01 10:21

I've been tasked with creating a simple spell checker for an assignment but have been given next to no guidance, so I was wondering if anyone could help me out. I'm not after someone

4 Answers
  •  被撕碎了的回忆
    2021-02-01 11:19

    You should have a look at Peter Norvig's explanation of how to write a spelling corrector:

    How to write a spelling corrector

    Everything is well explained in that article. As an example, the Python code for the spell checker looks like this:

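    import re, collections

    # Split text into lowercase alphabetic words.
    def words(text): return re.findall(r'[a-z]+', text.lower())

    # Count how often each word occurs; unseen words default to 1 (simple smoothing).
    def train(features):
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        return model

    # big.txt is the training corpus used in the article.
    with open('big.txt') as f:
        NWORDS = train(words(f.read()))

    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    # All strings that are one edit (delete, transpose, replace, insert) away from word.
    def edits1(word):
        splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes    = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
        inserts    = [a + c + b     for a, b in splits for c in alphabet]
        return set(deletes + transposes + replaces + inserts)

    # Known words that are two edits away from word.
    def known_edits2(word):
        return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

    # Keep only the candidates that appear in the training corpus.
    def known(words): return set(w for w in words if w in NWORDS)

    # Prefer the word itself, then known words at distance 1, then distance 2;
    # among the candidates, return the most frequent one.
    def correct(word):
        candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
        return max(candidates, key=NWORDS.get)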
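
    To try the idea without downloading big.txt, here is a minimal, self-contained sketch in Python 3. It assumes a tiny made-up corpus and uses collections.Counter in place of the defaultdict above; the names SAMPLE_TEXT and WORD_COUNTS are illustrative and are not from Norvig's article. With such a small corpus, the corrections are only as good as the handful of words it contains.

```python
import re
from collections import Counter

# Hypothetical stand-in for big.txt: a tiny corpus so the sketch runs anywhere.
SAMPLE_TEXT = (
    "the quick brown fox jumps over the lazy dog "
    "and the spelling of the word is checked"
)

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Word frequencies; a Counter returns 0 for words it has never seen.
WORD_COUNTS = Counter(words(SAMPLE_TEXT))


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    """Keep only candidates that occur in the corpus."""
    return {w for w in candidates if w in WORD_COUNTS}


def correct(word):
    """Return the most frequent known word within two edits, else word itself."""
    candidates = (
        known([word])
        or known(edits1(word))
        or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
        or [word]
    )
    return max(candidates, key=lambda w: WORD_COUNTS[w])


if __name__ == "__main__":
    for typo in ["speling", "teh", "spelling", "xyzzy"]:
        print(typo, "->", correct(typo))
```

    With this sample text, correct("speling") gives "spelling" and correct("teh") gives "the"; a word with no known neighbour within two edits is returned unchanged.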
    

    I hope you can find what you need on Peter Norvig's website.
