Expanding English language contractions in Python

野性不改 2020-12-04 11:19

The English language has a couple of contractions. For instance:

you've -> you have
he's -> he is

These can sometimes cause headach

9 answers
  • 2020-12-04 12:02

    I would like to add a little to alko's answer here. Wikipedia lists fewer than 100 English contractions. Granted, in real-world text the number could be higher, but I am fairly sure that 200-300 entries would cover all of them. Do you really want a separate library for that? (I don't think what you are looking for actually exists, though.) You can easily solve this problem with a dictionary and a regex. I would recommend using a good tokenizer, such as the Natural Language Toolkit (NLTK); the rest you should have no problem implementing yourself.
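
    The dictionary-plus-regex idea described above could be sketched roughly as follows. This is a minimal sketch, not a complete solution: the CONTRACTIONS table is a tiny illustrative subset, the names CONTRACTIONS and expand_contractions are made up for this example, and a plain regex with word boundaries stands in for a proper tokenizer such as NLTK.

```python
import re

# Illustrative subset only (an assumption for this sketch); a real table
# would hold the few hundred entries mentioned in the answer.
CONTRACTIONS = {
    "you've": "you have",
    "he's": "he is",      # ambiguous: could also mean "he has"
    "won't": "will not",
    "can't": "cannot",
    "i'm": "I am",
}

# Longest keys first, so a longer key is tried before any shorter key
# that happens to be its prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every contraction found in CONTRACTIONS with its expansion."""

    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Carry over a leading capital ("You've" -> "You have").
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_contractions("You've said he's right, but I can't agree."))
# You have said he is right, but I cannot agree.
```

    A single compiled pattern keeps the lookup to one pass over the text, and growing the coverage only means adding entries to the dictionary.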

  • 2020-12-04 12:03

    The answers above will work perfectly well and could be better for ambiguous contractions (although I would argue that there aren't that many ambiguous cases). I would use something more readable and easier to maintain:

    import re
    
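    def decontracted(phrase):
        """Expand common English contractions using ordered regex rules."""
        # Specific cases first: the general "n't" rule below would turn
        # these into "wo not" and "ca not".
        phrase = re.sub(r"won't", "will not", phrase)
        phrase = re.sub(r"can't", "can not", phrase)
    
        # General suffix rules. These are heuristics: "'s" is also rewritten
        # when it is possessive or means "has", and "'d" when it means "had".
        phrase = re.sub(r"n't", " not", phrase)
        phrase = re.sub(r"'re", " are", phrase)
        phrase = re.sub(r"'s", " is", phrase)
        phrase = re.sub(r"'d", " would", phrase)
        phrase = re.sub(r"'ll", " will", phrase)
        phrase = re.sub(r"'t", " not", phrase)
        phrase = re.sub(r"'ve", " have", phrase)
        phrase = re.sub(r"'m", " am", phrase)
        return phrase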
    
    
    test = "Hey I'm Yann, how're you and how's it going ? That's interesting: I'd love to hear more about it."
    print(decontracted(test))
    # Hey I am Yann, how are you and how is it going ? That is interesting: I would love to hear more about it.
    

    It might have some flaws I didn't think about though.
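
    One such flaw is that a blanket "'s" rule also rewrites possessives. The sketch below shows the problem and one possible guard: expanding "'s" only after a short list of host words. The names IS_HOSTS and expand_is are hypothetical, the list is illustrative rather than exhaustive, and the guard still cannot tell "is" from "has" ("he's gone").

```python
import re

# Hypothetical whitelist of words after which "'s" is read as "is".
# Illustrative only; extend or trim it for your own data.
IS_HOSTS = ("he", "she", "it", "that", "there", "here",
            "what", "who", "how", "where", "when")

IS_PATTERN = re.compile(r"\b(" + "|".join(IS_HOSTS) + r")'s\b", re.IGNORECASE)


def expand_is(phrase):
    """Expand "'s" to " is" only when it follows a word from IS_HOSTS."""
    return IS_PATTERN.sub(lambda match: match.group(1) + " is", phrase)


# The blanket rule mangles a possessive:
naive = re.sub(r"'s", " is", "Yann's book")
print(naive)    # Yann is book

# The guarded rule leaves the possessive alone and still expands "how's":
guarded = expand_is("Yann's book, and how's it going?")
print(guarded)  # Yann's book, and how is it going?
```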

    Reposted from my other answer

  • 2020-12-04 12:05

    The contractions library is indeed great and takes care of a lot of variants. You can also add your own contractions with the contractions.add() method.

    Check out the library's GitHub page for details.
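
    A minimal usage sketch, assuming the third-party contractions package is installed (pip install contractions). The wrapper name expand and the import fallback are additions for this example, not part of the library, and the custom mapping is just a placeholder entry.

```python
try:
    import contractions  # third-party package; assumed to be installed
except ImportError:
    contractions = None  # fallback so the sketch still runs without it


def expand(text):
    """Expand contractions with the library if available, else return text unchanged."""
    if contractions is None:
        return text
    return contractions.fix(text)


if contractions is not None:
    # Register a custom mapping (placeholder entry for illustration).
    contractions.add("mychange", "my change")

print(expand("you've seen that he's late"))
```

    Whatever the library returns for ambiguous forms such as "he's" is its own choice of expansion, so check the output on your own data.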
