Expanding English language contractions in Python

野性不改 2020-12-04 11:19

The English language has a couple of contractions. For instance:

you've -> you have
he's -> he is

These can sometimes cause headaches

9 answers
  •  野趣味 (original poster)
    2020-12-04 11:48

    Even though this is an old question, I figured I might as well answer since there is still no real solution to this as far as I can see.

    I had to work on this for a related NLP project, and I decided to tackle the problem since there didn't seem to be anything here. You can check my expander GitHub repository if you are interested.

    It's a fairly badly optimized (I think) program based on NLTK, the Stanford CoreNLP models (which you will have to download separately), and the dictionary in the previous answer. All the necessary information should be in the README and the lavishly commented code. I know commented code is dead code, but this is just how I write to keep things clear for myself.
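
For readers who only need the unambiguous cases, a plain dictionary lookup already goes a long way. The following is a minimal sketch of that idea, not the code from the expander repository. The `CONTRACTIONS` table and the `expand_simple` function are hypothetical names made up for illustration, and the table is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny lookup table (an assumption for this sketch).
# Ambiguous forms such as "it's" or "she'd" are left out on purpose.
CONTRACTIONS = {
    "won't": "will not",
    "can't": "cannot",
    "i'm": "I am",
    "it'll": "it will",
    "you've": "you have",
    "wouldn't": "would not",
}

# Match any known contraction as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_simple(text):
    """Replace every contraction found in CONTRACTIONS with its expansion."""

    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital so sentence starts stay capitalized.
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_simple("I won't let you get away with that"))
print(expand_simple("It's his cat anyway"))  # unchanged: "it's" is not in the table
```

A lookup like this cannot decide between "is" and "has" for 's, or between "would" and "had" for 'd, which is why the program described here needs tagging models on top of the dictionary.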

    The example inputs in expander.py are the following sentences:

        ["I won't let you get away with that",  # won't ->  will not
        "I'm a bad person",  # 'm -> am
        "It's his cat anyway",  # 's -> is
        "It's not what you think",  # 's -> is
        "It's a man's world",  # 's -> is and 's possessive
        "Catherine's been thinking about it",  # 's -> has
        "It'll be done",  # 'll -> will
        "Who'd've thought!",  # 'd -> would, 've -> have
        "She said she'd go.",  # she'd -> she would
        "She said she'd gone.",  # she'd -> had
        "Y'all'd've a great time, wouldn't it be so cold!", # Y'all'd've -> You all would have, wouldn't -> would not
        " My name is Jack.",   # No replacements.
        "'Tis questionable whether Ma'am should be going.", # 'Tis -> it is, Ma'am -> madam
        "As history tells, 'twas the night before Christmas.", # 'Twas -> It was
        "Martha, Peter and Christine've been indulging in a menage-à-trois."] # 've -> have
    

    To which the output is

        ["I will not let you get away with that",
        "I am a bad person",
        "It is his cat anyway",
        "It is not what you think",
        "It is a man's world",
        "Catherine has been thinking about it",
        "It will be done",
        "Who would have thought!",
        "She said she would go.",
        "She said she had gone.",
        "You all would have a great time, would not it be so cold!",
        "My name is Jack.",
        "It is questionable whether Madam should be going.",
        "As history tells, it was the night before Christmas.",
        "Martha, Peter and Christine have been indulging in a menage-à-trois."]
    

    So for this small set of test sentences, which I came up with to test some edge cases, it works well.
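
The hard cases in the list are the ambiguous ones: 's can stand for "is", "has", or a possessive, and 'd can stand for "would" or "had". The sketch below illustrates the general idea of choosing the expansion from the word that follows. It is not how the expander repository does it (that relies on NLTK and the Stanford models). The small `PAST_PARTICIPLES` set is a hypothetical stand-in for a real part-of-speech tagger, and `expand_d` is a made-up name.

```python
import re

# Hypothetical stand-in for a POS tagger: a handful of past participles.
# A real solution would tag the following word instead of using a fixed set.
PAST_PARTICIPLES = {"gone", "been", "done", "seen", "thought", "taken"}

# A word ending in 'd, followed by whitespace and the next word.
_D_PATTERN = re.compile(r"\b(\w+)'d\s+(\w+)")


def expand_d(text):
    """Expand 'd to "had" before a past participle, otherwise to "would"."""

    def _replace(match):
        subject, next_word = match.group(1), match.group(2)
        if next_word.lower() in PAST_PARTICIPLES:
            auxiliary = "had"
        else:
            auxiliary = "would"
        return f"{subject} {auxiliary} {next_word}"

    return _D_PATTERN.sub(_replace, text)


print(expand_d("She said she'd go."))
print(expand_d("She said she'd gone."))
```

This heuristic breaks easily. For example, "I'd better go" needs "had" even though "better" is not a participle, which is one reason a proper tagger is the safer route.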

    Since this project has lost importance for now, I am no longer actively developing it. Any help would be appreciated; things to be done are written in the TODO list. If you have any tips on how to improve my Python, I would also be very thankful.
