How to automatically detect acronym meaning / extension

你。 submitted on 2019-12-04 10:01:28

Question


How can you detect / find out the meaning (the extension) of an acronym using NLP / Information Extraction (IE) methods?

We want to detect in free text whether a word or its acronym is used, and map both to the same entity / token.

Most papers available online are about medical acronyms, and they do not provide a library to accomplish this task.

Any ideas?


Answer 1:


Reading your question and the comments, I understand that you want to create a mapping from an acronym to its extension.

Assuming you have a collection of text documents in which both the acronym and its expansion occur, you can apply an algorithm to extract (acronym, extension) pairs.

"A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A. S. Schwartz and M. A. Hearst does exactly this by looking at patterns, such as a long form followed by its short form in parentheses. The Java implementation is available here.

I applied this algorithm to the English Wikipedia; you can see the results here. I also applied it to a collection of Portuguese news articles; the results are here.
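To make the idea concrete, here is a minimal Python sketch in the spirit of the Schwartz and Hearst approach. It is a simplification, not the authors' Java code: it only handles the "long form (SHORT)" pattern, and the function names (`is_valid_short`, `best_long_form`, `extract_pairs`) and the exact validity checks are assumptions made for illustration.

```python
import re


def is_valid_short(short):
    # Assumed filter: 2-10 characters, at most two words,
    # starts with a letter or digit and contains at least one letter.
    return (
        2 <= len(short) <= 10
        and len(short.split()) <= 2
        and short[0].isalnum()
        and any(c.isalpha() for c in short)
    )


def best_long_form(short, candidate):
    # Match the short form's characters right to left against the candidate.
    # The first character of the short form must start a word.
    s = len(short) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None  # ran out of text before matching every character
        l -= 1
        s -= 1
    # Extend the match back to the start of the word it begins in.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(text):
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", text):
        short = m.group(1).strip()
        if not is_valid_short(short):
            continue
        # Search window: at most min(|A| + 5, 2 * |A|) words before "(".
        words = text[: m.start()].split()
        max_words = min(len(short) + 5, len(short) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = best_long_form(short, candidate)
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs


sample = ("We use Natural Language Processing (NLP) and "
          "Information Extraction (IE) methods.")
print(extract_pairs(sample))
```

Parenthesised text that is not an abbreviation, such as "(yesterday)", is rejected when its characters cannot all be matched in the preceding words. The published algorithm applies further checks that this sketch omits.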
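Once (acronym, extension) pairs are available, mapping both surface forms to one token, as the question asks, could look roughly like the sketch below. The `normalize` helper and the choice of the acronym as the canonical token are assumptions for illustration, not part of the paper or its implementation.

```python
import re


def normalize(text, pairs):
    # Replace every expansion (with or without its parenthesised acronym)
    # by the acronym, so both forms end up as the same token.
    for acronym, expansion in pairs:
        exp = re.escape(expansion)
        acr = re.escape(acronym)
        # Collapse "expansion (ACR)" definitions first ...
        text = re.sub(rf"{exp}\s*\({acr}\)", lambda _m: acronym,
                      text, flags=re.IGNORECASE)
        # ... then any remaining bare expansions.
        text = re.sub(rf"\b{exp}\b", lambda _m: acronym,
                      text, flags=re.IGNORECASE)
    return text


pairs = [("NLP", "Natural Language Processing")]
print(normalize(
    "Natural language processing (NLP) is fun. "
    "I like natural language processing.",
    pairs,
))
```

This is plain string replacement: it does not disambiguate acronyms that have several expansions, which would need context-aware handling.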




Answer 2:


WordNet contains acronyms for a large number of words, and you can use it from a variety of programming languages: http://wordnet.princeton.edu/wordnet/

Alternatively, get them from Freebase. See this: What is one way to find related names using the web?



Source: https://stackoverflow.com/questions/26716622/how-to-automatically-detect-acronym-meaning-extension
