Which NLP toolkit to use in Java? [closed]

Submitted by 孤街浪徒 on 2019-11-29 10:34:21

I would recommend combining POS tagging with tokenization to extract all the nouns from each abstract. Then use a dictionary or hash map to count how often each noun occurs, and output the N most frequent ones. Combined with some other intelligent filtering, this should do reasonably well at picking out the important keywords of an abstract.
For POS tagging, check out the Stanford POS tagger at http://nlp.stanford.edu/software/index.shtml
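
A minimal sketch of the noun-counting step in Java. It assumes the text has already been POS-tagged (for example by the Stanford tagger linked above) and written as whitespace-separated word_TAG tokens using Penn Treebank tags, where every noun tag (NN, NNS, NNP, NNPS) starts with "NN". The class NounKeywords and method topNouns are hypothetical names for illustration, not part of any library.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical helper: counts nouns in already POS-tagged text and returns the N most frequent.
 * Assumes whitespace-separated "word_TAG" tokens with Penn Treebank tags.
 */
class NounKeywords {

    // Returns up to n nouns, most frequent first (ties broken alphabetically).
    static List<String> topNouns(String taggedText, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : taggedText.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep <= 0) continue; // skip tokens that carry no tag
            String word = token.substring(0, sep).toLowerCase(Locale.ROOT);
            String tag = token.substring(sep + 1);
            if (tag.startsWith("NN")) { // NN, NNS, NNP, NNPS are the noun tags
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String tagged = "The_DT tagger_NN assigns_VBZ tags_NNS ; the_DT tagger_NN is_VBZ fast_JJ";
        System.out.println(topNouns(tagged, 2));
    }
}
```

Ties are broken alphabetically only to keep the output deterministic; the extra filtering mentioned above would sit on top of this counting step.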

However, if you expect a lot of multi-word terms in your corpus, then instead of extracting only nouns you could take the most frequent n-grams for n = 2 to 4.
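
The n-gram variant can be sketched the same way: slide a window of n words (n = 2 to 4) over each abstract and count every window. This is an illustrative sketch only; the tokenization (lowercasing and splitting on anything that is not a letter or digit) and the names NGramKeywords and topNGrams are assumptions, not taken from any toolkit.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical helper: counts word n-grams for n = minN..maxN and returns the most frequent.
 * Assumes naive tokenization: lowercase, split on runs of non-letter/non-digit characters.
 */
class NGramKeywords {

    // Returns up to `limit` space-joined n-grams, most frequent first (ties alphabetical).
    static List<String> topNGrams(String text, int minN, int maxN, int limit) {
        List<String> words = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) words.add(t); // split can yield a leading empty string
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int n = minN; n <= maxN; n++) {
            // every window of n consecutive words is one n-gram occurrence
            for (int i = 0; i + n <= words.size(); i++) {
                counts.merge(String.join(" ", words.subList(i, i + n)), 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Support vector machines beat naive Bayes; support vector machines need tuning.";
        System.out.println(topNGrams(text, 2, 4, 3));
    }
}
```

Overlapping n-grams such as "support vector" and "support vector machines" are counted separately here, so some further filtering (for example, preferring the longest of several equally frequent overlapping n-grams) may be needed.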

There's an Apache project for that: OpenNLP, an open-source Apache project. I haven't used it, and it's still in the incubator, so it may be a bit raw.

This post from Jeff's Search Engine Cafe has a number of other suggestions.

fjen

This might be relevant as well: https://github.com/jdf/cue.language

It provides stop-word lists, word and n-gram frequency counts, and more.

It's part of the software behind Wordle.

I ended up using Alias-i's LingPipe.
