The general algorithm is going to go like this:
- Obtain the text
- Strip punctuation, special characters, etc.
- Strip "simple" (stop) words
- Split on spaces
- Loop over the split text
- Add each word to an array/hash table/etc. if it isn't there yet; if it is, increment that word's counter (a sketch of this follows the list)
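Here's a minimal sketch of those steps in Python. The stop-word list is just a placeholder for the "simple" words, and the sketch filters them after splitting, which is the easier order in practice; a real filter would use a much larger list (NLTK ships one, for example).

```python
import re
from collections import Counter

# Placeholder list of "simple" words; a real filter would be much larger.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "it"}

def word_counts(text):
    """Return a Counter mapping each word in `text` to how often it appears."""
    # Strip punctuation/special characters and normalize case.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Split on whitespace and drop the "simple" (stop) words.
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    # Counter handles the "add if missing, otherwise increment" step.
    return Counter(words)
```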
The end result is a frequency count of every word in the text. You can then divide each count by the total number of words to get that word's frequency as a percentage. Any further processing is up to you.
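Building on the `word_counts` sketch above, the percentage step is just a division by the total:

```python
def word_frequencies(text):
    """Return each word's frequency as a percentage of all counted words."""
    counts = word_counts(text)
    total = sum(counts.values())
    if total == 0:  # nothing left after stripping
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}
```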
You're also going to want to look into stemming, which reduces words to their root: going => go, cars => car, and so on.
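For real work you'd reach for an existing stemmer (the Porter or Snowball algorithms, e.g. via NLTK) rather than rolling your own; the toy version below only shows the idea of suffix stripping:

```python
def naive_stem(word):
    """Toy stemmer that strips a few common suffixes. Real stemmers (Porter,
    Snowball, ...) handle far more suffixes and their exceptions."""
    for suffix in ("ing", "es", "s"):
        # Only strip if a reasonably sized stem would be left behind.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word

# naive_stem("going") -> "go", naive_stem("cars") -> "car"
```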
An algorithm like this is common in spam filters, keyword indexing, and the like.