Algorithms to detect phrases and keywords from text

不思量自难忘° 2020-12-12 09:33

I have around 100 megabytes of text, without any markup, divided into approximately 10,000 entries. I would like to automatically generate a 'tag' list. The problem is that

5 answers
  •  猫巷女王i
    2020-12-12 09:43

    Build a matrix of word counts. Whenever two words appear consecutively, add one to the corresponding cell.

    For example you have this sentence.
    
    mat['for']['example'] ++;
    mat['example']['you'] ++;
    mat['you']['have'] ++;
    mat['have']['this'] ++;
    mat['this']['sentence'] ++;
    

    This gives you a count for every pair of consecutive words. You can do the same with three consecutive words. Beware that a full table for three words requires O(n^3) memory, where n is the number of distinct words.
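
    The matrix idea above can be sketched in Python as follows. This is only an illustration: the function name `count_bigrams`, the lowercasing, and the whitespace tokenization are assumptions, not part of the original answer. Nested dictionaries stand in for the matrix so that only pairs that actually occur take up memory.

```python
from collections import defaultdict


def count_bigrams(text):
    """Count how often each word is immediately followed by another word."""
    # Assumption: lowercase everything and split on whitespace.
    words = text.lower().split()
    # mat[first][second] -> number of times `second` directly follows `first`
    mat = defaultdict(lambda: defaultdict(int))
    for first, second in zip(words, words[1:]):
        mat[first][second] += 1
    return mat


mat = count_bigrams("For example you have this sentence")
print(mat["for"]["example"])  # 1
```

    Real text would probably need punctuation stripped before counting, which this sketch leaves out.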

    You can also use a heap (in effect, a single map keyed by the whole phrase) to store the data, like:

    heap['for example']++;
    heap['example you']++;
    
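
    A rough Python equivalent of the phrase-keyed counts, assuming `collections.Counter` as the container. The helper name `count_phrases` and its parameter `n` are hypothetical; setting `n=3` gives the three-word variant without allocating a full table.

```python
from collections import Counter


def count_phrases(text, n=2):
    """Count every run of n consecutive words, keyed by the joined phrase."""
    # Assumption: lowercase everything and split on whitespace.
    words = text.lower().split()
    # zip over n shifted copies of the word list yields each n-word window
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


heap = count_phrases("for example you have this sentence for example")
print(heap["for example"])   # 2
print(heap.most_common(1))   # [('for example', 2)]
```

    The most frequent phrases from `most_common` could then serve as candidate tags, though frequency alone will also surface common filler pairs.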
