How to count the frequency of words in a text using nltk

混江龙づ霸主 submitted on 2020-04-17 20:48:07


I have a Python script that reads a text and applies preprocessing functions to it so that I can analyze it.
The problem is that when I try to count the frequency of words, the script crashes and displays the error below.

File "F:\AIenv\textAnalysis\", line 208, in tag_and_save
    file.write(word+"/"+tag+" (frequency="+str(freq_tagged_data[word])+")\n")
TypeError: tuple indices must be integers or slices, not str

I am trying to count the frequency of each tagged word and then write the result to a text file.

def get_freq(tagged):
    freqs = FreqDist(tagged)
    for word, freq in freqs.items():
        print(word, freq)
    result = word,freq
    return result

def tag_and_save(tagger,text,path):
    clt = clean_text(text)
    tagged_data = tagger.tag(clt)

    freq_tagged_data = get_freq(tagged_data)
    file = open(path,"w",encoding = "UTF8")
    for word,tag in tagged_data:
        file.write(word+"/"+tag+" (frequency="+str(freq_tagged_data[word])+")\n")

I expect output like this:

('*****/DTNN') 3

Based on an answer I received, I changed the function get_freq() to:

def get_freq(tagged):
    freq_dist = {}
    freqs = FreqDist(tagged)
    freq_dist = [(word, freq) for word ,freq in freqs.items()]
    return freq_dist

but now it displays the error below:

File "F:\AIenv\textAnalysis\", line 217, in tag_and_save
    file.write(word+"/"+tag+" (frequency="+str(freq_tagged_data[word])+")\n")
TypeError: list indices must be integers or slices, not str

How can I fix this error?


Maybe this will help.

import nltk
text = "An an valley indeed so no wonder future nature vanity. Debating all she mistaken indulged believed provided declared. He many kept on draw lain song as same. Whether at dearest certain spirits is entered in to. Rich fine bred real use too many good. She compliment unaffected expression favourable any. Unknown chiefly showing to conduct no."
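tokens = text.split()  # naive whitespace tokenization; punctuation stays attached to words
freqs = nltk.FreqDist(tokens)  # dict-like mapping: token -> count
freq_list = list(freqs.items())  # list of (token, count) pairs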

This snippet counts the word frequency.
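
If the goal is to look up the count of one particular word, converting the result to a list of pairs is probably unnecessary: in NLTK 3.x, FreqDist is a subclass of collections.Counter, so it can be indexed by token directly. The sketch below assumes simple whitespace tokenization; sample_text is a made-up example, and the fallback import is only there so the snippet also runs where nltk is not installed.

```python
try:
    from nltk import FreqDist
except ImportError:
    # Fallback for illustration only: FreqDist subclasses Counter,
    # so direct lookups by token behave the same way.
    from collections import Counter as FreqDist

# Hypothetical sample text, not taken from the question.
sample_text = "the cat sat on the mat near the door"

tokens = sample_text.split()   # whitespace tokenization
freqs = FreqDist(tokens)       # dict-like mapping: token -> count

the_count = freqs["the"]       # look up a token by its string key
dog_count = freqs["dog"]       # unseen tokens count as 0 instead of raising KeyError

print(the_count, dog_count)
```

Indexing works here because the keys are the tokens themselves. A list of (token, count) pairs, by contrast, only accepts integer positions.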

Edit: Code is now working.
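
As for the errors in the question, both seem to come from indexing a tuple (and later a list) with a string. One possible fix is to have get_freq() return the FreqDist itself, which behaves like a dictionary. Note that when a FreqDist is built from tagged data, each key is a (word, tag) pair, so the lookup needs the pair and not only the word. The following is a minimal sketch under those assumptions: format_tagged and sample_tagged are hypothetical names used for illustration, and the asker's clean_text() and tagger are replaced by hand-written pairs.

```python
try:
    from nltk import FreqDist
except ImportError:
    # Fallback for illustration only: FreqDist subclasses Counter.
    from collections import Counter as FreqDist


def get_freq(tagged):
    """Return a dict-like mapping from each (word, tag) pair to its count."""
    return FreqDist(tagged)


def format_tagged(tagged_data):
    """Build one output line per tagged token (hypothetical helper)."""
    freqs = get_freq(tagged_data)
    lines = []
    for word, tag in tagged_data:
        # The key is the (word, tag) tuple, not the word alone.
        lines.append(f"{word}/{tag} (frequency={freqs[(word, tag)]})")
    return lines


# Hand-written stand-in for the output of tagger.tag(clean_text(text)).
sample_tagged = [("cat", "NN"), ("sat", "VBD"), ("cat", "NN")]
lines = format_tagged(sample_tagged)
print("\n".join(lines))
```

To write the result to a file as in tag_and_save(), the lines could be joined with "\n" and written inside a `with open(path, "w", encoding="utf-8") as file:` block, which also closes the file afterwards. Like the original loop, this version emits one line per token, so repeated tokens appear more than once; iterating over freqs.items() would give one line per distinct pair.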

