Constructing Zipf Distribution with matplotlib, FITTED-LINE

本小妞迷上赌 submitted on 2019-12-08 18:58:27

I know it's been a while since this question was asked. However, I came across a possible solution to this problem on the SciPy site.
I thought I would post it here in case anyone else needs it.

I didn't have your paragraph data, so here is a quickly whipped-up dict called frequency whose values stand in for paragraph occurrence counts.
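If you do have the paragraph text, a rank-frequency table can be built from it instead of random numbers. The sketch below is only an illustration: the function name rank_frequency and the sample sentence are hypothetical, and the tokenization is a naive lowercase whitespace split, which real article text would probably need to improve on.

```python
from collections import Counter

def rank_frequency(text):
    """Return (ranks, counts) for the tokens in text, most frequent first.

    Hypothetical helper: tokenizes with a naive lowercase whitespace split.
    """
    counts = Counter(text.lower().split())
    # most_common() with no argument returns every (token, count) pair,
    # sorted by descending count (ties keep first-seen order).
    sorted_counts = [c for _, c in counts.most_common()]
    ranks = list(range(1, len(sorted_counts) + 1))
    return ranks, sorted_counts

# Hypothetical sample paragraph standing in for real article text.
sample = "the cat sat on the mat and the dog sat on the log"
ranks, freqs = rank_frequency(sample)
print(ranks[:3], freqs[:3])  # [1, 2, 3] [4, 2, 2]
```

Rank 1 is the most frequent token, which is the x-axis a Zipf plot expects.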

We then take its values and convert them to a NumPy array, and define the Zipf distribution parameter a, which has to be greater than 1.
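The a > 1 requirement comes from the normalizing constant of the Zipf distribution, the Riemann zeta function zeta(a), which is finite only for a > 1. A minimal sketch, assuming NumPy's Generator API and scipy.special, that evaluates that constant and draws genuine Zipf samples (the seed and sample size are arbitrary choices):

```python
import numpy as np
from scipy import special

a = 2.0  # Zipf exponent; NumPy's zipf sampler requires a > 1

# zeta(a) = sum over k >= 1 of k**(-a) normalizes the Zipf PMF.
# The series converges only for a > 1 (at a = 1 it is the divergent harmonic series).
norm = special.zeta(a)  # equals pi**2 / 6 when a = 2

# zetac(a) is the complementary zeta function, zeta(a) - 1,
# so it is not the normalizing constant.
print(norm, special.zetac(a))

# Draw genuine Zipf-distributed samples (positive integers k >= 1).
rng = np.random.default_rng(seed=0)
samples = rng.zipf(a, size=1000)
```

These samples can be passed to plt.hist in place of the simulated counts.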

Finally, we display a histogram of the samples along with the Zipf probability mass function (Zipf is a discrete distribution, so it has a PMF rather than a density).
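The PMF is P(X = k) = k**(-a) / zeta(a) for k = 1, 2, 3, and so on. A rough sanity check, assuming the sample size and seed chosen below, compares that formula with the observed fractions of Zipf samples and with scipy.stats.zipf, which implements the same PMF:

```python
import numpy as np
from scipy import special, stats

a = 2.0
rng = np.random.default_rng(seed=42)
samples = rng.zipf(a, size=100_000)

k = np.arange(1, 11)               # first ten support points
pmf = k ** (-a) / special.zeta(a)  # theoretical P(X = k)

# Cross-check against SciPy's built-in Zipf distribution.
reference = stats.zipf.pmf(k, a)

# Empirical probability of each k: fraction of samples equal to k.
empirical = np.array([(samples == v).mean() for v in k])

for v, p, e in zip(k, pmf, empirical):
    print(f"k={v:2d}  theory={p:.4f}  sample={e:.4f}")
```

With this many samples the empirical column should agree with the theoretical one to roughly two decimal places.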

Working Code:

import random
import matplotlib.pyplot as plt
from scipy import special
import numpy as np

# Generate a sample dict with random values to simulate paragraph data
frequency = {}
for i in range(50):
    frequency[i] = random.randint(1, 50)

counts = frequency.values()

# Convert the counts to a NumPy array. list() is needed in Python 3,
# because dict.values() returns a view rather than a list.
s = np.array(list(counts))

# Define the Zipf distribution parameter. Has to be > 1
a = 2.

# Display the histogram of the samples,
# along with the Zipf probability mass function.
# density=True replaces normed=True, which newer Matplotlib versions removed.
count, bins, ignored = plt.hist(s, 50, density=True)
plt.title("Zipf plot for Combined Article Paragraphs")
plt.xlabel("Frequency Rank of Token")
plt.ylabel("Absolute Frequency of Token")
x = np.arange(1., 50.)
# Zipf PMF: x**(-a) / zeta(a). special.zetac(a) would be zeta(a) - 1.
y = x**(-a) / special.zeta(a)
# Dividing by max(y) rescales the curve so its peak is 1
# (this also cancels the zeta(a) normalizing constant).
plt.plot(x, y / max(y), linewidth=2, color='r')
plt.show()
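Since the question asks for a fitted line, one common quick approach (not the method used above) is a least-squares straight line in log-log rank-frequency space, whose slope estimates -a. This is a sketch under that assumption: fit_zipf_exponent is a hypothetical helper, and this heuristic is known to be biased for noisy tails, so a maximum-likelihood fit is preferable for serious work.

```python
import numpy as np

def fit_zipf_exponent(counts):
    """Estimate the Zipf exponent a from raw counts (hypothetical helper).

    Fits log(freq) = log_c - a * log(rank) by least squares.
    """
    freqs = np.sort(np.asarray(counts, dtype=float))[::-1]  # descending
    freqs = freqs[freqs > 0]                                 # log(0) is undefined
    ranks = np.arange(1, len(freqs) + 1)
    slope, log_c = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope, log_c

# Synthetic, noise-free check: counts proportional to rank**(-2).
ideal = 1000.0 * np.arange(1, 51, dtype=float) ** -2.0
a_hat, log_c = fit_zipf_exponent(ideal)
print(round(a_hat, 3))  # 2.0
```

To draw the fitted line, plot np.exp(log_c) * ranks ** (-a_hat) against the ranks with plt.loglog, on top of the observed rank-frequency points.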

[Plot: Zipf plot for Combined Article Paragraphs]
