Estimate Dictionary size using Zipf’s Law

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-05 12:20:47

Question


How would one go about calculating the dictionary size (number of unique words) of a collection using Zipf's law?


Answer 1:


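You will have to tokenize your collection, e.g. by splitting on white-space and punctuation. Then store the tokens in a hash table and count how often each one occurs; the number of distinct keys is the number of unique words you have observed. Finally, plot the distribution of the counts (frequency against rank) using a tool like Gnuplot.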
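
The steps above can be sketched in Python roughly as follows. This is only an illustration, not the answerer's code: the function names `tokenize` and `rank_frequency`, the tokenization regex, and the sample text are all assumptions made for the example. It prints rank/frequency pairs in a two-column form that a plotting tool such as Gnuplot can read; on log-log axes, data that follows Zipf's law should fall close to a straight line.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization rule: lowercase the text, then keep runs of
    # letters, digits, and apostrophes. White-space and punctuation
    # act as separators.
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_frequency(text):
    # Count every token in a hash table (Counter is a dict subclass).
    counts = Counter(tokenize(text))
    # most_common() orders the words by descending count, so the
    # enumeration index is the rank of each word.
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical sample collection, used only for illustration.
sample = "the cat sat on the mat. The dog sat."
table = rank_frequency(sample)

# The number of distinct tokens is the observed dictionary size.
vocabulary_size = len(table)

# Two columns (rank, frequency), one row per word, for plotting.
for rank, word, freq in table:
    print(rank, freq)
```

With output like this saved to a file, the plot can be drawn with logarithmic scales on both axes to check how closely the collection follows Zipf's law.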



Source: https://stackoverflow.com/questions/47543798/estimate-dictionary-size-using-zipf-s-law
