Getting total term frequency throughout entire index (Elasticsearch)

假装没事ソ posted on 2019-12-03 02:09:29

The counts differ because term vector statistics are only accurate when the index has a single shard. In an index with multiple shards the documents are distributed across the shards, so the frequency returned is not the index-wide total. It is computed from one shard only: the shard that holds the requested document, or a randomly selected shard when the request is for an artificial document.

Thus, the returned frequency is only a relative measure, not the absolute value you expect; see the Behaviour section of the term vectors documentation. To test this, you can create a single-shard index and request the frequency there (it should give you the actual total).
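A minimal sketch of the single-shard test, in Python. It only builds the pieces of the create-index request and sends nothing over the network. The index name "ttf-test" and the helper name single_shard_index_request are made up for illustration; number_of_shards and number_of_replicas are the standard index settings.

```python
import json


def single_shard_index_request(index_name):
    """Build a create-index request that forces a single primary shard.

    Hypothetical helper: it returns the HTTP method, path and JSON body,
    and leaves the actual sending to whatever HTTP client you use.
    """
    body = {
        "settings": {
            "number_of_shards": 1,    # every document lands in the same shard
            "number_of_replicas": 0,  # replicas are not needed for this test
        }
    }
    return "PUT", f"/{index_name}", body


method, path, body = single_shard_index_request("ttf-test")
print(method, path)
print(json.dumps(body, indent=2))
```

After reindexing the same documents into such an index, the frequency reported for a term should match the total you count by hand, because no other shard holds part of the data.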

I believe you also need to set term_statistics to true, as per the Elasticsearch documentation:

Term statistics

Setting term_statistics to true (default is false) will return:

total term frequency (how often a term occurs in all documents)

document frequency (the number of documents containing the current term)

By default these values are not returned since term statistics can have a serious performance impact.
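The request described above might look like the following Python sketch. It builds the term vectors request body with term_statistics enabled and reads ttf and doc_freq out of a response. Several things here are assumptions: the path form /<index>/_termvectors/<id> is the one used by Elasticsearch 7.x and later (older versions include a type segment), the index name, document id and field name are invented, and sample_response is hand-written to mirror the documented response shape rather than captured from a real cluster.

```python
import json


def term_vectors_request(index_name, doc_id, field):
    """Build a term vectors request that asks for term statistics."""
    body = {
        "fields": [field],
        "term_statistics": True,   # off by default; adds ttf and doc_freq
        "field_statistics": True,
    }
    # Assumed path form (7.x and later); adjust for older versions.
    return "GET", f"/{index_name}/_termvectors/{doc_id}", body


def extract_term_stats(response, field, term):
    """Pull the per-term numbers out of a term vectors response."""
    entry = response["term_vectors"][field]["terms"][term]
    return {
        "term_freq": entry["term_freq"],  # occurrences in this document
        "doc_freq": entry.get("doc_freq"),  # None if term_statistics was off
        "ttf": entry.get("ttf"),            # None if term_statistics was off
    }


# Hand-written example response (hypothetical values, not real output).
sample_response = {
    "_index": "ttf-test",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "text": {
            "terms": {
                "elasticsearch": {"term_freq": 3, "doc_freq": 2, "ttf": 4}
            }
        }
    },
}

method, path, body = term_vectors_request("ttf-test", "1", "text")
print(method, path, json.dumps(body))
print(extract_term_stats(sample_response, "text", "elasticsearch"))
```

Keep in mind that ttf and doc_freq are still per-shard numbers, so they only equal the index-wide totals when the index has a single shard.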
