How to reduce the size of a generated Lucene/Solr index?

天命终不由人

I am working on a prototype of a search system.

I have a table in Oracle with several fields. I generated realistic-looking data, around 300,000 rows. For example:

1 Answer

温柔的废话

2021-01-14 12:26
All the insights already provided here apply. I wanted to share some additional points.

Solr duplicates data across several structures to provide fast search over indexed data. One important point is that Solr stores all of this data in immutable structures. The main ones are:
- Term Dictionary: the dictionary of indexed terms, with each term's frequency and the offset of its postings list.
- Term Vectors: when enabled for a field, Solr stores a term vector for each indexed document. This is essentially a separate inverted index per document and is usually storage heavy.
- Stored Docs: stores each document with its fields in sequential order.
- Doc Values: stores each field's values for all documents together, similar to columnar storage.
You can disable document-level term vector storage (the termVectors field property) if you are not using a Solr highlighting feature that relies on it.
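As a rough sketch of how that could be done, the snippet below builds the JSON body for a Solr Schema API replace-field command that turns term vectors off. The field name "description" and the type "text_general" are hypothetical placeholders, not taken from the question. The snippet only builds and prints the payload; it does not contact a Solr server.

```python
import json


def replace_field_command(name, field_type, **properties):
    """Build a Schema API 'replace-field' command body as a dict.

    replace-field expects the complete field definition, so every
    property that should survive must be passed in again.
    """
    definition = {"name": name, "type": field_type}
    definition.update(properties)
    return {"replace-field": definition}


# Hypothetical field: the name and type are assumptions for illustration.
payload = replace_field_command(
    "description",
    "text_general",
    indexed=True,
    stored=True,
    termVectors=False,    # do not store per-document term vectors
    termPositions=False,  # positions inside the term vector
    termOffsets=False,    # character offsets inside the term vector
)

print(json.dumps(payload, indent=2))
```

The printed JSON would be POSTed to the collection's schema endpoint (/solr/<collection>/schema). Existing documents need to be reindexed before the index shrinks, because segments that are already written keep their term vectors.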

Additionally, Solr uses different compression techniques for different types of data. It uses bit packing and VInt (variable-length integer) encoding for postings lists and numeric values, and LZ4 compression for stored fields and term vectors. It uses an FST (finite state transducer) data structure for the Term Dictionary; an FST is a specialized, trie-like structure.
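To illustrate why VInt encoding helps with postings lists, here is a minimal sketch in Python. It assumes sorted document IDs are first turned into gaps (delta encoding) and each gap is then written as a VInt: 7 data bits per byte, low-order bits first, with the high bit set when more bytes follow. This is a simplification for illustration; the actual Lucene codecs also pack blocks of values with fixed bit widths, and the function names below are made up for this example.

```python
def encode_vint(value):
    """Encode a non-negative int using 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("VInt encoding expects a non-negative integer")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def decode_vints(data):
    """Decode a byte string made of concatenated VInts."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_postings(doc_ids):
    """Delta-encode sorted doc IDs, then write each gap as a VInt."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_vint(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_postings(data):
    """Rebuild the doc IDs by summing the decoded gaps."""
    doc_ids, running = [], 0
    for gap in decode_vints(data):
        running += gap
        doc_ids.append(running)
    return doc_ids


doc_ids = [3, 7, 150, 152, 100000]
encoded = encode_postings(doc_ids)
print(len(encoded), "bytes, versus", 4 * len(doc_ids), "bytes as 32-bit ints")
```

Small gaps fit in a single byte, so a term that appears in many nearby documents takes far less space than one fixed-width integer per document.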