How to reduce the size of a generated Lucene/Solr index?

Backend · 1 answer · 1623 views
天命终不由人 2021-01-14 11:23

I am working on a prototype of a search system.

I have a table in Oracle with some fields, and I generated data that looks realistic: around 300,000 rows. For example:

1 Answer
  • 2021-01-14 12:26

    The insights already provided here all apply. Here are some additional points I wanted to share.

    Solr duplicates the data in several structures in order to provide fast search over the indexed content. One important thing about Solr is that it stores all of this data in immutable data structures (index segments):

    • Term Dictionary: the dictionary of indexed terms, along with their frequencies and offsets into the posting lists.
    • Term Vectors: Solr stores a term vector for each indexed document. This is essentially a separate inverted index per document, and it is usually storage-heavy.
    • Stored Docs: each document is stored with its fields in sequential order.
    • Doc Values: field values for all documents are stored together, similar to columnar storage.
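
    To make the first bullet concrete, here is a minimal sketch (not Lucene's actual implementation, which uses on-disk FSTs and compressed postings) of a term dictionary mapping each term to its postings, i.e. the documents containing it and the per-document frequency:

    ```python
    def build_index(docs):
        """Build a toy inverted index: term -> {doc_id: term_frequency}."""
        index = {}
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                postings = index.setdefault(term, {})
                postings[doc_id] = postings.get(doc_id, 0) + 1
        return index

    # Two tiny documents; "hello" occurs once in doc 0 and twice in doc 1.
    index = build_index(["hello world", "hello hello"])
    ```

    A real Lucene term dictionary additionally stores positions and offsets, and a per-document term vector is essentially this same structure restricted to a single document, which is why enabling term vectors roughly duplicates that information.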

    You can disable per-document term vector storage if you are not using Solr's highlighting feature.
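
    In the managed schema (or `schema.xml`), this is controlled per field. A sketch, assuming a hypothetical field named `content` of type `text_general`:

    ```xml
    <!-- termVectors defaults to false; set it explicitly if it was enabled.
         Also consider stored="false" for fields you never need to return,
         and docValues only where you sort/facet on the field. -->
    <field name="content" type="text_general"
           indexed="true" stored="true"
           termVectors="false" termPositions="false" termOffsets="false"/>
    ```

    After changing these attributes you must reindex for the change to take effect on existing documents.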

    Additionally, Solr uses different compression techniques for different types of data: bit packing and VInt (variable-length integer) encoding for posting lists and numeric values, LZ4 compression for stored fields and term vectors, and an FST (finite state transducer, a specialized trie-like data structure) for the term dictionary.
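
    As an illustration of why VInt encoding saves space, here is a sketch of the classic 7-bits-per-byte scheme (high bit set means "more bytes follow"), which stores small integers, such as the doc-ID gaps that dominate posting lists, in a single byte instead of four:

    ```python
    def encode_vint(n):
        """Encode a non-negative int as a variable-length byte string:
        7 payload bits per byte, high bit = continuation flag."""
        out = bytearray()
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)  # low 7 bits + "more" flag
            n >>= 7
        out.append(n)                      # final byte, high bit clear
        return bytes(out)

    def decode_vint(data):
        """Decode one VInt; return (value, bytes_consumed)."""
        value, shift = 0, 0
        for i, b in enumerate(data):
            value |= (b & 0x7F) << shift
            if b < 0x80:                   # high bit clear: last byte
                return value, i + 1
            shift += 7
        raise ValueError("truncated VInt")
    ```

    For example, a small gap like 5 takes one byte while 300000 takes three, versus a fixed four bytes each; across millions of postings the savings add up.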
