Instant searching in petabytes of data

心在旅途 2021-01-01 07:56

I need to search over petabytes of data stored in CSV-format files. After indexing with Lucene, the index is double the size of the original data. Is it possible to split the index and distribute it across machines so that searches return almost instantly?

3 Answers
  •  Happy的楠姐
    2021-01-01 08:33

    Any decent off-the-shelf search engine (like Lucene) should be able to provide search functionality over the amount of data you have. You may have to do a bit of work up front to design the indexes and configure how the search works, but this is just configuration; a minimal sketch follows below.
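
    To give a feel for that up-front work, here is a minimal Lucene indexing sketch. The index path, CSV file name, and the "id"/"body" field names are all hypothetical, and the naive comma split stands in for a real CSV parser:

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.StringField;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.store.FSDirectory;

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;

        public class CsvIndexer {
            public static void main(String[] args) throws IOException {
                Path indexDir = Paths.get("index");   // hypothetical index location
                Path csvFile = Paths.get("data.csv"); // hypothetical input file

                IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
                try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), config);
                     BufferedReader reader = Files.newBufferedReader(csvFile)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] cols = line.split(",", -1); // naive split; use a CSV parser for quoted fields
                        if (cols.length < 2) continue;
                        Document doc = new Document();
                        // Exact-match key column: indexed as a single token, stored for retrieval.
                        doc.add(new StringField("id", cols[0], Field.Store.YES));
                        // Free-text column: tokenised for full-text search, not stored to save space.
                        doc.add(new TextField("body", cols[1], Field.Store.NO));
                        writer.addDocument(doc);
                    }
                }
            }
        }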

    You won't get truly instant results, but you may well get very quick ones. The speed will depend on how you set the index up and what kind of hardware you run it on.
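
    Querying is where that speed shows up. A sketch of searching the index built above, assuming the same hypothetical "index" directory and field names:

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.ScoreDoc;
        import org.apache.lucene.search.TopDocs;
        import org.apache.lucene.store.FSDirectory;

        import java.nio.file.Paths;

        public class CsvSearcher {
            public static void main(String[] args) throws Exception {
                try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index")))) {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    // Parse a free-text query against the "body" field.
                    Query query = new QueryParser("body", new StandardAnalyzer()).parse("error timeout");
                    TopDocs hits = searcher.search(query, 10); // top 10 matches
                    for (ScoreDoc hit : hits.scoreDocs) {
                        // Only stored fields (here: "id") can be read back from the index.
                        System.out.println(searcher.doc(hit.doc).get("id"));
                    }
                }
            }
        }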

    You mention that the indexes are larger than the original data. This is to be expected: indexing usually includes some form of denormalisation. Index size is often a trade-off with speed; the more ways you slice and dice the data in advance, the quicker it is to find references.
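
    If index size becomes a problem, Lucene lets you trade features for space on a per-field basis. A sketch, assuming the raw CSV files are kept around so the text itself need not be stored in the index:

        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.FieldType;
        import org.apache.lucene.index.IndexOptions;

        public class LeanFields {
            // A slimmer text field: index doc IDs and term frequencies only
            // (no positions, so phrase queries won't work), skip norms, and
            // don't keep a stored copy of the text.
            static final FieldType LEAN_TEXT = new FieldType();
            static {
                LEAN_TEXT.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
                LEAN_TEXT.setOmitNorms(true);  // drop per-field length-normalisation data
                LEAN_TEXT.setStored(false);    // rely on the original CSV for the raw text
                LEAN_TEXT.setTokenized(true);
                LEAN_TEXT.freeze();
            }

            static Field bodyField(String text) {
                return new Field("body", text, LEAN_TEXT);
            }
        }

    Each dropped feature shrinks the index but removes a capability, so the right mix depends on which queries you actually need.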

    Lastly, you mention distributing the indexes; this is almost certainly not something you want to do. The practicalities of distributing many petabytes of data are pretty daunting. What you probably want is to have the indexes sitting on one big, powerful machine and to provide search services over the data (bring the query to the data; don't take the data to the query).
