I need to search over petabyte of data in CSV formate files. After indexing using LUCENE, the size of the indexing file is doubler than the original file. Is it possible to
If you want to avoid changing your implementation, you should decompose your lucene index into 10, 20 or even more indices and query them in parallel. It worked in my case (I created 8 indices), I had 80 GB of data, and I needed implement search which works on a developer machine (Intel Duo Core, 3GB RAM).