Max limit on the number of values I can specify in the ids filter, or generally in a query clause?

北海茫月 2020-11-29 00:53

In Elasticsearch, what is the max limit on the number of values a match can be performed on, for example in the ids filter? I read somewhere that it is 1024 but that it is also configurable.
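
For reference, this is the kind of clause I mean; a minimal sketch of an ids query as the body of a _search request (the document ids here are just placeholders):

    {
      "query": {
        "ids": {
          "values": ["1", "4", "100"]
        }
      }
    }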

4 Answers
  •  日久生厌
    2020-11-29 01:36

    I don't think there is any limit set explicitly by Elasticsearch or Lucene. The limit you might hit, though, is the one imposed by the JDK.

    To prove my statement above, I looked at the source code of Elasticsearch:

    • when the request comes in, a parser reads the array of ids into a plain ArrayList. This is then passed along to Lucene, which in turn works with it as a List.

    • the Lucene TermsFilter class (line #84) is what receives that list of ids from Elasticsearch as a List.

    • the source code of the ArrayList class from Oracle JDK 1.7.0_67:

    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;   
    
    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        ...
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        ...
    }
    
    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }
    

    And that number (Integer.MAX_VALUE - 8) is 2147483639. So, this would be the theoretical max size of that array.
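
    For reference, that figure is easy to double-check with a throwaway Java snippet; it is not Elasticsearch-specific, just the arithmetic behind MAX_ARRAY_SIZE:

    public class MaxArraySize {
        public static void main(String[] args) {
            // Integer.MAX_VALUE is 2147483647, so MAX_ARRAY_SIZE works out to 2147483639
            System.out.println(Integer.MAX_VALUE - 8);
        }
    }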

    I've tested this locally against my ES instance with an array of 150000 elements. And here come the performance implications: of course, performance degrades as the array grows. In my simple test with 150k ids I got an execution time of about 800 ms. But everything depends on CPU, memory, load, data size, mappings and so on, so the best thing is to actually test it yourself.
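
    If you want to reproduce that kind of test, here is a rough sketch of what I mean, using nothing but the JDK's HttpURLConnection. The endpoint (localhost:9200), the index name (myindex) and the synthetic ids ("0" through "149999") are assumptions; adapt them to your own cluster and data:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class IdsQueryTiming {
        public static void main(String[] args) throws Exception {
            // Build an ids query with 150000 synthetic ids ("0", "1", ..., "149999").
            StringBuilder body = new StringBuilder("{\"query\":{\"ids\":{\"values\":[");
            for (int i = 0; i < 150000; i++) {
                if (i > 0) body.append(',');
                body.append('"').append(i).append('"');
            }
            body.append("]}}}");
            byte[] payload = body.toString().getBytes(StandardCharsets.UTF_8);

            // POST the query to a local node (assumed endpoint) and time the round trip.
            long start = System.currentTimeMillis();
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:9200/myindex/_search").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            int status = conn.getResponseCode(); // blocks until the response is received
            long elapsed = System.currentTimeMillis() - start;

            System.out.println("HTTP " + status + " in " + elapsed + " ms for 150000 ids");
        }
    }

    Note that this measures the full client-side round trip, so it is only a rough indication; the "took" field in the search response is what Elasticsearch itself reports for the query.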

    UPDATE Dec. 2016: this answer applies to the Elasticsearch versions that existed at the end of 2014, i.e. the 1.x branch. The latest version available at that time was 1.4.x.
