PostgreSQL - IN clause optimization for more than 3000 values

Submitted by 江枫思渺然 on 2021-02-17 06:04:28

Question


I have an application where the user uploads an Excel file (.xlsx or .csv) with more than 10,000 rows; a single column "partId" contains the values to look up in the database.

I read the Excel values, store them in a list, and pass that list to a Spring Boot JPA repository finder method, which builds the IN clause internally:

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.poi.ss.usermodel.*;

    // Read the first column of the active sheet with Apache POI
    List<String> vinList = new ArrayList<>();
    try (InputStream stream = new ByteArrayInputStream(file.getBytes());
         Workbook wb = WorkbookFactory.create(stream)) {
        Sheet sheet = wb.getSheetAt(wb.getActiveSheetIndex());
        for (Row row : sheet) {
            Cell cell = row.getCell(0);
            if (cell != null) {
                vinList.add(cell.getStringCellValue());
            }
        }
    }
                
    // JPA repository method that I used (entity type assumed to be Part);
    // the derived query needs one parameter per criterion in its name
    List<Part> findByPartIdInAndSecondaryId(List<String> partIds, String secondaryId);

Many articles say, and my experience in the case above confirms, that an IN query is inefficient for a huge list of values.

How can I optimize the above scenario or write a new optimized query?

Also, please let me know if there is a more efficient way of reading an Excel file than the code snippet above.

Any help would be much appreciated! Thanks in advance!


Answer 1:


If the list is truly huge, you will never be lightning fast.

I see several options:

  1. Send a query with a large IN list, as you mention in your question.

  2. Construct a statement that is a join with a large VALUES clause:

    SELECT ... FROM mytable
       JOIN (VALUES (42), (101), (43), ...) AS tmp(col)
          ON mytable.id = tmp.col;
    
  3. Create a temporary table with the values and join with it (a Java sketch of this approach follows the list):

    BEGIN;
    CREATE TEMP TABLE tmp(col bigint) ON COMMIT DROP;
    

    Then either

    COPY tmp FROM STDIN; -- if Spring supports COPY
    

    or

    INSERT INTO tmp VALUES (42), (101), (43), ...; -- if not
    

    Then

    ANALYZE tmp;  -- for good statistics
    SELECT ... FROM mytable
       JOIN tmp ON mytable.id = tmp.col;
    COMMIT;  -- drops the temporary table
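
    For illustration, here is a minimal Java sketch of option 3 using Spring's JdbcTemplate. The table and column names (mytable, part_id, tmp_part_ids), the DAO class, and the method name are assumptions for this sketch, not taken from the question:

    import java.util.List;
    import java.util.stream.Collectors;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Repository;
    import org.springframework.transaction.annotation.Transactional;

    @Repository
    public class PartLookupDao {

        private final JdbcTemplate jdbcTemplate;

        public PartLookupDao(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // Everything must run in one transaction: ON COMMIT DROP ties the
        // temp table's lifetime to it, and all statements must share the
        // same connection, which @Transactional guarantees here.
        @Transactional
        public List<String> findExistingPartIds(List<String> partIds) {
            jdbcTemplate.execute(
                "CREATE TEMP TABLE tmp_part_ids(part_id text) ON COMMIT DROP");

            // Batched single-row INSERTs keep each statement small
            List<Object[]> rows = partIds.stream()
                    .map(id -> new Object[] { id })
                    .collect(Collectors.toList());
            jdbcTemplate.batchUpdate(
                "INSERT INTO tmp_part_ids(part_id) VALUES (?)", rows);

            jdbcTemplate.execute("ANALYZE tmp_part_ids");

            return jdbcTemplate.queryForList(
                "SELECT m.part_id FROM mytable m"
                    + " JOIN tmp_part_ids t ON m.part_id = t.part_id",
                String.class);
        }
    }

    If the PostgreSQL JDBC driver's CopyManager is available, COPY would likely cut the insert time further than the batched INSERTs used here.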
    

Which of these is fastest is best determined by trial and error in your case; I don't think any one of these methods will always beat the others.

Some considerations:

  • Solutions 1. and 2. may result in very large statements, while solution 3. can be split into smaller chunks (see the chunking sketch after this list).

  • Solution 3. will very likely be slower unless the list is truly large.
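
If you stay with plain IN queries (solution 1), a common workaround for the statement-size problem is to split the uploaded list into fixed-size chunks and run one query per chunk. A minimal sketch, reusing vinList from the question's snippet; the chunk size of 1000 is only a starting point, and partRepository with a hypothetical findByPartIdIn method stands in for the question's repository:

    // Issue one IN query per chunk so no single statement gets huge
    List<Part> results = new ArrayList<>();
    int chunkSize = 1000;  // tune by measurement
    for (int i = 0; i < vinList.size(); i += chunkSize) {
        List<String> chunk =
                vinList.subList(i, Math.min(i + chunkSize, vinList.size()));
        results.addAll(partRepository.findByPartIdIn(chunk));
    }

This trades one huge statement for several small ones; whether it beats the temp-table approach again depends on the list size and is worth measuring.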



Source: https://stackoverflow.com/questions/64785993/postgresql-in-clause-optimization-for-more-than-3000-values
