Selecting random rows from a big table in H2 database

Submitted by 吃可爱长大的小学妹 on 2019-12-06 11:24:28

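The following script selects every nth row quite efficiently. It assumes there are no gaps in the ids. If gaps are possible, you might want to increase system_range(1, 100) to system_range(1, 200) or so, and change the divisor 100 in the formula to match. Otherwise x values above 100 point past max(id). To get random rows, the formula at the very end would need to be changed a bit. Here is the every-nth-row version: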

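drop table if exists test;

create table test(
  id bigint auto_increment primary key,
  name varchar(255));

-- sample data: ids 50..1200, no gaps
insert into test
select x, 'Hello ' || x from system_range(50, 1200);

-- spread x = 1..100 evenly over [min(id), max(id)]
-- (alias r instead of range: RANGE is a keyword in newer H2 versions)
select * from test t, system_range(1, 100) r
where t.id = r.x * (select max(id) - min(id) from test) / 100
  + (select min(id) from test);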
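To make the arithmetic in the WHERE clause concrete, here is a small Python model of which ids the query targets. It is a sketch, not H2 code: the function name `every_nth_ids` is hypothetical, and Python's `//` stands in for H2's BIGINT division, which is equivalent here only because all operands are non-negative.

```python
def every_nth_ids(min_id: int, max_id: int, n: int = 100) -> list[int]:
    """Ids targeted by the WHERE clause: x * (max_id - min_id) / n + min_id, x = 1..n.

    Python's // matches BIGINT division here because every operand is
    non-negative, so truncation and flooring agree.
    """
    span = max_id - min_id
    return [x * span // n + min_id for x in range(1, n + 1)]


# The sample table from the script: ids 50..1200 with no gaps.
existing_ids = set(range(50, 1201))

targets = every_nth_ids(50, 1200)
hits = [i for i in targets if i in existing_ids]
print(len(hits), hits[:3], hits[-1])  # → 100 [61, 73, 84] 1200
```

Note that x starts at 1, so min(id) itself is never picked. If the ids had gaps, any target landing in a gap would simply match no row, which is why more sample points are suggested above.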

You should use column id instead of rowid. Column id exists in your table and is auto_increment.

You can rank your table and select 50 random ranks out of it. Avoid sorting or grouping of any kind to keep it fast.
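The rank idea can be sketched in Python as follows. This is an illustrative model, not the answerer's code: the function name `random_rows_by_rank` and the gappy sample table are hypothetical, and it assumes the rows already arrive in primary-key order (as an index scan on id would return them), so position + 1 serves as the rank without any sort.

```python
import random


def random_rows_by_rank(rows: list[tuple[int, str]], k: int,
                        rng: random.Random) -> list[tuple[int, str]]:
    """Return k distinct rows chosen by k distinct random ranks.

    Assumes `rows` is already in primary-key order, so list position + 1
    is the rank. Unlike raw ids, ranks 1..N have no gaps, so every pick hits a row.
    """
    k = min(k, len(rows))
    ranks = rng.sample(range(1, len(rows) + 1), k)  # k distinct ranks, no retries
    return [rows[rank - 1] for rank in ranks]


# Hypothetical table with gaps in the ids (only even ids 50..1200 exist).
gappy_rows = [(i, f"Hello {i}") for i in range(50, 1201, 2)]
picked = random_rows_by_rank(gappy_rows, 50, random.Random(42))
print(len(picked), len({row[0] for row in picked}))  # → 50 50
```

The design point is that ranks are dense even when ids are not, so drawing ranks instead of ids never wastes a draw on a missing row.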

What I do for this is create a temp table, fill it with random numbers from 1 to the greatest identity value in the table, and then select the rows whose identity value is in the temp table.

"Single query way of doing this"

Create a temp table with a field named DesiredIdentity (I don't know the H2 syntax for this, but H2 does support temp tables).

Select the max identity value from the table.

Loop from 1 to the number of random rows you want, using the RAND() function to insert one random number into the temp table on each pass. Set the range of the random numbers from 1 to the max identity value. Ensure the same random number is not selected twice.

Then select the rows from the table whose identity value is in the temp table.
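The temp-table steps above can be modelled in Python to show the loop and the duplicate check. This is a sketch under stated assumptions, not H2 syntax: a Python `set` stands in for the DesiredIdentity temp table, a dict stands in for the table, and the function names `fill_desired_identities` and `select_random_rows` are hypothetical.

```python
import random


def fill_desired_identities(max_id: int, k: int, rng: random.Random) -> set[int]:
    """Model of the temp table: k distinct random ids drawn from 1..max_id.

    A repeated number is rejected (the set ignores it) and the loop draws
    again, which is the "ensure the same number is not selected" step.
    """
    if k > max_id:
        raise ValueError("cannot draw more distinct ids than 1..max_id holds")
    desired: set[int] = set()
    while len(desired) < k:
        desired.add(rng.randint(1, max_id))  # one draw per pass; duplicates change nothing
    return desired


def select_random_rows(table: dict[int, str], k: int,
                       rng: random.Random) -> list[tuple[int, str]]:
    """Join the table against the desired ids; ids that fall in gaps match nothing."""
    max_id = max(table)  # "select max identity value"
    desired = fill_desired_identities(max_id, k, rng)
    return [(i, table[i]) for i in sorted(desired) if i in table]


# Sample table from the script: ids 50..1200, so draws of 1..49 miss.
test_table = {i: f"Hello {i}" for i in range(50, 1201)}
rows = select_random_rows(test_table, 50, random.Random(7))
print(len(rows))  # at most 50; fewer when some draws land in a gap
```

Because draws start at 1 while real ids may start higher or have gaps, this approach can return fewer rows than requested; a caller needing exactly k rows would have to draw again for the misses.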
