What does it mean to have multiple sortkey columns?

后端 未结 3 513
温柔的废话
温柔的废话 2020-12-24 02:58

Redshift allows designating multiple columns as SORTKEY columns, but most of the best-practices documentation is written as if there were only a single SORTKEY.

3条回答
  •  心在旅途
    2020-12-24 03:50

    If you declare SORTKEY(COL1, COL2), all columns will be sorted by COL1, then COL2 as if ORDER BY (COL1, COL2) was done.

    If you are using SORTKEY to speed up a JOIN, AFAIU it doesn't matter so long as you use the same SORTKEY on the tables that will be joined because what happens is a merge join.

    If COL1 is highly selective like your type_id, it means there are only small numbers of rows which has the same type_id. Therefore although you can add another column to SORTKEY, its utility is limited because most of the row elimination has already happened.

    If COL1 is not highly selective like your stamp (which is a bit weird btw; I would have expected it to be more selective than type_id? Anyways..), it means that filtering by stamp won't eliminate that much rows. So it makes more sense to declare a second sort key. However, this is less efficient than the other way around as eliminating rows earlier would be cheaper. If you sometimes filter by stamp but not by type_id, it may make sense to do this though.

提交回复
热议问题