Return array of years as year ranges

Question

I'm attempting to query a table which contains a character varying[] column of years, and return those years as a string of comma-delimited year ranges. The year ranges would be determined by sequential years present within the array, and years/year ranges which are not sequential should be separated by commas.

The data type is character varying[] rather than integer[] because a few of the values contain ALL instead of a list of years. Those rows can be omitted from the results.

So far I've had little luck approaching the problem as I'm not really even sure where to start.

Would someone be able to give me some guidance or provide useful examples of how one might solve such a challenge?

years_table Example

+=========+============================+
| id      | years                      |
| integer | character varying[]        |
+=========+============================+
| 1       | {ALL}                      |
| 2       | {1999,2000,2010,2011,2012} |
| 3       | {1990,1991,2007}           |
+---------+----------------------------+

Output Goal:

Example SQL Query:

SELECT id, [year concat logic] AS year_ranges
FROM years_table WHERE 'ALL' <> ALL (years)

Result:

+====+======================+
| id | year_ranges          |
+====+======================+
| 2  | 1999-2000, 2010-2012 |
| 3  | 1990-1991, 2007      |
+----+----------------------+

Answer 1:

SELECT id, string_agg(year_range, ', ') AS year_ranges
FROM (
   SELECT id, CASE WHEN count(*) > 1
               THEN min(year)::text || '-' ||  max(year)::text 
               ELSE min(year)::text
              END AS year_range
   FROM  (
      SELECT *, row_number() OVER (ORDER BY id, year) - year AS grp
      FROM  (
         SELECT id, unnest(years) AS year
         FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
                      ,(3,      '{1990,1991,2007}')
               ) AS tbl(id, years)
         ) sub1
      ) sub2
   GROUP  BY id, grp
   ORDER  BY id, min(year)
   ) sub3
GROUP  BY id
ORDER  BY id

Produces exactly the desired result.

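If you are dealing with an array of varchar (varchar[]), just cast it to int[] before you proceed. Apart from the ALL rows, which have to be filtered out first because that value cannot be cast to an integer, the data seems to be in perfectly legal form for that: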

years::int[]
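
As a rough illustration of that filter-then-cast step outside the database, here is a minimal Python sketch. The function name castable_rows and the dictionary layout are hypothetical; the sample values are the ones from the question's table.

```python
def castable_rows(rows):
    """Drop rows whose array contains 'ALL', then convert the rest to integers."""
    return {
        row_id: [int(year) for year in years]
        for row_id, years in rows.items()
        if "ALL" not in years  # int("ALL") would raise, so filter first
    }


# Sample data copied from the question (hypothetical in-memory layout).
sample = {
    1: ["ALL"],
    2: ["1999", "2000", "2010", "2011", "2012"],
    3: ["1990", "1991", "2007"],
}

print(castable_rows(sample))
```

The order matters here in the same way it does in SQL: the conversion is only attempted on rows that passed the filter.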

In production code, replace the inner sub-select with the name of your source table. That is, replace this:

 FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
              ,(3,      '{1990,1991,2007}')
       ) AS tbl(id, years)

with this:

FROM  tbl

Since we are dealing with a naturally ascending number (the year), we can use a shortcut to form groups of consecutive years (each forming a range). I subtract the year itself from the row number (ordered by id and year). For consecutive years, both the row number and the year increase by one, so the difference grp stays the same. When one or more years are skipped, the difference changes and a new range starts.
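
To see why the subtraction works, here is a minimal Python sketch of the same grouping idea for a single array. The function name year_ranges is hypothetical, and the set() call that drops duplicate years is an added assumption that the SQL query does not make.

```python
from itertools import groupby


def year_ranges(years):
    """Collapse a list of years into a comma-delimited string of ranges."""
    # Distinct, ascending years (assumption: duplicates are not wanted).
    ordered = sorted(set(years))
    parts = []
    # position - year is constant within a run of consecutive years,
    # which mirrors row_number() - year in the query.
    numbered = enumerate(ordered, start=1)
    for _, run in groupby(numbered, key=lambda pair: pair[0] - pair[1]):
        run_years = [year for _, year in run]
        first, last = run_years[0], run_years[-1]
        parts.append(f"{first}-{last}" if len(run_years) > 1 else str(first))
    return ", ".join(parts)


print(year_ranges([1999, 2000, 2010, 2011, 2012]))  # 1999-2000, 2010-2012
print(year_ranges([1990, 1991, 2007]))              # 1990-1991, 2007
```

For the first array the differences are -1998, -1998, -2007, -2007, -2007, so the two distinct values mark the two ranges.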

More on window functions can be found in the PostgreSQL manual.

A plpgsql function might be even faster in this case. You'd have to test. Examples in these related answers:
Ordered count of consecutive repeats / duplicates
ROW_NUMBER() shows unexpected values

Answer 2:

This is not the output format you asked for, but I think it can be more useful:

select id, g, min(year), max(year)
from (
    select id, year,
        count(not g or null) over(partition by id order by year) as g
    from (
        select id, year,
            lag(year, 1, 0) over(partition by id order by year) = year - 1 as g
        from (
            select id, unnest(years)::integer as year
            from years_table  -- table name as given in the question
            where years != '{ALL}'
        ) s
    ) s
) s
group by 1, 2
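
The lag-based logic of this query can be traced by hand with a small Python sketch: a new group starts whenever the previous year is not exactly one less than the current year. The function name year_islands is hypothetical, and a None sentinel stands in for the default value that lag() returns on the first row.

```python
def year_islands(years):
    """Return (g, first_year, last_year) tuples for runs of consecutive years."""
    islands = []     # each entry: [g, first_year, last_year]
    previous = None  # stands in for lag(year) on the first row
    for year in sorted(years):
        if previous is None or year != previous + 1:
            # Gap (or first row): open a new group, like the running count.
            islands.append([len(islands) + 1, year, year])
        else:
            # Consecutive year: extend the current group's upper bound.
            islands[-1][2] = year
        previous = year
    return [tuple(island) for island in islands]


print(year_islands([1999, 2000, 2010, 2011, 2012]))
print(year_islands([1990, 1991, 2007]))
```

Each tuple corresponds to one output row of the query for a single id: the group number, the minimum year, and the maximum year.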

Source: https://stackoverflow.com/questions/17533040/return-array-of-years-as-year-ranges

Tags: sql, arrays, postgresql, postgresql-9.2, window-functions