问题
I'm attempting to query a table which contains a character varying[]
column of years, and return those years as a string of comma-delimited year ranges. The year ranges would be determined by sequential years present within the array, and years/year ranges which are not sequential should be separated be commas.
The reason the data-type is character varying[]
rather than integer[]
is because a few of the values contain ALL
instead of a list of years. We can omit these results.
So far I've had little luck approaching the problem as I'm not really even sure where to start.
Would someone be able to give me some guidance or provide a useful examples of how one might solve such as challenge?
years_table
Example
+=========+============================+
| id | years |
| integer | character varying[] |
+=========+============================+
| 1 | {ALL} |
| 2 | {1999,2000,2010,2011,2012} |
| 3 | {1990,1991,2007} |
+---------+----------------------------+
Output Goal:
Example SQL Query:
SELECT id, [year concat logic] AS year_ranges
FROM years_table WHERE 'ALL' NOT IN years
Result:
+====+======================+
| id | year_ranges |
+====+======================+
| 2 | 1999-2000, 2010-2012 |
| 3 | 1990-1991, 2007 |
+----+----------------------+
回答1:
SELECT id, string_agg(year_range, ', ') AS year_ranges
FROM (
SELECT id, CASE WHEN count(*) > 1
THEN min(year)::text || '-' || max(year)::text
ELSE min(year)::text
END AS year_range
FROM (
SELECT *, row_number() OVER (ORDER BY id, year) - year AS grp
FROM (
SELECT id, unnest(years) AS year
FROM (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
,(3, '{1990,1991,2007}')
) AS tbl(id, years)
) sub1
) sub2
GROUP BY id, grp
ORDER BY id, min(year)
) sub3
GROUP BY id
ORDER BY id
Produces exactly the desired result.
If you deal with an an array of varchar (varchar[]
, just cast it to int[]
, before you proceed. It seems to be in perfectly legal form for that:
years::int[]
Replace the inner sub-select with the name of your source table in productive code.
FROM (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
,(3, '{1990,1991,2007}')
) AS tbl(id, years)
->
FROM tbl
Since we are dealing with a naturally ascending number (the year) we can use a shortcut to form groups of consecutive years (forming a range). I subtract the year itself from row number (ordered by year). For consecutive years, both row number and year increment by one and produce the same grp
number. Else, a new range starts.
More on window functions in the manual here and here.
A plpgsql function might be even faster in this case. You'd have to test. Examples in these related answers:
Ordered count of consecutive repeats / duplicates
ROW_NUMBER() shows unexpected values
回答2:
SQL Fiddle Not the output format you asked for but I think it can be more useful:
select id, g, min(year), max(year)
from (
select id, year,
count(not g or null) over(partition by id order by year) as g
from (
select id, year,
lag(year, 1, 0) over(partition by id order by year) = year - 1 as g
from (
select id, unnest(years)::integer as year
from years
where years != '{ALL}'
) s
) s
) s
group by 1, 2
来源:https://stackoverflow.com/questions/17533040/return-array-of-years-as-year-ranges