remove duplicates from comma separated string (Amazon Redshift)

南楼画角 提交于 2019-12-11 00:44:18

问题


I am using Amazon Redshift.

I have a column in that string is stored as comma separated like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it using query, so the result should be Private, United Healthcare. I found some solutions obviously from Stackoverflow and came to know it is possible using regular expressions.

Hence, I have tried using:

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances; 

And

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances; 

And also some other regular expressions but seems not working. Any solution?


回答1:


Try this way,

SELECT  array_agg(DISTINCT insurances) 
FROM (SELECT  regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare'
              , ',\s+') AS insurances) x;

Alternative way

SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;

Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html both will fail with redshift, none of those converts text to text[]




回答2:


Alternative Option is to try Python UDF. Simple Python function dedupes the string and return correct version.




回答3:


Here is a User-Defined Function (UDF) for Amazon Redshift:

CREATE FUNCTION f_uniquify (s text)
  RETURNS text
IMMUTABLE
AS $$
  -- Split string by comma-space, remove duplicates, convert back to comma-separated
  return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;

Testing it with:

select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');

Returns:

United Healthcare, Private

If the order of return values is important, then it would need some more specific code.



来源:https://stackoverflow.com/questions/39909615/remove-duplicates-from-comma-separated-string-amazon-redshift

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!