Why does Redshift automatically trim varchar columns when joining?

Posted by 泪湿孤枕 on 2019-12-11 06:03:38

Question


I encountered an unusual problem when using Redshift. Please see the illustrative example below:

drop table if exists joinTrim_temp1;
create table joinTrim_temp1(rowIndex1 int, charToJoin1 varchar(20));
insert into joinTrim_temp1 values(1, 'Sudan' );
insert into joinTrim_temp1 values(2, 'Africa' );
insert into joinTrim_temp1 values(3, 'USA' );

drop table if exists joinTrim_temp2;
create table joinTrim_temp2(rowIndex2 int, charToJoin2 varchar(20));
insert into joinTrim_temp2 values(1, 'Sudan ' );
insert into joinTrim_temp2 values(2, 'Africa ' );
insert into joinTrim_temp2 values(3, 'USA ' );

select * from joinTrim_temp1 a join joinTrim_temp2 b on a.charToJoin1 = b.charToJoin2;

The query returns matched rows. Note that every value in the second table has a trailing space, so the inner join should not match anything. Yet Redshift seems to trim the trailing whitespace when joining.

I encountered this problem while converting existing Redshift SQL code to PySpark.

Regards, Kumar


Answer 1:


Ah! Indeed, a very interesting find!

From Character Types - Amazon Redshift:

Trailing spaces in VARCHAR and CHAR values are treated as semantically insignificant when values are compared.
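
To make that rule concrete, here is a minimal plain-Python sketch (not Redshift itself) that joins the question's two tables under both comparison rules. The helpers `pad_insensitive_equal`, `exact_equal` and `inner_join` are hypothetical names introduced only for illustration, and the model simply assumes that "semantically insignificant" behaves like stripping trailing spaces from both sides before comparing.

```python
# Rows copied from the question: (rowIndex, charToJoin)
temp1 = [(1, "Sudan"), (2, "Africa"), (3, "USA")]
temp2 = [(1, "Sudan "), (2, "Africa "), (3, "USA ")]


def pad_insensitive_equal(left: str, right: str) -> bool:
    """Assumed model of the documented rule: trailing spaces are ignored."""
    # Strip only trailing space characters; leading spaces still matter.
    return left.rstrip(" ") == right.rstrip(" ")


def exact_equal(left: str, right: str) -> bool:
    """Ordinary exact string equality, where 'USA' != 'USA '."""
    return left == right


def inner_join(left_rows, right_rows, key_equal):
    """Naive nested-loop inner join on the second field of each row."""
    return [
        (l, r)
        for l in left_rows
        for r in right_rows
        if key_equal(l[1], r[1])
    ]


redshift_like_rows = inner_join(temp1, temp2, pad_insensitive_equal)
exact_rows = inner_join(temp1, temp2, exact_equal)

print(len(redshift_like_rows))  # all three keys match when trailing spaces are ignored
print(len(exact_rows))          # no key matches under exact equality
```

When porting such a join to an engine that compares strings exactly, one option is to right-trim both join keys before joining (in PySpark, `pyspark.sql.functions.rtrim` could likely serve this purpose) so that the matches agree with what Redshift returns.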

It appears that, if you wish to force an exact comparison, you need to make sure the space is no longer trailing, for example by appending the same character to both sides:

SELECT * 
FROM joinTrim_temp1 a 
JOIN joinTrim_temp2 b 
ON a.charToJoin1 || '.' = b.charToJoin2 || '.';
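
The reason this workaround helps can be sketched in plain Python, again modelling the trailing-space rule as an assumption and not as Redshift's actual implementation. Once a `'.'` is appended, the space in `'Sudan .'` sits inside the string and no longer trails it, so the trailing-space rule does not apply. The helper name `pad_insensitive_equal` is hypothetical.

```python
def pad_insensitive_equal(left: str, right: str) -> bool:
    """Assumed model of a comparison that ignores trailing spaces."""
    return left.rstrip(" ") == right.rstrip(" ")


# Without the sentinel, the trailing space is ignored and the keys match.
plain_match = pad_insensitive_equal("Sudan", "Sudan ")

# With the sentinel, 'Sudan.' is compared with 'Sudan .'; the space is now
# interior, so it is significant and the keys no longer match.
sentinel_match = pad_insensitive_equal("Sudan" + ".", "Sudan " + ".")

# Keys that were identical to begin with still match after the sentinel.
identical_match = pad_insensitive_equal("USA" + ".", "USA" + ".")

print(plain_match, sentinel_match, identical_match)
```

Any character other than a space should work as the sentinel, as long as the same one is appended on both sides of the comparison.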


Source: https://stackoverflow.com/questions/53569896/why-redshift-automatically-trims-varchar-column-when-joining
