MySQL group_concat(utf8) in union truncated to 1024/3

Submitted by 一世执手 on 2020-02-02 06:08:34

Question


TLDR: a group_concat over a utf8 varchar column, when unioned with itself, returns only group_concat_max_len/3 ASCII characters, as if the character width were fixed instead of variable. The group_concat alone returns group_concat_max_len characters, as expected.

The problem

I have a table tabletest with a column data defined as a utf8 varchar(2048). The table contains a single row, with 1050 ASCII characters in that column.

A group_concat over this table/column returns 1024 characters (equals group_concat_max_len), which is expected.

But a union of this group_concat with the same group_concat returns only 341 characters (equal to group_concat_max_len / 3, rounded down).

Why does this happen?

According to MySQL Aggregate (GROUP BY) Function Descriptions:

The result type is TEXT or BLOB unless group_concat_max_len is less than or equal to 512, in which case the result type is VARCHAR or VARBINARY.

And from MySQL The BLOB and TEXT Types:

Similarly, you can regard a TEXT column as a VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:

  • For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 8.3.4, “Column Indexes”.
  • BLOB and TEXT columns cannot have DEFAULT values.

So with group_concat_max_len=1024 the return type should be TEXT, which is also variable-length and should store ASCII characters in utf8 as 1 byte each.

Relevant answers acknowledging the problem

MySQL Truncating of result when using Group_Concat and Concat

Weird result for GROUP_CONCAT on subquery

Possible culprit

Again from MySQL The BLOB and TEXT Types:

Only the first max_sort_length bytes of the column are used when sorting. The default value of max_sort_length is 1024

AFAIK, UNION needs to sort rows before eliminating duplicates, so this seemed like a possible reason. But raising the limit with set max_sort_length=2048; did not change the returned character count.

The only workaround seems to be SET group_concat_max_len = 1539; or more. With 1538 or less, the union returns only 512 characters or fewer. Why this strange number?
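One possible reading of the 1539 threshold, offered as an assumption rather than a confirmed explanation: if the server converts group_concat_max_len from bytes to characters by dividing by 3 (the maximum byte width of a utf8 character) and then applies the 512 rule quoted from the documentation above, 1539 is the smallest value whose quotient exceeds 512. The arithmetic and the session-level workaround can be checked with a sketch like this:

```sql
-- Integer division by 3, the maximum number of bytes per utf8 character.
-- Whether the server performs exactly this computation is an assumption.
select 1024 div 3 as default_len,   -- 341, the truncated length seen in the union
       1538 div 3 as below_limit,   -- 512
       1539 div 3 as above_limit;   -- 513

-- The workaround described above, applied to the current session only.
set session group_concat_max_len = 1539;

-- Re-run the union from the complete example to compare the reported length.
select length(data) from (
select group_concat(data separator ',') as data from uniontest.tabletest union
select group_concat(data separator ',') as data from uniontest.tabletest) d;
```

Setting the variable at session scope keeps the experiment from affecting other connections.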

Complete example

create database uniontest collate utf8_general_ci;
create table uniontest.tabletest (data varchar(2048));
insert into uniontest.tabletest select repeat('a',1050);

Simple select of the length of the 1050 characters:

select length(data) from uniontest.tabletest;

Outputs:

+--------------+
| length(data) |
+--------------+
|         1050 |
+--------------+

Group concat of the single row of 1050 characters (so no separators are added). The server is configured with group_concat_max_len=1024.

select length(group_concat(data separator ',')) from uniontest.tabletest;

Output is truncated as expected:

+------------------------------------------+
| length(group_concat(data separator ',')) |
+------------------------------------------+
|                                     1024 |
+------------------------------------------+

Now union it with itself (in an attempt to prevent additional datatype conversions):

select length(data) from (
  select group_concat(data separator ',') as data from uniontest.tabletest
  union
  select group_concat(data separator ',') as data from uniontest.tabletest
) d;

Unexpected result (expecting 1024):

+--------------+
| length(data) |
+--------------+
|          341 |
+--------------+
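A quick way to check whether the 341 is a byte count or a character count, and whether the server reports any truncation, is to compare length with char_length and then list the warnings. This is only a diagnostic sketch. The column aliases bytes and chars are illustrative, and no particular warning text is assumed:

```sql
-- length() counts bytes, char_length() counts characters;
-- the two are equal for pure ASCII data such as repeat('a',1050).
select length(data) as bytes, char_length(data) as chars from (
select group_concat(data separator ',') as data from uniontest.tabletest union
select group_concat(data separator ',') as data from uniontest.tabletest) d;

-- Lists any warnings raised by the previous statement.
show warnings;
```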

Tested on MySQL 5.6 and 5.7.

EDIT

Found a bug report about ORDER BY instead of UNION: "ORDER BY truncates GROUP_CONCAT result". It is marked Closed; maybe only the ORDER BY case was fixed?

Source: https://stackoverflow.com/questions/47733920/mysql-group-concatutf8-in-union-truncated-to-1024-3
