Why is a table's index storage size bigger after changing the charset from utf8mb4 to utf8?

房东的猫 submitted on 2020-01-03 04:50:10

Question


I executed: alter table device_msg convert to character set 'utf8' COLLATE 'utf8_unicode_ci';

As I expected, the table's data size got smaller.

But at the same time, the table's index size got bigger.

What happened, and why?

PS: the table's data size and index size are taken from information_schema.TABLES.
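
The question does not show the query behind these numbers. A minimal sketch of how they might be read from information_schema.TABLES, assuming the table lives in a schema named `mydb` (a hypothetical name, not given in the question):

```sql
-- Assumption: the schema name 'mydb' is a placeholder; the question does not give it.
SELECT
    TABLE_NAME,
    ENGINE,
    TABLE_ROWS,                                   -- estimated row count for InnoDB
    AVG_ROW_LENGTH,                               -- bytes, derived from the estimate
    DATA_LENGTH,                                  -- bytes allocated to the clustered index (row data)
    INDEX_LENGTH,                                 -- bytes allocated to all secondary indexes
    DATA_LENGTH + INDEX_LENGTH AS total_length    -- bytes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME   = 'device_msg';
```

For InnoDB these values are approximations based on statistics and allocated pages, not exact byte counts.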


DbEngine: InnoDB

Table Before:

CREATE TABLE `device_msg` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `time` datetime(3) NOT NULL,
  `msg` json NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8mb4;

Table After:

CREATE TABLE `device_msg` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `time` datetime(3) NOT NULL,
  `msg` json NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


Before:

totalSize: 2.14 GB
indexSize: 282.98 MB
dataSize: 1.86 GB
avg_row_len:  297B

After:

totalSize: 1.93 GB
indexSize: 413.97 MB
dataSize: 1.52 GB
avg_row_len:  260B

If the data in information_schema.TABLES is not accurate,

how can I get accurate numbers?


Answer 1:


This is just my opinion.

It is based on what I read in the MySQL documentation about InnoDB limits:

https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html

By default, the index key prefix length limit is 767 bytes.

If an indexed column exceeds this size, it is truncated. I assume your indexed column values have 255 characters.

With utf8mb4, 1 character takes up to 4 bytes, so the limit is about 191 characters (767 / 4). So 191 characters go into the index, and the other 255 - 191 = 64 characters are truncated from the index.

When you change the encoding to utf8 (where 1 character takes up to 3 bytes), the limit becomes about 255 characters (767 / 3). That means all 255 characters of your column value go into the index without truncation.

The number of characters stored in the index increased from 191 to 255, so the index size increased as well.
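
The arithmetic above can be checked directly in MySQL. This is only a sketch of the byte math. The last two columns apply it to the `sn` column as declared in the question (varchar(30)), which is an addition here and not part of this answer's assumption of 255 characters:

```sql
-- Characters that fit in a 767-byte index key prefix, at the maximum bytes per character.
SELECT
    767 DIV 4 AS utf8mb4_chars_in_prefix,   -- 191
    767 DIV 3 AS utf8_chars_in_prefix,      -- 255
    30 * 4    AS sn_max_bytes_utf8mb4,      -- 120, for varchar(30)
    30 * 3    AS sn_max_bytes_utf8;         -- 90,  for varchar(30)
```

Both worst-case sizes for a varchar(30) column are well under 767 bytes, so with the table definitions shown in the question the prefix limit would not be reached in either character set.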




Answer 2:


  • The space taken by utf8mb4 and by utf8 (assuming there were no 4-byte characters beforehand) is the same, in spite of the numbers you show.

  • This ALTER required rebuilding the table and the indexes.

  • InnoDB stores the data and each secondary index in a B-tree.

  • Depending on the order in which you insert elements into a B-tree, more or fewer "block splits" will occur.

So, you can't really say whether it was the character set change or the rebuild that led to the index getting bigger and the data getting smaller.

I say it was not the charset change.
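
One way to look at the rebuilt indexes individually is InnoDB's persistent statistics table. A minimal sketch, assuming persistent statistics are enabled (the default in recent MySQL versions) and using a hypothetical schema name `mydb` that is not given in the question:

```sql
-- Refresh the statistics so the size figures reflect the current B-trees.
-- Assumption: 'mydb' is a placeholder schema name.
ANALYZE TABLE mydb.device_msg;

-- Per-index size: the row with stat_name = 'size' holds the number of pages in the index.
SELECT
    index_name,
    stat_value                      AS pages,
    stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'mydb'
  AND table_name    = 'device_msg'
  AND stat_name     = 'size';
```

Comparing these figures before and after another rebuild of the same table, without changing the character set, would help separate the effect of the rebuild from the effect of the charset.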



Source: https://stackoverflow.com/questions/58514911/why-tables-index-storage-size-is-bigger-after-change-charset-from-utf8mb4-to-ut
