Question
Executed: alter table device_msg convert to character set 'utf8' COLLATE 'utf8_unicode_ci';
As I expected, the table's data size got smaller.
But at the same time, the table's index size got bigger.
What happened, and why?
P.S.: the table's data size and index size are taken from information_schema.TABLES.
DbEngine: InnoDB
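For reference, figures like the ones below can be read with a query along these lines. This is only a sketch: the exact query the asker used is not shown, and the MB conversion and column aliases are assumptions. For InnoDB tables, data_length covers the clustered (primary key) index, which holds the rows themselves, and index_length covers the secondary indexes; both are approximate.

```sql
-- Sketch: approximate sizes for one table in the current schema.
-- The aliases (data_mb, index_mb, total_mb) are illustrative names.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb,
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS total_mb,
       avg_row_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'device_msg';
```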
Table Before:
CREATE TABLE `device_msg` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`time` datetime(3) NOT NULL,
`msg` json NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8mb4;
Table After:
CREATE TABLE `device_msg` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`time` datetime(3) NOT NULL,
`msg` json NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Before:
totalSize: 2.14 GB
indexSize: 282.98 MB
dataSize: 1.86 GB
avg_row_len: 297B
After:
totalSize: 1.93 GB
indexSize: 413.97 MB
dataSize: 1.52 GB
avg_row_len: 260B
If the data in information_schema.TABLES is not accurate,
how can I get accurate figures?
Answer 1:
This is just my opinion,
based on what the MySQL documentation says about InnoDB limits:
https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html
By default, the index key prefix length limit is 767 bytes.
If an indexed column exceeds this size, it is truncated. I assume your indexed column values are 255 characters long.
With utf8mb4, 1 character = up to 4 bytes, so the limit is about 191 characters. The first 191 characters go into the index, and the remaining 255 - 191 = 64 characters are truncated from it.
When you change the encoding to utf8 (where 1 character = up to 3 bytes), the limit becomes about 255 characters. That means all 255 characters of your column value go into the index without truncation.
The number of characters stored in the index went from 191 to 255, so the index size increased as well.
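The arithmetic in this answer can be checked directly in MySQL. The sketch below takes the 767-byte limit and the 255-character assumption from the answer as given. For comparison, it also computes the maximum byte length of the `sn` column from the question, which is declared as varchar(30).

```sql
-- Characters that fit in a 767-byte index key prefix.
SELECT FLOOR(767 / 4) AS max_chars_utf8mb4,  -- 191
       FLOOR(767 / 3) AS max_chars_utf8;     -- 255

-- Maximum byte length of a varchar(30) value, as declared for `sn`.
SELECT 30 * 4 AS sn_max_bytes_utf8mb4,  -- 120
       30 * 3 AS sn_max_bytes_utf8;     -- 90
```

Both results for `sn` are below 767 bytes.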
Answer 2:
The space taken by utf8mb4 and then by utf8 is the same (assuming there were no 4-byte characters beforehand), in spite of the numbers you show.
This ALTER required rebuilding the table and the indexes. InnoDB stores the data and each secondary index in B-trees.
Depending on the order in which you insert elements into a B-tree, more or fewer "block splits" will occur.
So you can't really say whether it was the character set change or the rebuild that led to the index getting bigger and the data getting smaller.
I say it was not the charset change.
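One way to probe this claim is to rebuild the table again without touching the character set, refresh the statistics, and then look at each index separately. This is a sketch, assuming persistent InnoDB statistics (the default in recent MySQL versions) and enough privileges to read mysql.innodb_index_stats. Note that the rebuild copies the whole table, which takes a while at this size.

```sql
-- Rebuild the table and its indexes; the character set is left as it is.
ALTER TABLE device_msg ENGINE=InnoDB;

-- Refresh the statistics that information_schema.TABLES reports.
ANALYZE TABLE device_msg;

-- Per-index size: stat_name = 'size' is the number of pages in the index.
SELECT index_name,
       stat_value AS pages,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'device_msg'
  AND stat_name = 'size';
```

If the reported sizes shift again after a plain rebuild, that would suggest the rebuild rather than the character set is behind the difference. ANALYZE TABLE is also a reasonable first step when the figures in information_schema.TABLES look stale.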
Source: https://stackoverflow.com/questions/58514911/why-tables-index-storage-size-is-bigger-after-change-charset-from-utf8mb4-to-ut