How much space will I save if I change an INT column to MEDIUMINT?

别来无恙 submitted on 2019-12-08 15:42:52

Just use INT unless you have a specific, measurable problem. You're only going to make a mess of things if you fret over every single byte in an era where even the most thrifty of smart phones has a billion of them for memory alone.

I need the database to be as small as possible so it fits in RAM and reduces hard-disk requests.

No you don't. You need the database to be easy to work with and perform adequately. In an era of SSD-backed databases, I/O will not be a problem until you're operating at large scale, and when and if that day comes then you can take measurements and understand the specific problems you're having.

Shaving a single byte off your INT field is unlikely to make anything better since three byte integer values are not something your CPU can directly deal with. These will be converted to four bytes and aligned properly so they can be understood, a process that's messy compared to reading a plain old 32-bit integer.

Remember, MySQL comes from an era where a high-end server had 64 megabytes of memory and a 9 gigabyte hard disk was considered huge. Back then you did have to shave bytes off because you only had a handful of them.

Now we have other concerns, like will you accidentally exhaust your 24-bit integer space like Slashdot did where their site went down because of exactly the sort of "optimizing" you're intending to do here.

Be careful. Optimize when you have a concrete reason to, not just because you think you need to. Avoiding premature optimization is a constant struggle in development, but if you're disciplined you can avoid it.

The exact size of your index is going to depend on how many rows you have, but also on how the data in your index looks.

If you shave 1 byte off each record in your data, and you have 10,000,000 records, that will only save you up to 10 MB on disk for the table data. An index on the column adds some more space, and B-trees contain empty space, so how inefficient the index is depends on the actual data.
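To see what a change would actually buy you, measure before and after. A sketch, assuming a hypothetical schema `mydb` and table `orders`:

```sql
-- Report the current on-disk size of table data and indexes.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'mydb'     -- hypothetical schema name
  AND table_name   = 'orders';  -- hypothetical table name
```

These figures are estimates maintained by InnoDB, so treat them as approximate rather than exact byte counts.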

If you want to save space, make sure the field is not nullable, because even if you fill every row with data, each record stores a flag saying whether the nullable field contains data or not.
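For example (table and column names are hypothetical), declaring the columns NOT NULL avoids that per-record flag:

```sql
CREATE TABLE page_views (
  page_id MEDIUMINT UNSIGNED NOT NULL,       -- NOT NULL: no null-flag overhead for this column
  hits    INT UNSIGNED       NOT NULL DEFAULT 0
);
```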

(I disagree with some of the other Answers/Comments. I will try to answer all the questions, plus address all the points that I disagree with.)

MEDIUMINT is 3 bytes, saving 1 byte per row over INT.
TINYINT is 1 byte, saving 3 bytes per row over INT.
In both cases, there is another 1 or 3 bytes saved per occurrence in any INDEX other than the PRIMARY KEY.
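The storage widths above, as a sketch (the table name is illustrative):

```sql
CREATE TABLE width_demo (
  a INT,        -- 4 bytes; range -2,147,483,648 .. 2,147,483,647
  b MEDIUMINT,  -- 3 bytes; saves 1 byte per row vs INT
  c TINYINT     -- 1 byte;  saves 3 bytes per row vs INT
);
```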

If you are likely to have more data+index than space in RAM, then it is wise to shrink the datatypes but be conservative.

Use MEDIUMINT UNSIGNED (etc) if the value is non-negative, such as for AUTO_INCREMENT. That gives you a limit of 16M instead of 8M. (Yeah, yeah, that's a tiny improvement.)
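A sketch of such an id column (names are illustrative):

```sql
CREATE TABLE items (
  id   MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- max 16,777,215 instead of 8,388,607
  name VARCHAR(50) NOT NULL,
  PRIMARY KEY (id)
);
```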

Beware of "burning" AUTO_INCREMENT ids -- INSERT IGNORE (and several other commands) will allocate the next auto_inc before checking whether it will be used.

Even if data+index exceeds RAM size (actually innodb_buffer_pool_size), it may not slow down to disk speed -- it depends on access patterns of the data. Beware of UUIDs, they are terribly random. Using UUIDs when you can't cache the entire index is deadly. The buffer_pool is a cache. (I have seen a 1TB dataset run fast enough with only 32GB of RAM and a spinning disk.)

Using ALTER TABLE to change a datatype probably (I am not sure) rebuilds the table, thereby performing the equivalent of OPTIMIZE TABLE.
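For example, changing the type of a hypothetical `customer_id` column in a hypothetical `orders` table:

```sql
-- Rewrites every row, so it effectively defragments the table as a side effect.
ALTER TABLE orders MODIFY COLUMN customer_id MEDIUMINT UNSIGNED NOT NULL;
```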

If the table was created with innodb_file_per_table = OFF and you turn it ON before doing the ALTER, you get a separate file for the table, but ibdata1 will not shrink (instead it will have lots more free space).
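A sketch of that sequence (assuming a hypothetical table `orders`):

```sql
SET GLOBAL innodb_file_per_table = ON;  -- applies to tables (re)built from now on
ALTER TABLE orders ENGINE = InnoDB;     -- rebuild: moves the table into its own .ibd file
-- ibdata1 keeps its current size; the vacated pages become internal free space.
```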

Alignment of 3-byte numbers -- not an issue. Powers of 2 are not relevant here. MySQL makes no alignment assumptions: columns can sit at arbitrary byte boundaries and have arbitrary sizes. All numbers are converted to a generic format (64-bit numbers) for operating on. This conversion is an insignificant part of the total time -- fetching the row (even if cached) is the most costly part.

When I/O-bound, shrinking datatypes leads to more rows per block, which leads to fewer disk hits (except in the UUID case). And when you are I/O-bound, hitting the disk is overwhelmingly the biggest performance cost.

"NULLS take no space" -- https://dev.mysql.com/doc/internals/en/innodb-field-contents.html . So, again, less I/O. But, beware, if this leads to an extra check for NULL in a SELECT, that could lead to a table scan instead of using an index. Hitting 10M rows is a lot worse than hitting just a few.

As for how many clients you can fit into 32GB -- Maybe 6 or more. Remember, the buffer_pool is a cache; data and indexes are cached on a block-by-block basis. (An InnoDB block is 16KB.)
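To compare your working set against that cache, you can check the buffer pool size against the total data+index size (a sketch; the totals are InnoDB's estimates):

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Total data + index bytes across all InnoDB tables:
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS total_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';
```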

One more thing... It is a lot easier to shrink the datatypes before going into production. So, do what you can safely do now.
