Expected tensorflow model size from learned variables


Adding up all those variables, we would expect to get a model.ckpt.data file of size 12.45 MB.

Traditionally, most of a model's parameters are in the first fully connected layer, in this case wd1. Computing its size alone yields:

7 * 7 * 128 * 1024 * 4 = 25,690,112

... or about 25.7 MB. Note the factor of 4: the variable has dtype=tf.float32, i.e. 4 bytes per parameter. Other layers also contribute to the model size, but not nearly as drastically.
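For reference, the same arithmetic in Python (using the [7*7*128, 1024] shape assumed so far for wd1):

# wd1 weights: 7*7*128 inputs x 1024 outputs, stored as tf.float32 (4 bytes each)
wd1_params = 7 * 7 * 128 * 1024   # 6,422,528 parameters
wd1_bytes = wd1_params * 4        # 25,690,112 bytes
print(wd1_bytes / 1e6, 'MB')      # ~25.69 MB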

As you can see, your estimate of 12.45 MB is a bit off (did you assume 16 bits per parameter?). The checkpoint also stores some general bookkeeping information, hence an overhead of around 25%, which is still big, but not 300%.

[Update]

As was clarified, the model in question actually has an FC1 layer of shape [7*7*64, 1024], so the size calculated above becomes 7 * 7 * 64 * 1024 * 4 = 12,845,056 bytes, i.e. roughly 12.5 MB, and the original estimate was right after all. That made me look into the saved checkpoint more carefully.

After inspecting it, I noticed other big variables that I missed originally:

...
Variable_2 (DT_FLOAT) [3136,1024]
Variable_2/Adam (DT_FLOAT) [3136,1024]
Variable_2/Adam_1 (DT_FLOAT) [3136,1024]
...
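For reference, a short snippet that produces a listing like the one above; the checkpoint path model.ckpt is a placeholder for your own file:

import tensorflow as tf

# List every tensor stored in the checkpoint along with its shape
reader = tf.train.NewCheckpointReader('model.ckpt')
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)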

Variable_2 is exactly wd1, but there are two more copies of it for the Adam optimizer. These extra variables are created by the optimizer; they are called slots, and they hold the m and v accumulators for every trainable variable. With each trainable variable stored three times, the total size now makes sense.
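You can see the slot mechanism in isolation with a minimal TF1-style sketch (the variable here is just a stand-in with wd1's shape):

import tensorflow as tf

# A stand-in variable with wd1's shape; Adam creates an m and a v slot for it
w = tf.Variable(tf.zeros([3136, 1024]))
opt = tf.train.AdamOptimizer()
train_op = opt.minimize(tf.reduce_sum(tf.square(w)))

print(opt.get_slot_names())        # ['m', 'v']
print(opt.get_slot(w, 'm').shape)  # (3136, 1024), same shape as w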

You can run the following code to compute the total size of the graph variables; for this model it prints about 37.47 MB:

import numpy as np
import tensorflow as tf

# Number of elements times bytes per element, for every global variable
var_sizes = [np.prod(list(map(int, v.shape))) * v.dtype.size
             for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
print(sum(var_sizes) / (1024 ** 2), 'MB')

So the checkpoint overhead itself is actually pretty small; the extra size comes from the optimizer state.
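As a side note, if you don't need to resume training (e.g., the checkpoint is only for inference), you can avoid storing the Adam slots by saving only the trainable variables. A minimal sketch, with the output path as a placeholder:

import tensorflow as tf

# Save only the model weights, excluding the optimizer's m/v slot variables
saver = tf.train.Saver(var_list=tf.trainable_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, 'model_weights_only.ckpt')

This brings the file back down to roughly the 12.5 MB computed for the weights alone.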
