If I understand correctly, you're asking why the 4096x1x1 layer is so much smaller.
That's because it's a fully connected layer. Every neuron from the last max-pooling layer (= 256*13*13 = 43264 neurons) is connected to every neuron of the fully-connected layer.
This is an example of an ALL to ALL connected neural network:
As you can see, layer 2 is bigger than layer 3, but that doesn't stop every neuron in one layer from connecting to every neuron in the next.
There is no conversion of the last max-pooling layer: its output is simply flattened, and all of its neurons are connected to all 4096 neurons in the next layer.
The 'dense' operation just means: for each of the 4096 output neurons, multiply every one of the 43264 inputs by its own weight (= 4096 * 43264 weights in total), sum the results, and add that neuron's bias to get its output.
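A minimal NumPy sketch of that computation, using the shapes from above (the weights and inputs here are random placeholders, not trained values):

```python
import numpy as np

x = np.random.rand(256, 13, 13)          # output of the last max-pooling layer
x = x.flatten()                          # -> 43264 input neurons

W = np.random.rand(4096, 43264) * 0.01   # one weight per connection
b = np.zeros(4096)                       # one bias per output neuron

y = W @ x + b                            # weighted sum + bias, per output neuron
print(y.shape)                           # (4096,)
```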
It's connected the same way as an MLP.
But why 4096? There is no deep reasoning behind it; it's just a choice. It could have been 8000, it could have been 20; it just depends on what works best for the network.
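For what it's worth, in a framework like PyTorch this whole step is a single layer, and the 4096 is just a number you pass in (a sketch under the shapes above, not AlexNet's actual code):

```python
import torch
import torch.nn as nn

width = 4096  # arbitrary hyperparameter; 8000 or 20 would wire up exactly the same way
fc = nn.Sequential(
    nn.Flatten(),                     # 256x13x13 feature maps -> 43264 values
    nn.Linear(256 * 13 * 13, width),  # one weight per connection, one bias per output
)

out = fc(torch.randn(1, 256, 13, 13))  # batch of one feature map
print(out.shape)                       # torch.Size([1, 4096])
```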