What you need is the impulse response of the room or reverb chamber that you want to model or simulate. The full impulse response includes all of the multiple and multipath echoes. Its length is roughly the time (in samples) it takes for an impulse sound to decay below the audible threshold or a given noise floor.
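As a rough illustration of how that length might be estimated, here is a minimal sketch that finds the last sample still above a chosen floor. The function name `effective_length`, the -60 dB default, and the synthetic exponential decay are all assumptions made for the example, not measurements of any real room.

```python
import math

def effective_length(impulse_response, floor_db=-60.0):
    """Return the sample count up to the last sample above the floor.

    floor_db is measured relative to the largest absolute sample.
    """
    peak = max(abs(s) for s in impulse_response)
    if peak == 0.0:
        return 0
    threshold = peak * 10.0 ** (floor_db / 20.0)
    # Scan backwards for the last sample that is still above the floor.
    for i in range(len(impulse_response) - 1, -1, -1):
        if abs(impulse_response[i]) > threshold:
            return i + 1
    return 0

# Hypothetical response: an exponential decay sampled at 44.1 kHz,
# chosen so that it falls by about 60 dB in roughly one second.
sample_rate = 44100
decay = [math.exp(-6.9 * n / sample_rate) for n in range(2 * sample_rate)]
n_taps = effective_length(decay, floor_db=-60.0)
```

With this synthetic decay, `n_taps` comes out at about one second's worth of samples, which is the N you would carry into the convolution.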
Given an impulse response vector of length N, you can produce each audio output sample as the dot product of the impulse response with an input vector made up of the current input sample followed by the previous N-1 input samples (newest first), with appropriate scaling. This is direct convolution of the input with the impulse response.
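The per-sample multiplication described above could be sketched like this. The names `make_convolver`, `process`, and `gain` are made up for the example, and the four-tap response is a toy value rather than a real room measurement.

```python
from collections import deque

def make_convolver(impulse_response, gain=1.0):
    """Return a function that maps one input sample to one output sample."""
    taps = list(impulse_response)
    # history[0] is the current sample, history[k] the sample k steps back.
    history = deque([0.0] * len(taps), maxlen=len(taps))

    def process(sample):
        # Pushing on the left drops the oldest sample off the right end.
        history.appendleft(sample)
        return gain * sum(h * x for h, x in zip(taps, history))

    return process

# Feeding a unit impulse through the filter should reproduce the taps.
reverb = make_convolver([1.0, 0.0, 0.5, 0.25])
wet = [reverb(x) for x in [1.0, 0.0, 0.0, 0.0, 0.0]]
```

This costs N multiply-adds per output sample, so for impulse responses that are seconds long, practical implementations usually switch to FFT-based fast convolution instead.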
Some people simplify this by assuming that most taps in the impulse response are zero (in the extreme case, all but one) and using just a few scaled delay lines for the remaining echoes, which are then added into the output.
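A minimal sketch of that simplification, assuming the echoes are given as hypothetical (delay, gain) pairs read out of a single circular buffer. The name `make_sparse_reverb` and the `dry_gain` parameter are illustrative, not taken from any library.

```python
def make_sparse_reverb(echoes, dry_gain=1.0):
    """echoes: list of (delay_in_samples, gain) pairs, each delay >= 1."""
    size = max(delay for delay, _ in echoes) + 1
    buffer = [0.0] * size  # circular buffer of recent input samples
    write_pos = 0

    def process(sample):
        nonlocal write_pos
        buffer[write_pos] = sample
        out = dry_gain * sample
        # Only the nonzero taps are visited, one per echo.
        for delay, gain in echoes:
            out += gain * buffer[(write_pos - delay) % size]
        write_pos = (write_pos + 1) % size
        return out

    return process

# Two echoes: half amplitude after 2 samples, quarter after 3.
echo = make_sparse_reverb([(2, 0.5), (3, 0.25)])
wet = [echo(x) for x in [1.0, 0.0, 0.0, 0.0, 0.0]]
```

The work per sample now scales with the number of echoes rather than with the full length of the impulse response.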
For even more realistic reverb, you might want to use a different impulse response for each ear, and have the responses vary a bit with head position. A head movement of as little as a quarter inch can shift the peaks in the impulse response by about 1 sample (at a 44.1 kHz sample rate).
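The head-movement figure can be checked with a quick calculation. This sketch assumes a speed of sound of about 343 m/s (dry air near 20 degrees C) and that the head moves straight toward or away from the source, so the path length changes by the full distance moved. The function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
SAMPLE_RATE_HZ = 44100
METERS_PER_INCH = 0.0254

def path_change_in_samples(distance_inches, sample_rate=SAMPLE_RATE_HZ):
    """Delay change, in samples, for a given change in path length."""
    seconds = distance_inches * METERS_PER_INCH / SPEED_OF_SOUND_M_PER_S
    return seconds * sample_rate

# Shift caused by a quarter-inch movement along the line to the source.
quarter_inch_shift = path_change_in_samples(0.25)

# Distance sound travels during one sample period, in inches.
inches_per_sample = SPEED_OF_SOUND_M_PER_S / SAMPLE_RATE_HZ / METERS_PER_INCH
```

Under these assumptions a quarter inch works out to a bit under one sample, and one full sample corresponds to roughly three tenths of an inch.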