I like thinking about how everything can be, and is, represented by numbers. For example, plain text is represented by a character code like ASCII, and images are represented by RGB values.
The simplest way to represent sound as numbers is PCM (Pulse Code Modulation). This means that the amplitude of the sound is measured at regular intervals, at a fixed frequency called the sample rate, and each measured amplitude value is called a sample. CD quality sound, for example, uses 16 bit samples (in stereo) at a sample rate of 44100 Hz.
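To make the idea of sampling concrete, here is a minimal sketch that samples a sine wave at the CD sample rate and stores each sample as a 16 bit signed integer. The function name `sample_sine` and the 440 Hz test tone are just assumptions for illustration.

```python
import math

SAMPLE_RATE = 44100  # samples per second (CD quality)
BIT_DEPTH = 16       # bits per sample
CHANNELS = 2         # stereo


def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a list of 16 bit signed samples of a full-scale sine wave."""
    count = round(duration_s * sample_rate)
    peak = 2 ** (BIT_DEPTH - 1) - 1  # 32767, the largest positive 16 bit value
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]


# 10 milliseconds of a 440 Hz tone (one channel).
samples = sample_sine(440, 0.01)

# Uncompressed data rate of CD audio in bytes per second.
bytes_per_second = SAMPLE_RATE * (BIT_DEPTH // 8) * CHANNELS
```

So 10 milliseconds of mono sound is already 441 numbers, and one second of uncompressed CD audio takes 176400 bytes.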
A sample can be represented as an integer (usually 8, 12, 16, 24 or 32 bits) or as a floating point number (32 bit float or 64 bit double). Integer samples can be either signed or unsigned.
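As a rough illustration of how much space a sample takes in some of these formats, the sketch below uses Python's `struct` module to pack a single sample into raw bytes. The labels in `SAMPLE_FORMATS` and the helper `encode_sample` are made up for this example, and little-endian byte order is an assumption.

```python
import struct

# Hypothetical labels mapped to struct format codes.
# "<" means little-endian with standard sizes.
SAMPLE_FORMATS = {
    "8 bit unsigned int": "<B",
    "16 bit signed int": "<h",
    "32 bit signed int": "<i",
    "32 bit float": "<f",
    "64 bit double": "<d",
}


def encode_sample(value, fmt):
    """Pack a single sample value into raw bytes using the given format code."""
    return struct.pack(fmt, value)


for label, fmt in SAMPLE_FORMATS.items():
    print(f"{label}: {struct.calcsize(fmt)} byte(s)")
```

12 and 24 bit samples have no direct `struct` code, so they are left out here.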
For 16 bit signed samples, the value 0 is in the middle, and -32768 and 32767 are the maximum amplitudes. For 16 bit unsigned samples, the value 32768 is in the middle, and 0 and 65535 are the maximum amplitudes.
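Since the two 16 bit layouts differ only in where the middle sits, converting between them is just a shift by 32768. A small sketch (the function names are my own):

```python
OFFSET_16 = 32768  # distance between the signed and unsigned midpoints


def signed_to_unsigned16(sample):
    """Map a signed 16 bit sample (-32768..32767) to unsigned (0..65535)."""
    return sample + OFFSET_16


def unsigned_to_signed16(sample):
    """Map an unsigned 16 bit sample (0..65535) to signed (-32768..32767)."""
    return sample - OFFSET_16
```

For example, the signed midpoint 0 becomes 32768, and the signed extremes -32768 and 32767 become 0 and 65535.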
For floating point samples, the usual convention is that 0 is in the middle, and -1.0 and 1.0 are the maximum amplitudes.
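Converting between floating point and 16 bit signed samples is then a matter of scaling. The sketch below assumes one common convention, scaling by 32767 and clipping out-of-range input; other software scales by 32768 instead, so treat this as one possible choice, not the standard one. The function names are hypothetical.

```python
INT16_PEAK = 32767  # assumed scale factor; some tools use 32768 instead


def float_to_int16(x):
    """Convert a float sample in -1.0..1.0 to a signed 16 bit integer."""
    x = max(-1.0, min(1.0, x))  # clip anything outside the valid range
    return round(x * INT16_PEAK)


def int16_to_float(sample):
    """Convert a signed 16 bit integer sample to a float in -1.0..1.0."""
    # -32768 would land slightly below -1.0, so clamp it.
    return max(-1.0, sample / INT16_PEAK)
```

With this convention 1.0 maps to 32767 and -1.0 maps to -32767, so the value -32768 is never produced when converting from floats.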
The PCM data can then be compressed, for example using MP3.