I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k
I found a new way to add samples so that the result can never exceed a given range. The basic idea is to map values in the range -1 to 1 onto a range of approximately -Infinity to +Infinity, add everything together, and then reverse the initial transformation. I came up with the following formulas for this:
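Judging from the test code further down (`infiniteToFinite` and `finiteToInfinite`), the two mappings and the resulting mix appear to be the following. The symbols f, g and s_i are labels chosen here for illustration:

```latex
f(x) = \frac{x}{|x| + 1}, \qquad
g(y) = -\frac{y}{|y| - 1} = \frac{y}{1 - |y|}, \qquad
\text{mix}(s_1, \dots, s_n) = f\!\left( \sum_{i=1}^{n} g(s_i) \right)
```

Here g is the inverse of f on the open interval (-1, 1), so a single sample passes through unchanged.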
I tried it out and it does work, but for multiple loud sounds the result sounds worse than just adding the samples together and clipping every value that is too large. I used the following code to test this:
#include <math.h>
#include <stdio.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <sndfile.h>
// fabs() takes a double, which wasn't accurate enough for long double values
// (fabsl() from <math.h> would also work here)
long double ldabs(long double x){
    return x < 0 ? -x : x;
}
// -Inf<input<+Inf, -1<=output<=+1
long double infiniteToFinite( long double sample ){
    // if the input value was too big, we'll just map it to -1 or 1
    if( isinf(sample) )
        return sample < 0 ? -1. : 1.;
    long double ret = sample / ( ldabs(sample) + 1 );
    // Just in case of calculation errors
    if( isnan(ret) )
        ret = sample < 0 ? -1. : 1.;
    if( ret < -1. )
        ret = -1.;
    if( ret > 1. )
        ret = 1.;
    return ret;
}
// -1<=input<=+1, -Inf<output<+Inf
long double finiteToInfinite( long double sample ){
    // if out of range, clamp to 1 or -1
    if( sample > 1. )
        sample = 1.;
    if( sample < -1. )
        sample = -1.;
    long double res = -( sample / ( ldabs(sample) - 1. ) );
    // sample was too close to 1 or -1, return largest long double
    if( isinf(res) )
        return sample < 0 ? -LDBL_MAX : LDBL_MAX;
    return res;
}
// -1<input<1, -1<=output<=1 | Try to avoid input values too close to 1 or -1
long double addSamples( size_t count, long double sample[] ){
    long double sum = 0;
    while( count-- ){
        sum += finiteToInfinite( sample[count] );
        if( isinf(sum) )
            sum = sum < 0 ? -LDBL_MAX : LDBL_MAX;
    }
    return infiniteToFinite( sum );
}
#define BUFFER_LEN 256
int main( int argc, char* argv[] ){
    if( argc < 3 ){
        fprintf(stderr,"Usage: %s output.wav input1.wav [input2.wav...]\n",*argv);
        return 1;
    }
    {
        SNDFILE *outfile, *infiles[argc-2];
        SF_INFO sfinfo;
        SF_INFO sfinfo_tmp;
        memset( &sfinfo, 0, sizeof(sfinfo) );
        for( int i=0; i<argc-2; i++ ){
            memset( &sfinfo_tmp, 0, sizeof(sfinfo_tmp) );
            if(!( infiles[i] = sf_open( argv[i+2], SFM_READ, &sfinfo_tmp ) )){
                fprintf(stderr,"Could not open file: %s\n",argv[i+2]);
                fprintf(stderr,"%s\n",sf_strerror(NULL));
                goto cleanup;
            }
            printf("Sample rate %d, channel count %d\n",sfinfo_tmp.samplerate,sfinfo_tmp.channels);
            if( i ){
                if( sfinfo_tmp.samplerate != sfinfo.samplerate
                 || sfinfo_tmp.channels != sfinfo.channels
                ){
                    fprintf(stderr,"Mismatching sample rate or channel count\n");
                    // this file was opened successfully, so close it before the shared cleanup
                    sf_close(infiles[i]);
                    goto cleanup;
                }
            }else{
                // the first input defines the format of the output file
                sfinfo = sfinfo_tmp;
            }
            continue;
            cleanup: {
                // close the files opened in earlier iterations
                while(i--)
                    sf_close(infiles[i]);
                return 2;
            }
        }
        if(!( outfile = sf_open(argv[1], SFM_WRITE, &sfinfo) )){
            fprintf(stderr,"Could not open file: %s\n",argv[1]);
            fprintf(stderr,"%s\n",sf_strerror(NULL));
            for( int i=0; i<argc-2; i++ )
                sf_close(infiles[i]);
            return 3;
        }
        double inbuffer[argc-2][BUFFER_LEN];
        double outbuffer[BUFFER_LEN];
        size_t max_read;
        do {
            max_read = 0;
            memset(outbuffer,0,BUFFER_LEN*sizeof(double));
            for( int i=0; i<argc-2; i++ ){
                // zero the buffer so shorter inputs contribute silence
                memset( inbuffer[i], 0, BUFFER_LEN*sizeof(double) );
                size_t read_count = sf_read_double( infiles[i], inbuffer[i], BUFFER_LEN );
                if( read_count > max_read )
                    max_read = read_count;
            }
            long double insamples[argc-2];
            for( size_t j=0; j<max_read; j++ ){
                for( int i=0; i<argc-2; i++ )
                    insamples[i] = inbuffer[i][j];
                outbuffer[j] = addSamples( argc-2, insamples );
            }
            sf_write_double( outfile, outbuffer, max_read );
        } while( max_read );
        sf_close(outfile);
        for( int i=0; i<argc-2; i++ )
            sf_close(infiles[i]);
    }
    return 0;
}
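For comparison, the baseline mentioned above (add the samples and clip whatever is too large) could look roughly like this. It is a minimal sketch that assumes samples in the -1 to 1 range; the name `add_and_clip` is made up for this example and is not part of the test program.

```c
#include <stddef.h>

/* Sum `count` samples (each expected in -1..1) and hard-clip the result.
   add_and_clip is a hypothetical helper name used only for this sketch. */
double add_and_clip(size_t count, const double samples[]) {
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    /* Hard clipping: anything outside the range is pinned to the limit. */
    if (sum > 1.0)
        return 1.0;
    if (sum < -1.0)
        return -1.0;
    return sum;
}
```

Unlike the transform-based version, this leaves every sum that stays within range untouched and only alters the samples that actually overflow.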
"Quieter by half" isn't quite correct. Because of the ear's logarithmic response, dividing the samples in half will make the result about 6 dB quieter, which is certainly noticeable but not disastrous.
You might want to compromise by multiplying by 0.75. That will make it about 2.5 dB quieter (a 3 dB cut corresponds to a factor of roughly 0.71), but will lessen the chance of overflow and also lessen the distortion when it does happen.
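For reference, an amplitude factor g corresponds to a level change of 20 log10(g) dB, so the figures work out roughly as follows (rounded to two decimals):

```latex
20 \log_{10}(0.5) \approx -6.02\ \text{dB}, \qquad
20 \log_{10}(0.75) \approx -2.50\ \text{dB}, \qquad
20 \log_{10}(0.7071) \approx -3.01\ \text{dB}
```

So a factor of 0.75 is about 2.5 dB down, and a 3 dB cut corresponds to a factor near 0.71.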
I'd say just add them together. If you're overflowing your 16 bit PCM space, then the sounds you're using are already incredibly loud to begin with and you should attenuate them. If that would cause them to be too soft by themselves, look for another way of increasing the overall volume output, such as an OS setting or turning the knob on your speakers.
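To judge whether the inputs really are too loud, one option is to count how often the plain sum leaves the 16-bit range before deciding on any attenuation. This is only a sketch; the function name `count_overflows` is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the samples whose plain sum would not fit into 16-bit PCM.
   count_overflows is a hypothetical helper, not part of any library. */
size_t count_overflows(const int16_t *a, const int16_t *b, size_t n) {
    size_t overflows = 0;
    for (size_t i = 0; i < n; i++) {
        /* Add in 32 bits so the sum itself cannot wrap around. */
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX || sum < INT16_MIN)
            overflows++;
    }
    return overflows;
}
```

If the count is zero or very small for typical material, plain addition is probably fine; if it is large, attenuating the inputs first is the safer route.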
Convert the samples to floating point values ranging from -1.0 to +1.0, then:
out = (s1 + s2) - (s1 * s2);
This will introduce heavy distortion when |s1 + s2| approaches 1.0 (at least it did when I tried it while mixing simple sine waves). I have read this recommendation in several places, but in my humble opinion, it is a useless approach.
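A quick way to see the problem is to evaluate the formula for a pair of equal samples with opposite signs. The sketch below wraps the formula in a function; the name `mix_product` is made up for this example.

```c
/* The formula quoted above, for samples in the -1.0..+1.0 range.
   mix_product is a hypothetical name used only for this sketch. */
double mix_product(double s1, double s2) {
    return (s1 + s2) - (s1 * s2);
}
```

With s1 = s2 = 0.5 the result is 0.75, but with s1 = s2 = -0.5 it is -1.25. The formula therefore treats positive and negative half-waves differently and can even leave the -1.0 to +1.0 range, which fits the distortion described above.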
What happens physically when waves 'mix' is that their amplitudes add, just like many of the posters here suggested already. Either
Thank you everyone for sharing your ideas. Recently I have also been doing some work related to sound mixing, and I have experimented with this issue; maybe it will help you :).
Note that I'm using an 8 kHz sample rate and 16-bit samples (SInt16) in the iOS RemoteIO AudioUnit.
In my experiments, the best result I found was somewhat different from all these answers, but the basis is the same (as Roddy suggests):
"You should add them together, but clip the result to the allowable range to prevent over/underflow".
But what is the best way to add without overflow/underflow?
Key idea: You have two sound waves, A and B, and the resulting wave C is the superposition of A and B. With a limited bit range, the samples of C may overflow. So we calculate the largest amount by which the superposed waveform crosses the upper limit, and the largest amount by which it crosses the lower limit. Then we subtract the upper overshoot from the positive part of the superposed waveform and add the lower overshoot to the negative part. Voilà, you are done.
Steps:
The following code shows the implementation.
static unsigned long upSideDownValue = 0;
static unsigned long downSideUpValue = 0;
#define SINT16_MIN (-32768)
#define SINT16_MAX 32767
SInt16* mixTwoVoice (SInt16* RecordedVoiceData, SInt16* RealTimeData, SInt16 *OutputData, unsigned int dataLength){
    unsigned long tempDownUpSideValue = 0;
    unsigned long tempUpSideDownValue = 0;
    // calibration loop: find the largest overshoot on each side
    for(unsigned int i=0;i<dataLength ; i++)
    {
        SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];
        if(SINT16_MIN <= summedValue && summedValue <= SINT16_MAX)
        {
            // the value is within range, nothing to do
        }
        else
        {
            // out of range: measure how far the sum goes past the limit
            // (using SINT16_MAX for both sides overestimates the negative side by 1, which is safe)
            unsigned long tempCalibrateValue;
            tempCalibrateValue = ABS(summedValue) - SINT16_MAX;
            if(summedValue < 0)
            {
                // check the downside
                if(tempDownUpSideValue < tempCalibrateValue)
                    tempDownUpSideValue = tempCalibrateValue;
            }
            else
            {
                // check the upside
                if(tempUpSideDownValue < tempCalibrateValue)
                    tempUpSideDownValue = tempCalibrateValue;
            }
        }
    }
    // here we need some function which will gradually set the value
    downSideUpValue = tempDownUpSideValue;
    upSideDownValue = tempUpSideDownValue;
    // real mixer loop: pull each half of the waveform back by its overshoot
    for(unsigned int i=0;i<dataLength;i++)
    {
        SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];
        if(summedValue < 0)
        {
            OutputData[i] = (SInt16)(summedValue + (SInt32)downSideUpValue);
        }
        else if(summedValue > 0)
        {
            OutputData[i] = (SInt16)(summedValue - (SInt32)upSideDownValue);
        }
        else
        {
            OutputData[i] = (SInt16)summedValue;
        }
    }
    return OutputData;
}
It works fine for me. Later I intend to change the values of upSideDownValue and downSideUpValue gradually to get a smoother output.
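One way the gradual change could be done is to move each offset toward its newly measured value by a bounded step per buffer, instead of assigning it directly. This is only a sketch of that idea, not the author's implementation; the function name `step_toward` and the `max_step` parameter are assumptions.

```c
/* Move `current` toward `target` by at most `max_step` per call.
   step_toward and max_step are hypothetical names for this sketch. */
unsigned long step_toward(unsigned long current, unsigned long target,
                          unsigned long max_step) {
    if (current < target) {
        unsigned long gap = target - current;
        return current + (gap < max_step ? gap : max_step);
    } else {
        unsigned long gap = current - target;
        return current - (gap < max_step ? gap : max_step);
    }
}
```

It could be used as `upSideDownValue = step_toward(upSideDownValue, tempUpSideDownValue, 64);`, where 64 is an arbitrary example step. Note that while the offset is still catching up, sums can exceed the 16-bit range, so the output would need clipping as well.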
This question is old but here is the valid method IMO.
You can do the first two steps together, but you will need the maximum and minimum in order to normalize in a second pass for steps 3 and 4.
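The answer mentions finding the maximum and minimum first and normalizing in a second pass. One possible reading of that, as a sketch only: sum in a wider type, find the peak, then scale everything down if the peak does not fit into 16 bits. The function name `mix_normalized` and the choice to scale only when the sum overflows are assumptions, not taken from the answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-pass mix: find the peak of the raw sums, then scale to fit int16_t.
   mix_normalized is a hypothetical name used only for this sketch. */
void mix_normalized(const int16_t *a, const int16_t *b, int16_t *out, size_t n) {
    /* Pass 1: largest magnitude of the 32-bit sums. */
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        int32_t mag = sum < 0 ? -sum : sum;
        if (mag > peak)
            peak = mag;
    }
    /* Pass 2: scale by INT16_MAX / peak, but only if the peak overflows. */
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (peak > INT16_MAX)
            sum = (int32_t)((int64_t)sum * INT16_MAX / peak);
        out[i] = (int16_t)sum;
    }
}
```

Because the scale factor depends on the whole buffer, this suits offline processing better than streaming, where the gain would jump from one buffer to the next.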
I hope it helps someone.