Algorithm to mix sound

后端未结


I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k

20 answers

抹茶落季

2020-11-29 17:15

I found a new way to add samples so that the result can never exceed a given range. The basic idea is to map values in the range -1 to 1 onto a range of approximately -Infinity to +Infinity, add everything together, and then reverse the initial transformation. I came up with the following formulas for this:

$f(x)=-\frac{x}{|x|-1}$

$f'(x)=\frac{x}{|x|+1}$

$o=f'(\sum f(s))$
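
As a quick sanity check (my own working, assuming $f'$ is meant as the inverse of $f$ rather than its derivative), take $0 \le x < 1$:

$f(x)=\frac{x}{1-x}$, so $f'(f(x))=\frac{x/(1-x)}{x/(1-x)+1}=\frac{x/(1-x)}{1/(1-x)}=x$

Both maps are odd, so the same holds for negative $x$. For example, mixing two samples of 0.5 gives $f(0.5)=1$, a sum of 2, and $f'(2)=\frac{2}{3}\approx 0.667$; mixing two samples of 0.9 gives $f(0.9)=9$, a sum of 18, and $f'(18)=\frac{18}{19}\approx 0.947$.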

I tried it out and it does work, but for multiple loud sounds the resulting audio sounds worse than just adding the samples together and clipping every value which is too big. I used the following code to test this:

#include <math.h>
#include <stdio.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <sndfile.h>

// fabs wasn't accurate enough
long double ldabs(long double x){
  return x < 0 ? -x : x;
}

// -Inf<input<+Inf, -1<=output<=+1
long double infiniteToFinite( long double sample ){
  // if the input value was too big, we'll just map it to -1 or 1
  if( isinf(sample) )
    return sample < 0 ? -1. : 1.;
  long double ret = sample / ( ldabs(sample) + 1 );
  // Just in case of calculation errors
  if( isnan(ret) )
    ret = sample < 0 ? -1. : 1.;
  if( ret < -1. )
    ret = -1.;
  if( ret > 1. )
    ret = 1.;
  return ret;
}

// -1<=input<=+1, -Inf<output<+Inf
long double finiteToInfinite( long double sample ){
  // if out of range, clamp to 1 or -1
  if( sample > 1. )
    sample = 1.;
  if( sample < -1. )
    sample = -1.;
  long double res = -( sample / ( ldabs(sample) - 1. ) );
  // sample was too close to 1 or -1, return largest long double
  if( isinf(res) )
    return sample < 0 ? -LDBL_MAX : LDBL_MAX;
  return res;
}

// -1<input<1, -1<=output<=1 | Try to avoid input values too close to 1 or -1
long double addSamples( size_t count, long double sample[] ){
  long double sum = 0;
  while( count-- ){
    sum += finiteToInfinite( sample[count] );
    if( isinf(sum) )
      sum = sum < 0 ? -LDBL_MAX : LDBL_MAX;
  }
  return infiniteToFinite( sum );
}

#define BUFFER_LEN 256

int main( int argc, char* argv[] ){

  if( argc < 3 ){
    fprintf(stderr,"Usage: %s output.wav input1.wav [input2.wav...]\n",*argv);
    return 1;
  }

  {
    SNDFILE *outfile, *infiles[argc-2];
    SF_INFO sfinfo;
    SF_INFO sfinfo_tmp;

    memset( &sfinfo, 0, sizeof(sfinfo) );

    for( int i=0; i<argc-2; i++ ){
      memset( &sfinfo_tmp, 0, sizeof(sfinfo_tmp) );
      if(!( infiles[i] = sf_open( argv[i+2], SFM_READ, &sfinfo_tmp ) )){
        fprintf(stderr,"Could not open file: %s\n",argv[i+2]);
        puts(sf_strerror(0));
        goto cleanup;
      }
      printf("Sample rate %d, channel count %d\n",sfinfo_tmp.samplerate,sfinfo_tmp.channels);
      if( i ){
        if( sfinfo_tmp.samplerate != sfinfo.samplerate
         || sfinfo_tmp.channels != sfinfo.channels
        ){
          fprintf(stderr,"Mismatching sample rate or channel count\n");
          // this file is already open, but cleanup only closes files 0..i-1
          sf_close(infiles[i]);
          goto cleanup;
        }
      }else{
        sfinfo = sfinfo_tmp;
      }
      continue;
      cleanup: {
        while(i--)
          sf_close(infiles[i]);
        return 2;
      }
    }

    if(!( outfile = sf_open(argv[1], SFM_WRITE, &sfinfo) )){
      fprintf(stderr,"Could not open file: %s\n",argv[1]);
      puts(sf_strerror(0));
      for( int i=0; i<argc-2; i++ )
        sf_close(infiles[i]);
      return 3;
    }

    double inbuffer[argc-2][BUFFER_LEN];
    double outbuffer[BUFFER_LEN];

    size_t max_read;
    do {
      max_read = 0;
      memset(outbuffer,0,BUFFER_LEN*sizeof(double));
      for( int i=0; i<argc-2; i++ ){
        memset( inbuffer[i], 0, BUFFER_LEN*sizeof(double) );
        size_t read_count = sf_read_double( infiles[i], inbuffer[i], BUFFER_LEN );
        if( read_count > max_read )
          max_read = read_count;
      }
      long double insamples[argc-2];
      for( size_t j=0; j<max_read; j++ ){
        for( int i=0; i<argc-2; i++ )
          insamples[i] = inbuffer[i][j];
        outbuffer[j] = addSamples( argc-2, insamples );
      }
      sf_write_double( outfile, outbuffer, max_read );
    } while( max_read );

    sf_close(outfile);
    for( int i=0; i<argc-2; i++ )
      sf_close(infiles[i]);
  }

  return 0;
}


清歌不尽

2020-11-29 17:16

"Quieter by half" isn't quite correct. Because of the ear's logarithmic response, dividing the samples by two makes the sound 6 dB quieter: certainly noticeable, but not disastrous.

You might want to compromise by multiplying by 0.75. That makes the result about 2.5 dB quieter (a full 3 dB cut corresponds to a factor of roughly 0.707), but it lessens the chance of overflow and also lessens the distortion when overflow does happen.

广开言路

2020-11-29 17:17

I'd say just add them together. If you're overflowing your 16 bit PCM space, then the sounds you're using are already incredibly loud to begin with and you should attenuate them. If that would cause them to be too soft by themselves, look for another way of increasing the overall volume output, such as an OS setting or turning the knob on your speakers.
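
To judge whether your sources actually need attenuating, you could first count how often a plain sum would leave the 16-bit range. This is just a rough sketch; `count_overflows` is a hypothetical helper, not something from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many sample pairs would overflow 16-bit PCM if simply added.
   The sum is done in 32 bits so the check itself cannot overflow. */
size_t count_overflows(const int16_t *a, const int16_t *b, size_t n) {
    size_t overflows = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX || sum < INT16_MIN)
            overflows++;
    }
    return overflows;
}
```

If the count is zero for typical material, plain addition is safe as it stands; if not, that is the cue to turn the inputs down.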

渐次进展

2020-11-29 17:19
A recommendation I have read in several places is to convert the samples to floating-point values ranging from -1.0 to +1.0, and then compute:

out = (s1 + s2) - (s1 * s2);

This introduces heavy distortion when |s1 + s2| approaches 1.0 (at least it did when I tried it with simple sine waves). In my humble opinion, it is a useless approach.
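
One property that probably contributes to the distortion is that the formula is not symmetric around zero for signed samples. A tiny sketch (the name `mix_product` is mine) makes this easy to check:

```c
/* The formula quoted above, applied to samples in [-1.0, +1.0]. */
double mix_product(double s1, double s2) {
    return (s1 + s2) - (s1 * s2);
}
```

Two samples of 0.5 give 0.75, but two samples of -0.5 give -1.25, which is already outside the -1.0 to +1.0 range.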

What happens physically when waves 'mix' is that their amplitudes add, as many of the posters here have already suggested. So either
- clip (which distorts the result as well) or
- sum your 16-bit values into a 32-bit number, and then divide by the number of sources (that's what I would suggest, as it's the only way known to me that avoids distortion)
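
The second option could look roughly like this. It is a minimal sketch, assuming all sources have the same length; `mix_average` and its parameters are names I made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix `count` 16-bit sources: sum each frame in 32 bits, then divide by
   the number of sources. A 32-bit sum is safe for up to 65536 sources. */
void mix_average(const int16_t *const *sources, size_t count,
                 int16_t *out, size_t frames) {
    if (count == 0)
        return; /* nothing to mix, and avoids dividing by zero */
    for (size_t i = 0; i < frames; i++) {
        int32_t sum = 0;
        for (size_t s = 0; s < count; s++)
            sum += sources[s][i];
        out[i] = (int16_t)(sum / (int32_t)count);
    }
}
```

Because the average of in-range values is always in range, no clipping is needed; the trade-off is that each source ends up quieter as the number of sources grows.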

广开言路

2020-11-29 17:21

Thank you, everyone, for sharing your ideas. I have recently been working on sound mixing too and have done some experiments on this issue; maybe they will help you :).

Note that I am using an 8 kHz sample rate and 16-bit samples (SInt16) in an iOS RemoteIO AudioUnit.

In my experiments, the best result came from something different from all these answers, although the basis is the same (as Roddy suggests):

"You should add them together, but clip the result to the allowable range to prevent over/underflow".

But what is the best way to add without overflow/underflow?

Key idea: you have two sound waves, A and B, and the resulting wave C is the superposition of A and B. With a limited bit range, the sum may overflow. So we calculate the largest amount by which the superposed waveform crosses the upper limit, and the largest amount by which it crosses the lower limit. Then we subtract the upper overshoot from the positive portion of the superposed waveform and add the lower overshoot to the negative portion. Voilà, you are done.

Steps:

1. First, traverse the data once to find the maximum overshoot above the upper limit and the maximum overshoot below the lower limit.
2. Make a second pass over the audio data, subtracting the upper overshoot from the positive samples and adding the lower overshoot to the negative samples.

The following code shows the implementation.

static unsigned long upSideDownValue = 0;
static unsigned long downSideUpValue = 0;
#define SINT16_MIN -32768
#define SINT16_MAX 32767
SInt16* mixTwoVoice (SInt16* RecordedVoiceData, SInt16* RealTimeData, SInt16 *OutputData, unsigned int dataLength){

unsigned long tempDownUpSideValue = 0;
unsigned long tempUpSideDownValue = 0;
//calibrate maker loop
for(unsigned int i=0;i<dataLength ; i++)
{
    SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];

    if(SINT16_MIN < summedValue && summedValue < SINT16_MAX)
    {
        //the value is within range -- good boy
    }
    else
    {
       //nasty calibration needed
        unsigned long tempCalibrateValue;
        tempCalibrateValue = ABS(summedValue) - SINT16_MAX; // overshoot beyond the 16-bit limit (same bound used for both sides)

        if(summedValue < 0)
        {
            //check the downside -- to calibrate
            if(tempDownUpSideValue < tempCalibrateValue)
                tempDownUpSideValue = tempCalibrateValue;
        }
        else
        {
            //check the upside ---- to calibrate
            if(tempUpSideDownValue < tempCalibrateValue)
                tempUpSideDownValue = tempCalibrateValue;
        }
    }
}

//here we need some function which will gradually set the value
downSideUpValue = tempDownUpSideValue;
upSideDownValue = tempUpSideDownValue;

//real mixer loop
for(unsigned int i=0;i<dataLength;i++)
{
    SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];

    if(summedValue < 0)
    {
        OutputData[i] = summedValue + downSideUpValue;
    }
    else if(summedValue > 0)
    {
        OutputData[i] = summedValue - upSideDownValue;
    }
    else
    {
        OutputData[i] = summedValue;
    }
}

return OutputData;
}

It works fine for me. Later I intend to change upSideDownValue and downSideUpValue gradually to get a smoother output.


温柔的废话

2020-11-29 17:21
This question is old, but here is the valid method, IMO.
1. Convert both samples to power.
2. Add the two samples in power.
3. Normalize the result so that the maximum value doesn't go over your limit.
4. Convert back to amplitude.
You can do the first two steps together, but you will need the maximum and minimum to normalize in a second pass for steps 3 and 4.

I hope it helps someone.