Algorithm to mix sound

囚心锁ツ 2020-11-29 16:55

I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k

20 answers
  • 2020-11-29 17:15

    I found a new way to add samples such that the result can never exceed a given range. The basic idea is to map values from the range -1 to 1 onto approximately -Infinity to +Infinity, add everything together, and then reverse the initial transformation. I came up with the following formulas for this, where f' denotes the inverse of f rather than its derivative:

    f(x)=-\frac{x}{|x|-1}

    f'(x)=\frac{x}{|x|+1}

    o=f'(\sum f(s))
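
    As a quick check of these formulas (a worked example with made-up inputs, not taken from the test below), take two samples s_1 = s_2 = 0.5:

    f(0.5)=-\frac{0.5}{|0.5|-1}=1

    \sum f(s)=1+1=2

    o=f'(2)=\frac{2}{|2|+1}=\frac{2}{3}\approx 0.667

    So two half-scale samples mix to about 0.667 instead of reaching 1.0, while a single sample passes through unchanged because f'(f(s))=s.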

    I tried it out and it does work, but for multiple loud sounds the resulting audio sounds worse than just adding the samples together and clipping every value which is too big. I used the following code to test this:

    #include <math.h>
    #include <stdio.h>
    #include <float.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>
    #include <sndfile.h>
    
    // fabs() takes a double and would lose long double precision (fabsl() from <math.h> would also work)
    long double ldabs(long double x){
      return x < 0 ? -x : x;
    }
    
    // -Inf<input<+Inf, -1<=output<=+1
    long double infiniteToFinite( long double sample ){
      // if the input value was too big, we'll just map it to -1 or 1
      if( isinf(sample) )
        return sample < 0 ? -1. : 1.;
      long double ret = sample / ( ldabs(sample) + 1 );
      // Just in case of calculation errors
      if( isnan(ret) )
        ret = sample < 0 ? -1. : 1.;
      if( ret < -1. )
        ret = -1.;
      if( ret > 1. )
        ret = 1.;
      return ret;
    }
    
    // -1<=input<=+1, -Inf<output<+Inf
    long double finiteToInfinite( long double sample ){
      // if out of range, clamp to 1 or -1
      if( sample > 1. )
        sample = 1.;
      if( sample < -1. )
        sample = -1.;
      long double res = -( sample / ( ldabs(sample) - 1. ) );
      // sample was too close to 1 or -1, return largest long double
      if( isinf(res) )
        return sample < 0 ? -LDBL_MAX : LDBL_MAX;
      return res;
    }
    
    // -1<input<1, -1<=output<=1 | Try to avoid input values too close to 1 or -1
    long double addSamples( size_t count, long double sample[] ){
      long double sum = 0;
      while( count-- ){
        sum += finiteToInfinite( sample[count] );
        if( isinf(sum) )
          sum = sum < 0 ? -LDBL_MAX : LDBL_MAX;
      }
      return infiniteToFinite( sum );
    }
    
    #define BUFFER_LEN 256
    
    int main( int argc, char* argv[] ){
    
      if( argc < 3 ){
        fprintf(stderr,"Usage: %s output.wav input1.wav [input2.wav...]\n",*argv);
        return 1;
      }
    
      {
        SNDFILE *outfile, *infiles[argc-2];
        SF_INFO sfinfo;
        SF_INFO sfinfo_tmp;
    
        memset( &sfinfo, 0, sizeof(sfinfo) );
    
        for( int i=0; i<argc-2; i++ ){
          memset( &sfinfo_tmp, 0, sizeof(sfinfo_tmp) );
          if(!( infiles[i] = sf_open( argv[i+2], SFM_READ, &sfinfo_tmp ) )){
            fprintf(stderr,"Could not open file: %s\n",argv[i+2]);
            puts(sf_strerror(0));
            goto cleanup;
          }
          printf("Sample rate %d, channel count %d\n",sfinfo_tmp.samplerate,sfinfo_tmp.channels);
          if( i ){
            if( sfinfo_tmp.samplerate != sfinfo.samplerate
             || sfinfo_tmp.channels != sfinfo.channels
            ){
              fprintf(stderr,"Mismatching sample rate or channel count\n");
              goto cleanup;
            }
          }else{
            sfinfo = sfinfo_tmp;
          }
          continue;
          cleanup: {
            while(i--)
              sf_close(infiles[i]);
            return 2;
          }
        }
    
        if(!( outfile = sf_open(argv[1], SFM_WRITE, &sfinfo) )){
          fprintf(stderr,"Could not open file: %s\n",argv[1]);
          puts(sf_strerror(0));
          for( int i=0; i<argc-2; i++ )
            sf_close(infiles[i]);
          return 3;
        }
    
        double inbuffer[argc-2][BUFFER_LEN];
        double outbuffer[BUFFER_LEN];
    
        size_t max_read;
        do {
          max_read = 0;
          memset(outbuffer,0,BUFFER_LEN*sizeof(double));
          for( int i=0; i<argc-2; i++ ){
            memset( inbuffer[i], 0, BUFFER_LEN*sizeof(double) );
            size_t read_count = sf_read_double( infiles[i], inbuffer[i], BUFFER_LEN );
            if( read_count > max_read )
              max_read = read_count;
          }
          long double insamples[argc-2];
          for( size_t j=0; j<max_read; j++ ){
            for( int i=0; i<argc-2; i++ )
              insamples[i] = inbuffer[i][j];
            outbuffer[j] = addSamples( argc-2, insamples );
          }
          sf_write_double( outfile, outbuffer, max_read );
        } while( max_read );
    
        sf_close(outfile);
        for( int i=0; i<argc-2; i++ )
          sf_close(infiles[i]);
      }
    
      return 0;
    }
    
  • 2020-11-29 17:16

    "Quieter by half" isn't quite correct. Because of the ear's logarithmic response, dividing the samples in half will make it 6-db quieter - certainly noticeable, but not disastrous.

    You might want to compromise by multiplying by 0.75. That will make it about 2.5 dB quieter (a full 3 dB cut corresponds to a factor of roughly 0.71), but will lessen the chance of overflow and also lessen the distortion when it does happen.

  • 2020-11-29 17:17

    I'd say just add them together. If you're overflowing your 16 bit PCM space, then the sounds you're using are already incredibly loud to begin with and you should attenuate them. If that would cause them to be too soft by themselves, look for another way of increasing the overall volume output, such as an OS setting or turning the knob on your speakers.

  • 2020-11-29 17:19

    Convert the samples to floating-point values ranging from -1.0 to +1.0, then:

    out = (s1 + s2) - (s1 * s2);

    This will introduce heavy distortion when |s1 + s2| approaches 1.0 (at least it did when I tried it on simple sine waves). I have read this recommendation in several places, but in my humble opinion it is a useless approach.
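
    One way to see the problem is to feed the formula two equal samples of either sign. The sketch below simply wraps the expression in a function; the name `mix_product` is made up for illustration.

```c
/* The formula quoted above, for samples normalised to [-1.0, +1.0]. */
double mix_product(double s1, double s2) {
    return (s1 + s2) - (s1 * s2);
}
```

    Here mix_product(0.5, 0.5) gives 0.75, whereas mix_product(-0.5, -0.5) gives -1.25, which lies outside the [-1, 1] range. The positive and negative halves of a waveform are therefore treated differently.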

    What physically happens when waves 'mix' is that their amplitudes add, just as many of the posters here have already suggested. Either

    • clip (which distorts the result as well), or
    • sum your 16-bit values into a 32-bit number and then divide by the number of sources (that is what I would suggest, as it is the only way I know of that avoids distortion)
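
    The sum-and-divide option might be sketched as follows for one sample position. The function name `average_sample` is hypothetical, and the code assumes fewer than 65,536 sources so that the 32-bit sum cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average one sample position across `count` sources.
   The sum is accumulated in 32 bits, then divided by the source count
   (integer division, so the result is truncated toward zero). */
int16_t average_sample(const int16_t *samples, size_t count) {
    assert(samples != NULL && count > 0);
    int32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return (int16_t)(sum / (int32_t)count);
}
```

    Because the average of values in the 16-bit range is itself in the 16-bit range, the final cast cannot overflow. The cost is that every source is attenuated by the number of sources, even when only one of them is loud.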
  • 2020-11-29 17:21

    Thank you, everyone, for sharing your ideas. I have recently been doing some work related to sound mixing as well, and I experimented with this issue; maybe it will help you :).

    Note that I'm using an 8 kHz sample rate and 16-bit samples (SInt16) in an iOS RemoteIO AudioUnit.

    In my experiments, the best result I found was somewhat different from all of these answers, but the basic idea is the same (as Roddy suggests):

    "You should add them together, but clip the result to the allowable range to prevent over/underflow".

    But what is the best way to add without overflow/underflow?
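
    As a baseline for comparison, the plain add-and-clip approach from the quote might look like this for a single pair of samples. This is a sketch; `mix_clip` is a made-up name.

```c
#include <stdint.h>

/* Add two 16-bit samples in 32 bits and clamp the sum to the 16-bit range. */
int16_t mix_clip(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

    In-range sums pass through untouched, so only the peaks that would have overflowed are altered.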

    Key idea: you have two sound waves, say A and B, and the resulting wave C is the superposition of A and B. With a limited bit range, the sum may overflow. So we calculate the largest amount by which the superposed waveform crosses the upper limit and the largest amount by which it crosses the lower limit. We then subtract the upper overshoot from the positive portion of the superposed waveform and add the lower overshoot to the negative portion. Voilà, you are done.

    Steps:

    1. First traverse the data once to find the maximum overshoot above the upper limit and the maximum overshoot below the lower limit.
    2. Traverse the audio data a second time, subtracting the upper overshoot from the positive samples and adding the lower overshoot to the negative samples.

    The following code shows the implementation.

    #define SINT16_MIN (-32768)
    #define SINT16_MAX 32767
    
    // Overshoot found by the latest call; static so it can later be smoothed across calls.
    static unsigned long upSideDownValue = 0; // subtracted from positive sums
    static unsigned long downSideUpValue = 0; // added to negative sums
    
    SInt16* mixTwoVoice (SInt16* RecordedVoiceData, SInt16* RealTimeData, SInt16 *OutputData, unsigned int dataLength){
    
    unsigned long tempDownUpSideValue = 0;
    unsigned long tempUpSideDownValue = 0;
    
    // calibration loop: find the largest overshoot past each 16-bit limit
    for(unsigned int i=0;i<dataLength ; i++)
    {
        // widen to 32 bits so the sum itself cannot overflow
        SInt32 summedValue = (SInt32)RecordedVoiceData[i] + RealTimeData[i];
    
        if(summedValue < SINT16_MIN)
        {
            // below the lower limit: remember the largest undershoot
            unsigned long tempCalibrateValue = (unsigned long)(SINT16_MIN - summedValue);
            if(tempDownUpSideValue < tempCalibrateValue)
                tempDownUpSideValue = tempCalibrateValue;
        }
        else if(summedValue > SINT16_MAX)
        {
            // above the upper limit: remember the largest overshoot
            unsigned long tempCalibrateValue = (unsigned long)(summedValue - SINT16_MAX);
            if(tempUpSideDownValue < tempCalibrateValue)
                tempUpSideDownValue = tempCalibrateValue;
        }
        // otherwise the value is within range and needs no calibration
    }
    
    // here we need some function which will gradually set the value
    downSideUpValue = tempDownUpSideValue;
    upSideDownValue = tempUpSideDownValue;
    
    // real mixer loop: shift each side inward by its overshoot
    for(unsigned int i=0;i<dataLength;i++)
    {
        SInt32 summedValue = (SInt32)RecordedVoiceData[i] + RealTimeData[i];
    
        if(summedValue < 0)
        {
            OutputData[i] = (SInt16)(summedValue + (SInt32)downSideUpValue);
        }
        else if(summedValue > 0)
        {
            OutputData[i] = (SInt16)(summedValue - (SInt32)upSideDownValue);
        }
        else
        {
            OutputData[i] = (SInt16)summedValue;
        }
    }
    
    return OutputData;
    }
    

    It works fine for me. Later I intend to change upSideDownValue and downSideUpValue gradually to get a smoother output.

  • 2020-11-29 17:21

    This question is old, but here is the valid method, IMO.

    1. Convert both samples to power.
    2. Add the two samples in the power domain.
    3. Normalize the result so that the maximum value does not exceed your limit.
    4. Convert back to amplitude.

    You can do the first two steps together, but you will need the maximum and minimum in order to normalize, which means a second pass for steps 3 and 4.

    I hope it helps someone.
