Is there a fast way to find (not necessarily recognize) human speech in an audio file?

感动是毒 2021-02-01 08:59

I want to write a program that automatically syncs unsynced subtitles. One of the solutions I thought of is to somehow algorithmically find human speech and adjust the subtitles to it.

3 Answers
  •  你的背包
    2021-02-01 09:58

    webrtcvad is a Python wrapper around Google's excellent WebRTC Voice Activity Detection (VAD) implementation. It does the best job of any VAD I've used at correctly classifying human speech, even with noisy audio.

    To use it for your purpose, you would do something like this:

    1. Convert the file to 16-bit mono PCM at 8 kHz or 16 kHz (the webrtcvad wrapper also accepts 32 kHz and 48 kHz). This format is required by the WebRTC code.
    2. Create a VAD object: vad = webrtcvad.Vad()
    3. Split the audio into 30 millisecond chunks.
    4. Check each chunk to see if it contains speech: vad.is_speech(chunk, sample_rate)
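
The four steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `split_frames` and `classify_wav` are hypothetical helper names, the input is assumed to be a WAV file that has already been converted as in step 1, and the aggressiveness value of 2 is an arbitrary starting point.

```python
import wave
from typing import Iterator, List, Tuple

FRAME_MS = 30  # chunk length from step 3


def split_frames(pcm: bytes, sample_rate: int,
                 frame_ms: int = FRAME_MS) -> Iterator[Tuple[float, bytes]]:
    """Yield (start_time_in_seconds, frame_bytes) for 16-bit mono PCM data.

    A trailing partial frame is dropped, since the VAD needs full frames.
    """
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for offset in range(0, len(pcm) - bytes_per_frame + 1, bytes_per_frame):
        yield offset / 2 / sample_rate, pcm[offset:offset + bytes_per_frame]


def classify_wav(path: str, aggressiveness: int = 2) -> List[Tuple[float, bool]]:
    """Return (start_time, is_speech) for every 30 ms frame of a WAV file."""
    import webrtcvad  # third-party package: pip install webrtcvad

    with wave.open(path, "rb") as wav:
        # Assumption: the file is already mono, 16-bit (step 1).
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = wav.getframerate()
        pcm = wav.readframes(wav.getnframes())

    vad = webrtcvad.Vad(aggressiveness)  # step 2; mode 0 (lenient) to 3 (strict)
    return [(start, vad.is_speech(frame, sample_rate))  # step 4
            for start, frame in split_frames(pcm, sample_rate)]
```

Calling `classify_wav("audio.wav")` (the file name is a placeholder) gives one raw speech/non-speech flag per 30 ms chunk, along with the time at which that chunk starts.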

    The VAD output may be "noisy": if it classifies a single 30 millisecond chunk of audio as speech, you don't really want to output a time for that. You probably want to look over the past 0.3 seconds (or so) of audio and see if the majority of 30 millisecond chunks in that period are classified as speech. If they are, then you output the start time of that 0.3 second period as the beginning of speech. Then you do something similar to detect when the speech ends: wait for a 0.3 second period of audio where the majority of 30 millisecond chunks are not classified as speech by the VAD. When that happens, output the end time as the end of speech.

    You may have to tweak the timing a little bit to get good results for your purposes. For example, you might decide that you need 0.2 seconds of audio where more than 30% of chunks are classified as speech by the VAD before you trigger, and 1.0 seconds of audio with more than 50% of chunks classified as non-speech before you de-trigger.

    A ring buffer (collections.deque in Python) is a helpful data structure for keeping track of the last N chunks of audio and their classification.
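
As an illustration of the smoothing described above, here is a minimal sketch built on `collections.deque`. The function name `speech_segments` and its parameters are hypothetical. It assumes the input is a sequence of `(start_time, is_speech)` pairs, one per 30 ms chunk, and for brevity it uses a single window length for both triggering and de-triggering; separate lengths, as suggested above, would need two buffers.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def speech_segments(flags: Iterable[Tuple[float, bool]],
                    frame_s: float = 0.03,
                    window_s: float = 0.3,
                    start_ratio: float = 0.5,
                    end_ratio: float = 0.5) -> List[Tuple[float, float]]:
    """Turn noisy per-chunk VAD flags into (start, end) speech segments."""
    window = deque(maxlen=max(1, round(window_s / frame_s)))  # ring buffer
    segments: List[Tuple[float, float]] = []
    seg_start: Optional[float] = None
    last_end = 0.0

    for start, is_speech in flags:
        window.append((start, is_speech))
        last_end = start + frame_s
        if len(window) < window.maxlen:
            continue  # not enough history yet to take a vote
        voiced = sum(1 for _, flag in window if flag)
        if seg_start is None:
            # Trigger: enough of the window is speech, so the segment
            # begins at the start of the window.
            if voiced > start_ratio * len(window):
                seg_start = window[0][0]
                window.clear()
        elif len(window) - voiced > end_ratio * len(window):
            # De-trigger: enough of the window is non-speech, so the
            # segment ends at the end of the window.
            segments.append((seg_start, last_end))
            seg_start = None
            window.clear()

    if seg_start is not None:  # audio ended while still inside speech
        segments.append((seg_start, last_end))
    return segments
```

Clearing the buffer after each transition is a design choice made here to stop a borderline window from triggering and de-triggering back to back. Because the start is taken from the beginning of the window and the end from its close, the reported segments are somewhat wider than the actual speech, which may need compensating for when aligning subtitles.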
