Bug writing audio using custom video writer library

Posted by ≯℡__Kan透↙ on 2019-12-05 05:54:35

Two suggestions:

  • First, pack the audio data as short instead of int, as the C++ test does. The audio data is 16-bit, not 32-bit. Use the 'h' format character when packing, for example struct.pack(f'{len(samples)}h', *samples).

  • Second, see the code modification below. Expose WAVEFORMATEX via SWIG by editing aviwriter.i, then call writer.SetAudioFormat(wfx) from Python.

  • In my tests, the memset() was not necessary. From Python you can manually set the cbSize field to zero, and that should be enough. The other six fields are mandatory, so you'll be setting them anyway. It looks like this struct isn't meant to be revised in the future: it has no struct-size field, and the semantics of cbSize (appending arbitrary data to the end of the struct) would conflict with such an extension anyway.
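To illustrate the first suggestion, here is a minimal sketch of the size difference between the two packing formats. The sample values are made up for illustration, and the explicit '<' (little-endian, standard sizes) is an assumption about what the writer expects on Windows.

```python
import struct

# Hypothetical sample values; real samples would come from your audio source.
samples = [0, 1000, -1000, 32767, -32768]

# 'h' is a signed 16-bit short, 'i' is a signed 32-bit int.
# '<' selects little-endian byte order with standard sizes and no padding.
as_short = struct.pack(f'<{len(samples)}h', *samples)
as_int = struct.pack(f'<{len(samples)}i', *samples)

print(len(as_short))  # 2 bytes per sample
print(len(as_int))    # 4 bytes per sample
```

With a header that declares 16 bits per sample, the 'i' version is twice as long as declared, so a player would read every second 16-bit word as an extra sample.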

aviwriter.i:

%inline %{
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef struct tWAVEFORMATEX
{
    WORD    wFormatTag;        /* format type */
    WORD    nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD   nSamplesPerSec;    /* sample rate */
    DWORD   nAvgBytesPerSec;   /* for buffer estimation */
    WORD    nBlockAlign;       /* block size of data */
    WORD    wBitsPerSample;    /* Number of bits per sample of mono data */    
    WORD    cbSize;            /* The count in bytes of the size of
                                extra information (after cbSize) */
} WAVEFORMATEX;
%}

test.py:

from aviwriter import WAVEFORMATEX

later in test.py:

    wfx = WAVEFORMATEX()
    wfx.wFormatTag = 1 #WAVE_FORMAT_PCM
    wfx.nChannels = 1
    wfx.nSamplesPerSec = sampleRate
    wfx.nAvgBytesPerSec = sampleRate * 2
    wfx.nBlockAlign = 2
    wfx.wBitsPerSample = 16
    writer.SetAudioFormat(wfx)
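The field values set above are not independent: for plain PCM, nBlockAlign is the size of one sample frame in bytes and nAvgBytesPerSec is the sample rate times that frame size. A small sketch that derives them instead of hard-coding them could look like this. The helper name pcm_format_fields is hypothetical and not part of aviwriter.

```python
def pcm_format_fields(sample_rate, channels=1, bits_per_sample=16):
    """Return consistent WAVEFORMATEX field values for uncompressed PCM.

    Hypothetical helper for illustration; it only computes numbers and
    does not touch the SWIG-wrapped struct.
    """
    # One frame holds one sample for every channel.
    block_align = channels * bits_per_sample // 8
    return {
        'wFormatTag': 1,  # WAVE_FORMAT_PCM
        'nChannels': channels,
        'nSamplesPerSec': sample_rate,
        'nAvgBytesPerSec': sample_rate * block_align,
        'nBlockAlign': block_align,
        'wBitsPerSample': bits_per_sample,
        'cbSize': 0,  # no extra data follows the struct
    }


fields = pcm_format_fields(44100)
print(fields['nBlockAlign'], fields['nAvgBytesPerSec'])
```

The values could then be copied onto the wrapped object with setattr(wfx, name, value) for each entry, assuming the SWIG wrapper exposes the fields as plain attributes as in the test.py snippet.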

Notes on SWIG: aviwriter.h only provides a forward declaration of tWAVEFORMATEX, so SWIG has no information about its members and cannot generate get/set wrappers. You could ask SWIG to wrap a Windows header that declares the struct, but that opens a can of worms: those headers are too large and complex, and they expose further problems. Instead, you can define WAVEFORMATEX yourself, as done above. The C++ types WORD and DWORD are still not declared at that point, though. Including the SWIG file windows.i only creates wrappers which, for example, allow the string "WORD" in a Python script to be understood as indicating 16-bit data in memory; it does not declare the WORD type from a C++ perspective. To resolve this, the typedefs for WORD and DWORD are added to the %inline block in aviwriter.i, which makes SWIG copy that code verbatim into the generated C++ wrapper file so the declarations are available. This also triggers generation of the get/set wrappers. Alternatively, you could put that inlined code in aviwriter.h if you're willing to edit it.

In short, the idea here is to fully enclose all types in standalone headers or declaration blocks. Remember that .i and .h files serve separate purposes (wrappers and data conversion, versus the functionality being wrapped). Similarly, notice how aviwriter.h is included twice in aviwriter.i: once to trigger generation of the wrappers needed for Python, and once to declare the types needed by the generated C++ wrapper code.

From what I saw in the code, you don't initialize the audio format. The original test.cpp does this by calling writer.SetAudioFormat(&wfx); at line 44, which sets it to mono 44.1 kHz PCM. I believe that because you do not initialize it, a blank header is written and the video player is unable to open the unknown format.

Update

Since you only need to pass the binary header structure, you don't need to declare the structure in aviwriter.i at all. You can use the following code directly from Python:

import struct
from collections import namedtuple

WAVEFORMATEX = namedtuple('WAVEFORMATEX', 'wFormatTag nChannels nSamplesPerSec nAvgBytesPerSec nBlockAlign wBitsPerSample cbSize ')
wfx = WAVEFORMATEX(    
    wFormatTag = 1,
    nChannels = 1,
    nSamplesPerSec = sampleRate,
    nAvgBytesPerSec = sampleRate * 2,
    nBlockAlign = 2,
    wBitsPerSample = 16,
    cbSize = 0)

# '<' = little-endian, no padding; H = 16-bit WORD, I = 32-bit DWORD
audio_format_obj = struct.pack('<HHIIHHH', *wfx)
writer.SetAudioFormat(audio_format_obj)

This will automatically solve your second and third concerns.
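As a quick sanity check of the packing format, the sketch below confirms that '<HHIIHHH' produces 18 bytes (two WORDs, two DWORDs, three WORDs) and that the fields round-trip unchanged. The 44100 Hz rate is just an example value, and whether SetAudioFormat accepts a bytes object depends on how the wrapper is built, so that call is left out here.

```python
import struct
from collections import namedtuple

WAVEFORMATEX = namedtuple(
    'WAVEFORMATEX',
    'wFormatTag nChannels nSamplesPerSec nAvgBytesPerSec '
    'nBlockAlign wBitsPerSample cbSize')

FORMAT = '<HHIIHHH'  # little-endian, no padding between fields

# Example values only: mono, 16-bit PCM at 44100 Hz.
wfx = WAVEFORMATEX(1, 1, 44100, 44100 * 2, 2, 16, 0)
packed = struct.pack(FORMAT, *wfx)

print(struct.calcsize(FORMAT))  # total header size in bytes
print(WAVEFORMATEX._make(struct.unpack(FORMAT, packed)) == wfx)
```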

As for memset(&wfx,0,sizeof(wfx)); this is just an old C idiom for zeroing every field in the structure.

P.S. As @MichaelsonBritt mentioned, your audio data format has to match the declaration in the header. But instead of converting to 16-bit shorts, you can declare 2 channels, so you will get stereo sound with one channel silent.
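The two-channel trick works because a little-endian 32-bit int whose value fits in 16 bits has the same bytes as a pair of 16-bit shorts. The sketch below shows that reinterpretation with made-up sample values. Note that the second channel is exactly zero only for non-negative samples; negative samples sign-extend to -1, which is inaudible in practice but not strictly silent. The header would also need nChannels = 2, nBlockAlign = 4 and nAvgBytesPerSec = sampleRate * 4 to match.

```python
import struct

# Hypothetical 16-bit-range samples that were packed as 32-bit ints.
samples = [5, -2, 300]
as_int = struct.pack(f'<{len(samples)}i', *samples)

# Reinterpret the same bytes as interleaved 16-bit stereo frames.
as_pairs = struct.unpack(f'<{2 * len(samples)}h', as_int)
first_channel = as_pairs[0::2]   # low 16 bits: the original samples
second_channel = as_pairs[1::2]  # high 16 bits: 0, or -1 for negatives

print(first_channel)   # (5, -2, 300)
print(second_channel)  # (0, -1, 0)
```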
