Bug writing audio using custom video writer library

I'm trying to wrap a handy little piece of C++ code that generates video+audio on Windows using VfW (Video for Windows). The C++ library lives here, and its description says:

Uses Video for Windows (so it's not portable). Handy if you want to quickly record a video somewhere and don't feel like wading through the VfW docs yourself.

I'd like to use that C++ library from Python, so I've decided to wrap it with SWIG.

The thing is, I'm having problems encoding the audio, and I'm trying to understand why the generated video is broken: it seems the audio is not being written properly to the video file. If I try to open the video with VLC or a similar player, I get a message saying the player can't identify the audio or video codec. The video images are fine, so it's definitely a problem with the way I'm writing the audio to the file.

I'm attaching both the SWIG interface and a small Python test that tries to port the original C++ test.

aviwriter.i

%module aviwriter

%{
#include "aviwriter.h"
%}

%typemap(in) (const unsigned char* buffer) (char* buffer, Py_ssize_t length) %{
  if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
    SWIG_fail;
  $1 = (unsigned char*)buffer;
%}

%typemap(in) (const void* buffer) (char* buffer, Py_ssize_t length) %{
  if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
    SWIG_fail;
  $1 = (void*)buffer;
%}


%include "aviwriter.h"

test.py

import argparse
import sys
import struct
from distutils.util import strtobool

from aviwriter import AVIWriter


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-audio", action="store", default="1")
    parser.add_argument('-width', action="store",
                        dest="width", type=int, default=400)
    parser.add_argument('-height', action="store",
                        dest="height", type=int, default=300)
    parser.add_argument('-numframes', action="store",
                        dest="numframes", type=int, default=256)
    parser.add_argument('-framerate', action="store",
                        dest="framerate", type=int, default=60)
    parser.add_argument('-output', action="store",
                        dest="output", type=str, default="checker.avi")

    args = parser.parse_args()

    audio = strtobool(args.audio)
    framerate = args.framerate
    num_frames = args.numframes
    width = args.width
    height = args.height
    output = args.output

    writer = AVIWriter()

    if not writer.Init(output, framerate):
        print("Couldn't open video file!")
        sys.exit(1)

    writer.SetSize(width, height)

    data = [0]*width*height
    sampleRate = 44100
    samples_per_frame = 44100 / framerate
    samples = [0]*int(samples_per_frame)

    c1, s1, f1 = 24000.0, 0.0, 0.03
    c2, s2, f2 = 1.0, 0.0, 0.0013

    for frame in range(num_frames):
        print(f"frame {frame}")

        i = 0
        for y in range(height):
            for x in range(width):
                on = ((x + frame) & 32) ^ ((y+frame) & 32)
                data[i] = 0xffffffff if on else 0xff000000
                i += 1
        writer.WriteFrame(
            struct.pack(f'{len(data)}L', *data),
            width*4
        )

        if audio:
            for i in range(int(samples_per_frame)):
                c1 -= f1*s1
                s1 += f1*c1
                c2 += f2*s2
                s2 -= f2*c2

                val = s1 * (0.75 + 0.25 * c2)
                if(frame == num_frames - 1):
                    val *= 1.0 * (samples_per_frame - 1 - i) / \
                        samples_per_frame
                samples[i] = int(val)

                if frame==0:
                    print(f"i={i} val={int(val)}")

            writer.WriteAudioFrame(
                struct.pack(f'{len(samples)}i', *samples),
                int(samples_per_frame)
            )

    writer.Exit()

I don't think samples is being generated incorrectly: I've compared the values generated on the Python side with those generated on the C++ side, though only for the packet written for frame 0.

I suspect either the way I've written the SWIG typemaps or the line writer.WriteAudioFrame(struct.pack(f'{len(samples)}i', *samples), int(samples_per_frame)). I don't know which, but the way I'm sending the audio buffer from Python to the C++ wrapper is definitely not right.

So, do you know how to fix the attached code so that test.py generates a video with the right audio, like the C++ test does?

When generated ok, the video will display a magic scrolling checkerboard with hypnotic sinewaves as audio backdrop :D

Additional notes:

1) It seems the above code never calls writer.SetAudioFormat, which is needed by AVIFileCreateStreamA and AVIStreamSetFormat. The problem is that I don't know how to expose this structure with SWIG so that I can use it from Python the same way test.cpp does. In Mmreg.h the structure looks like this:

typedef struct tWAVEFORMATEX
{
    WORD    wFormatTag;        /* format type */
    WORD    nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD   nSamplesPerSec;    /* sample rate */
    DWORD   nAvgBytesPerSec;   /* for buffer estimation */
    WORD    nBlockAlign;       /* block size of data */
    WORD    wBitsPerSample;    /* Number of bits per sample of mono data */
    WORD    cbSize;            /* The count in bytes of the size of
                                    extra information (after cbSize) */

} WAVEFORMATEX;

Unfortunately, I don't know how to wrap that in aviwriter.i. I've tried using %include windows.i and including the declarations directly in a %{...%} block, but all I got was a bunch of errors :/

2) I'd prefer not to modify aviwriter.h or aviwriter.cpp at all, as that's basically external working code.

3) Assuming I'm able to wrap WAVEFORMATEX so I can use it from Python, how would you do the equivalent of the memset in test.cpp, i.e. memset(&wfx,0,sizeof(wfx));?

Two suggestions:

First, pack the audio data as short instead of int, as in the C++ test: the audio data is 16-bit, not 32-bit. Use the 'h' format character for packing, for example struct.pack(f'{len(samples)}h', *samples).
Second, expose WAVEFORMATEX via SWIG by editing aviwriter.i (see the code modification below), then call writer.SetAudioFormat(wfx) from Python.
In my tests, the memset() was not necessary. From Python you can set cbSize to zero manually, which should be enough; the other six fields are mandatory, so you'll be setting them anyway. This struct doesn't seem designed to be revised in the future: it has no struct-size field, and the semantics of cbSize (appending arbitrary data to the end of the struct) would conflict with an extension anyway.
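Regarding the first suggestion, the difference is easy to see by packing a few samples both ways. The helper pack_pcm16 below is hypothetical (not part of aviwriter). It clamps values to the int16 range, because struct.pack raises struct.error for out-of-range 'h' values, and it uses '<' for explicit little-endian order, which is what native packing gives on x86 Windows anyway. With 'i', every sample occupies 4 bytes, so a buffer of N 'i' samples contains enough bytes for 2N 16-bit samples.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767

def pack_pcm16(samples):
    """Pack integer samples as little-endian signed 16-bit PCM ('<h')."""
    # Clamp first: struct.pack raises struct.error for values outside int16.
    clamped = [max(INT16_MIN, min(INT16_MAX, int(s))) for s in samples]
    return struct.pack(f'<{len(clamped)}h', *clamped)

samples = [0, 1000, -1000, 40000]  # 40000 gets clamped to 32767
pcm16 = pack_pcm16(samples)
pcm32 = struct.pack(f'<{len(samples)}i', *samples)
print(len(pcm16), len(pcm32))  # 8 16
```

In test.py the call would then become writer.WriteAudioFrame(pack_pcm16(samples), int(samples_per_frame)).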

aviwriter.i:

%inline %{
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef struct tWAVEFORMATEX
{
    WORD    wFormatTag;        /* format type */
    WORD    nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD   nSamplesPerSec;    /* sample rate */
    DWORD   nAvgBytesPerSec;   /* for buffer estimation */
    WORD    nBlockAlign;       /* block size of data */
    WORD    wBitsPerSample;    /* Number of bits per sample of mono data */    
    WORD    cbSize;            /* The count in bytes of the size of
                                extra information (after cbSize) */
} WAVEFORMATEX;
%}

test.py:

from aviwriter import WAVEFORMATEX

later in test.py:

    wfx = WAVEFORMATEX()
    wfx.wFormatTag = 1 #WAVE_FORMAT_PCM
    wfx.nChannels = 1
    wfx.nSamplesPerSec = sampleRate
    wfx.nAvgBytesPerSec = sampleRate * 2
    wfx.nBlockAlign = 2
    wfx.wBitsPerSample = 16
    writer.SetAudioFormat(wfx)
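For PCM, nBlockAlign and nAvgBytesPerSec are not independent: a block holds one sample per channel, so nBlockAlign = nChannels * wBitsPerSample / 8 and nAvgBytesPerSec = nSamplesPerSec * nBlockAlign. A small sketch that derives them, so they can't drift out of sync with the sample data (PcmFormat and pcm_format are hypothetical names, not part of aviwriter):

```python
from dataclasses import dataclass

WAVE_FORMAT_PCM = 1

@dataclass
class PcmFormat:
    """Field values for a PCM WAVEFORMATEX (hypothetical helper type)."""
    wFormatTag: int
    nChannels: int
    nSamplesPerSec: int
    nAvgBytesPerSec: int
    nBlockAlign: int
    wBitsPerSample: int
    cbSize: int = 0  # no extra format bytes for plain PCM

def pcm_format(sample_rate, channels, bits_per_sample):
    # One block = one sample for every channel.
    block_align = channels * bits_per_sample // 8
    return PcmFormat(
        wFormatTag=WAVE_FORMAT_PCM,
        nChannels=channels,
        nSamplesPerSec=sample_rate,
        nAvgBytesPerSec=sample_rate * block_align,
        nBlockAlign=block_align,
        wBitsPerSample=bits_per_sample,
    )

fmt = pcm_format(44100, 1, 16)
print(fmt.nBlockAlign, fmt.nAvgBytesPerSec)  # 2 88200
```

With the SWIG-wrapped struct from above, the values could then be copied with setattr(wfx, name, value) for each item of dataclasses.asdict(fmt).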

Notes on SWIG: aviwriter.h only provides a forward declaration of tWAVEFORMATEX, so SWIG has no information about its fields and cannot generate get/set wrappers. You could ask SWIG to wrap a Windows header that declares the struct, but that opens a can of worms: those headers are large and complex and expose further problems. Instead, you can define WAVEFORMATEX on its own, as done above.

The C++ types WORD and DWORD still aren't declared, though. Including SWIG's windows.i only tells SWIG how to wrap those types (for example, that WORD means 16-bit data when converting to and from Python); it doesn't declare WORD for the C++ compiler. Adding typedefs for WORD and DWORD to the %inline block in aviwriter.i solves this: %inline copies the code verbatim into the generated C++ wrapper file, making the declarations available there, and it also makes SWIG parse that code, which triggers generation of the get/set wrappers. Alternatively, you could put that code in aviwriter.h if you're willing to edit it.

In short, the idea is to fully enclose all types in standalone headers or declaration blocks. Remember that .i and .h files serve different purposes (wrappers and data conversion, versus the functionality being wrapped). Similarly, notice that aviwriter.h is included twice in aviwriter.i: once via %include to generate the wrappers needed by Python, and once inside %{ ... %} to declare types in the generated wrapper code for the C++ compiler.

From what I see in the code, you don't initialize the audio format. The original test.cpp does this by calling writer.SetAudioFormat(&wfx); at line 44, which sets it to mono 44.1 kHz PCM. I believe that because you skip this step, a blank format header is written, and the video player can't open the unknown format.

Update

Since you only need to pass the binary header, you don't need to declare the structure in aviwriter.i at all. You can build it directly in Python with the following code:

import struct
from collections import namedtuple

# sampleRate comes from test.py (44100)
WAVEFORMATEX = namedtuple('WAVEFORMATEX', 'wFormatTag nChannels nSamplesPerSec nAvgBytesPerSec nBlockAlign wBitsPerSample cbSize')
wfx = WAVEFORMATEX(
    wFormatTag=1,  # WAVE_FORMAT_PCM
    nChannels=1,
    nSamplesPerSec=sampleRate,
    nAvgBytesPerSec=sampleRate * 2,
    nBlockAlign=2,
    wBitsPerSample=16,
    cbSize=0)

# '<' = little-endian with no padding: 2+2+4+4+2+2+2 = 18 bytes,
# matching the byte-packed WAVEFORMATEX layout
audio_format_obj = struct.pack('<HHIIHHH', *wfx)
writer.SetAudioFormat(audio_format_obj)
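If you'd rather have named, mutable fields than a tuple, a ctypes structure can describe the same layout. This sketch assumes, as in the Windows SDK headers, that WAVEFORMATEX is byte-packed (18 bytes, no trailing padding), hence _pack_ = 1; fixed-width c_uint16/c_uint32 stand in for WORD/DWORD so the layout is the same on non-Windows machines too. bytes(wfx) then yields exactly the bytes produced by the struct.pack('<HHIIHHH', ...) call above.

```python
import ctypes
import struct

class WAVEFORMATEX(ctypes.LittleEndianStructure):
    _pack_ = 1  # assumed byte-packed, as in the Windows SDK declaration
    _fields_ = [
        ("wFormatTag", ctypes.c_uint16),       # WORD
        ("nChannels", ctypes.c_uint16),        # WORD
        ("nSamplesPerSec", ctypes.c_uint32),   # DWORD
        ("nAvgBytesPerSec", ctypes.c_uint32),  # DWORD
        ("nBlockAlign", ctypes.c_uint16),      # WORD
        ("wBitsPerSample", ctypes.c_uint16),   # WORD
        ("cbSize", ctypes.c_uint16),           # WORD
    ]

sample_rate = 44100
wfx = WAVEFORMATEX(wFormatTag=1, nChannels=1, nSamplesPerSec=sample_rate,
                   nAvgBytesPerSec=sample_rate * 2, nBlockAlign=2,
                   wBitsPerSample=16, cbSize=0)

header = bytes(wfx)  # raw little-endian bytes of the structure
print(ctypes.sizeof(wfx), len(header))  # 18 18
```

The resulting header is a plain bytes object, so it can be used wherever the namedtuple version's audio_format_obj would be.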

This will automatically solve your second and third concerns.

As for memset(&wfx,0,sizeof(wfx));, that's just an old (and rather ugly) C idiom for zeroing every field of the structure. Since every field is set explicitly above, nothing else needs to be zeroed.
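If you do want a literal Python counterpart, the ctypes module provides ctypes.memset. The sketch below assumes the header is modelled as a ctypes structure (an alternative to the SWIG wrapper, assumed byte-packed like the SDK declaration). A freshly created ctypes structure is already zero-filled, so the explicit memset only matters when an instance is reused.

```python
import ctypes

class WAVEFORMATEX(ctypes.Structure):
    _pack_ = 1  # assumed byte-packed, as in the Windows SDK headers
    _fields_ = [
        ("wFormatTag", ctypes.c_uint16),
        ("nChannels", ctypes.c_uint16),
        ("nSamplesPerSec", ctypes.c_uint32),
        ("nAvgBytesPerSec", ctypes.c_uint32),
        ("nBlockAlign", ctypes.c_uint16),
        ("wBitsPerSample", ctypes.c_uint16),
        ("cbSize", ctypes.c_uint16),
    ]

wfx = WAVEFORMATEX()
# A new ctypes structure starts out zero-filled, so no memset is needed here.
fresh_is_zero = bytes(wfx) == bytes(ctypes.sizeof(wfx))

wfx.nChannels = 2
wfx.cbSize = 22
# Literal equivalent of memset(&wfx, 0, sizeof(wfx)) before reusing wfx.
ctypes.memset(ctypes.addressof(wfx), 0, ctypes.sizeof(wfx))
print(fresh_is_zero, bytes(wfx) == bytes(ctypes.sizeof(wfx)))  # True True
```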

P.S. As @MichaelsonBritt mentioned, your audio data format has to match the declaration in the header. But instead of converting to 16-bit short, you could also declare 2 channels (with nBlockAlign = 4 and nAvgBytesPerSec = sampleRate * 4), so you get stereo sound with one channel silent.
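The stereo trick works because, on a little-endian machine, a 32-bit integer in the int16 range is stored as its low 16 bits followed by 16 sign-extension bits. Read as interleaved 16-bit stereo, each 'i' sample becomes a (left, right) pair where left is the original sample and right is 0 or -1, which is effectively silence. The sketch below only demonstrates this byte reinterpretation with struct, assuming little-endian order ('<'), which matches native packing on x86 Windows.

```python
import struct

samples = [0, 1000, -1000, 24000, -24000]
# What test.py effectively writes: one 32-bit int per sample.
as_int32 = struct.pack(f'<{len(samples)}i', *samples)
# Reinterpret the same bytes as interleaved 16-bit (left, right) frames.
as_int16 = struct.unpack(f'<{2 * len(samples)}h', as_int32)
frames = list(zip(as_int16[0::2], as_int16[1::2]))
print(frames)
# [(0, 0), (1000, 0), (-1000, -1), (24000, 0), (-24000, -1)]
```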

Source: https://stackoverflow.com/questions/50212416/bug-writing-audio-using-custom-video-writer-library

Tags

python

c++

windows

audio

swig