I'm trying to decode an MP4 video using the Windows Media Foundation classes and convert the frames into 2D textures that can be used by a DirectX shader for rendering. I've been able to read the source stream using MFCreateSourceReaderFromURL and read the media type of the stream, which has MFMediaType_Video as its major type and MFVideoFormat_H264 as its minor type, as expected.
I now need to convert this format into an RGB format that can be used to initialise a D3D11_TEXTURE2D resource and resource view, which can then be passed to an HLSL pixel shader for sampling. I've tried using the IMFTransform class to do the conversion for me, but when I try to set the output type on the transform to any MFVideoFormat_RGB variant I get an error. I've also tried setting a new output type on the source reader and just sampling that, hoping to get a sample in the correct format, but again I've had no luck.
So my questions would be:
Is this type of conversion possible?
Can this be done through the IMFTransform/SourceReader classes like I've tried above, and do I just need to tweak the code, or do I need to do this type of conversion manually?
Is this the best way to go about feeding video texture data into a shader for sampling, or is there an easier alternative that I've not thought of?
The OS being used is Windows 7, so I can't use SourceReaderEx or the ID3D11VideoDevice interface, because as far as I'm aware these are only available on Windows 8.
Any help/pointers in the right direction would be greatly appreciated, I can also provide some source code if necessary.
I think there is a mistake in your understanding of Media Foundation. You want to get an image in an RGB format from MFVideoFormat_H264, but you are not using an H.264 decoder. You wrote "I've tired using the IMFTransform class", but IMFTransform is not a class: it is an interface implemented by transform COM objects. You must create the Media Foundation H.264 decoder COM object; the CLSID for the Microsoft software H.264 decoder is CLSID_CMSH264DecoderMFT. However, that decoder can only produce output images in the following formats:
MFVideoFormat_I420
MFVideoFormat_IYUV
MFVideoFormat_NV12
MFVideoFormat_YUY2
MFVideoFormat_YV12
You can create a D3D11_TEXTURE2D from one of them. Or you can do something like the following, taken from my project CaptureManager SDK:
// Note: the break statements below assume an enclosing do { ... } while (false)
// block, as used throughout the CaptureManager SDK sources this excerpt is from.
CComPtrCustom<IMFTransform> lColorConvert;
if (!Result(lColorConvert.CoCreateInstance(__uuidof(CColorConvertDMO))))
{
    // Set the decoder's output type (e.g. NV12) as the converter's input type.
    lresult = MediaFoundationManager::setInputType(
        lColorConvert,
        0,
        lVideoMediaType,
        0);
    if (lresult)
    {
        break;
    }
    // Enumerate the converter's available output types until RGB32 is found.
    DWORD lTypeIndex = 0;
    while (!lresult)
    {
        CComPtrCustom<IMFMediaType> lOutputType;
        lresult = lColorConvert->GetOutputAvailableType(0, lTypeIndex++, &lOutputType);
        if (!lresult)
        {
            lresult = MediaFoundationManager::getGUID(
                lOutputType,
                MF_MT_SUBTYPE,
                lSubType);
            if (lresult)
            {
                break;
            }
            if (lSubType == MFVideoFormat_RGB32)
            {
                // The stride can be negative for bottom-up images; use its magnitude.
                LONG lstride = 0;
                MediaFoundationManager::getStrideForBitmapInfoHeader(
                    lSubType,
                    lWidth,
                    lstride);
                if (lstride < 0)
                    lstride = -lstride;
                // Average bitrate in bits per second: height * stride * 8 * frame rate.
                lBitRate = (lHight * (UINT32)lstride * 8 * lNumerator) / lDenominator;
                lresult = MediaFoundationManager::setUINT32(
                    lOutputType,
                    MF_MT_AVG_BITRATE,
                    lBitRate);
                if (lresult)
                {
                    break;
                }
                // Copy the frame rate from the input type onto the output type.
                PROPVARIANT lVarItem;
                lresult = MediaFoundationManager::getItem(
                    *aPtrPtrInputMediaType,
                    MF_MT_FRAME_RATE,
                    lVarItem);
                if (lresult)
                {
                    break;
                }
                lresult = MediaFoundationManager::setItem(
                    lOutputType,
                    MF_MT_FRAME_RATE,
                    lVarItem);
                if (lresult)
                {
                    break;
                }
                (*aPtrPtrInputMediaType)->Release();
                *aPtrPtrInputMediaType = lOutputType.detach();
                break;
            }
        }
    }
}
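For reference, the MF_MT_AVG_BITRATE value computed in the snippet above is just height × stride × 8 bits × frame rate (with the frame rate expressed as numerator/denominator), and for RGB32 the stride is typically width × 4 bytes, reported as negative for bottom-up images. A small portable sketch of that same arithmetic, using made-up frame parameters (this is my own illustration, not CaptureManager code):

```cpp
#include <cstdint>

// Average bitrate for uncompressed RGB32 video, mirroring the computation
// above: bytes per frame * 8 bits * frames per second (numerator/denominator).
static uint32_t AverageBitRateRgb32(uint32_t width, uint32_t height,
                                    uint32_t fpsNumerator, uint32_t fpsDenominator)
{
    int32_t stride = static_cast<int32_t>(width) * 4; // RGB32: 4 bytes per pixel
    if (stride < 0)                                   // bottom-up images report
        stride = -stride;                             // a negative stride
    return (height * static_cast<uint32_t>(stride) * 8 * fpsNumerator) / fpsDenominator;
}
```

For 640x480 at 30 fps this gives about 295 Mbit/s, which shows why you normally keep the decoded frames on the GPU rather than copying them around.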
You can set up the ColorConvertDMO to convert from the output format of the H.264 decoder into the one you need.
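If you ever need a fallback without the ColorConvertDMO, the YUV-to-RGB step can also be done on the CPU. Here is a minimal sketch of a BT.601 NV12-to-RGB32 conversion (my own illustration, assuming even dimensions and tightly packed planes; not code from the decoder or from CaptureManager):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert one NV12 frame (Y plane followed by an interleaved UV plane) to
// 32-bit BGRA using the BT.601 integer approximation. Width and height are
// assumed to be even, and both planes are assumed to be tightly packed.
static std::vector<uint8_t> Nv12ToRgb32(const uint8_t* nv12, int width, int height)
{
    std::vector<uint8_t> rgb(static_cast<size_t>(width) * height * 4);
    const uint8_t* yPlane  = nv12;
    const uint8_t* uvPlane = nv12 + static_cast<size_t>(width) * height;

    for (int row = 0; row < height; ++row)
    {
        for (int col = 0; col < width; ++col)
        {
            int y = yPlane[row * width + col] - 16;
            int u = uvPlane[(row / 2) * width + (col / 2) * 2] - 128;
            int v = uvPlane[(row / 2) * width + (col / 2) * 2 + 1] - 128;

            int r = (298 * y + 409 * v + 128) >> 8;
            int g = (298 * y - 100 * u - 208 * v + 128) >> 8;
            int b = (298 * y + 516 * u + 128) >> 8;

            uint8_t* px = &rgb[(static_cast<size_t>(row) * width + col) * 4];
            px[0] = static_cast<uint8_t>(std::clamp(b, 0, 255)); // B
            px[1] = static_cast<uint8_t>(std::clamp(g, 0, 255)); // G
            px[2] = static_cast<uint8_t>(std::clamp(r, 0, 255)); // R
            px[3] = 255;                                         // A
        }
    }
    return rgb;
}
```

The resulting buffer can be passed as the initial data of a DXGI_FORMAT_B8G8R8A8_UNORM texture, but doing this per frame costs CPU time, which is why the DMO (or DXVA) route is preferable.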
Also, you can view the code at this link: videoInput. This code takes live video from a web cam and decodes it into RGB. If you replace the web-cam source with an MP4 video file source you will get a solution that is close to what you need.
Regards
Is this type of conversion possible?
Yes, it is possible. The stock H.264 Video Decoder MFT is "Direct3D aware", which means it can decode video into Direct3D 9 surfaces/Direct3D 11 textures, leveraging DXVA. If hardware capabilities are insufficient, there is a software fallback mode too. You are interested in getting the output delivered right into a texture for performance reasons (otherwise you would have to load this data yourself, spending CPU and video resources on that).
Can this be done through the IMFTransform/SourceReader classes like I've tried above and do I just need to tweak the code or do I need to do this type of conversion manually?
IMFTransform is an abstract interface. It is implemented by the H.264 decoder (as well as by other MFTs), and you can use it directly, or you can use the higher-level Source Reader API to have it manage reading video from the file and decoding through this MFT.
That is, the MFT and the Source Reader are not mutually exclusive alternatives but rather lower- and higher-level APIs. The MFT interface is offered by the decoder, and you are responsible for feeding H.264 in and draining the decoded output. The Source Reader manages the same MFT and adds file-reading capability.
The Source Reader itself is available in Windows 7, BTW (even on Vista, though it might be limited in feature set compared to newer OSes).
Decoding can be performed with the following code:
MFT_OUTPUT_DATA_BUFFER loutputDataBuffer;
initOutputDataBuffer(
    lTransform,
    loutputDataBuffer);
DWORD lprocessOutputStatus = 0;
lresult = lTransform->ProcessOutput(
    0,
    1,
    &loutputDataBuffer,
    &lprocessOutputStatus);
if ((HRESULT)lresult == E_FAIL)
{
    break;
}
The function initOutputDataBuffer allocates the needed memory. An example of that function is presented here:
Result initOutputDataBuffer(IMFTransform* aPtrTransform,
                            MFT_OUTPUT_DATA_BUFFER& aRefOutputBuffer)
{
    Result lresult;
    MFT_OUTPUT_STREAM_INFO loutputStreamInfo;
    DWORD loutputStreamId = 0;
    CComPtrCustom<IMFSample> lOutputSample;
    CComPtrCustom<IMFMediaBuffer> lMediaBuffer;
    do
    {
        if (aPtrTransform == nullptr)
        {
            lresult = E_POINTER;
            break;
        }
        ZeroMemory(&loutputStreamInfo, sizeof(loutputStreamInfo));
        ZeroMemory(&aRefOutputBuffer, sizeof(aRefOutputBuffer));
        lresult = aPtrTransform->GetOutputStreamInfo(loutputStreamId, &loutputStreamInfo);
        if (lresult)
        {
            break;
        }
        // Allocate a sample only if the MFT does not provide its own output samples.
        if ((loutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES) == 0 &&
            (loutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES) == 0)
        {
            lresult = MFCreateSample(&lOutputSample);
            if (lresult)
            {
                break;
            }
            // cbSize is the required output buffer size reported by the MFT.
            lresult = MFCreateMemoryBuffer(loutputStreamInfo.cbSize, &lMediaBuffer);
            if (lresult)
            {
                break;
            }
            lresult = lOutputSample->AddBuffer(lMediaBuffer);
            if (lresult)
            {
                break;
            }
            aRefOutputBuffer.pSample = lOutputSample.Detach();
        }
        else
        {
            lresult = S_OK;
        }
        aRefOutputBuffer.dwStreamID = loutputStreamId;
    } while (false);
    return lresult;
}
The code needs to get information about the output samples via the GetOutputStreamInfo method of IMFTransform. MFT_OUTPUT_STREAM_INFO contains the required memory size for an output media sample in its cbSize member. The code then allocates memory of that size, adds it to an IMFSample and attaches that sample to the MFT_OUTPUT_DATA_BUFFER.
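As a sanity check on that size: for NV12 output, cbSize typically works out to width × height × 3 / 2 bytes (a full-resolution 8-bit Y plane plus a half-resolution interleaved UV plane). A quick sketch of that arithmetic (my own illustration; the authoritative value is always whatever GetOutputStreamInfo actually returns):

```cpp
#include <cstddef>

// Expected sample size for one NV12 frame: a Y plane of width*height bytes
// plus an interleaved UV plane at half vertical and half horizontal
// resolution (width*height/2 bytes). Width and height are assumed even.
static size_t Nv12FrameSize(size_t width, size_t height)
{
    size_t yPlane  = width * height;
    size_t uvPlane = width * height / 2;
    return yPlane + uvPlane;
}
```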
So, you can see that writing code for encoding and decoding video by calling the Media Foundation functions directly can be difficult and requires significant knowledge of the framework. From the description of your task I see that you only need to decode video and present it. I would advise you to try the Media Foundation Session functionality. It was developed by Microsoft engineers, is well optimized, and already includes the algorithms for selecting the needed decoders.

In the videoInput project, a Media Foundation Session is used to find a suitable decoder for the Media Source created for a web camera and to grab frames in an uncompressed format. It already does the needed processing; you only need to replace the web-camera Media Source with a Media Source created from a video file. This can be much easier than writing code that calls IMFTransform directly for decoding, and it sidesteps many problems, for example frame-rate pacing: if your code renders each image immediately after decoding and then decodes the next frame, it can render a 1-minute video clip in a couple of seconds; conversely, if rendering the video and other content takes more than one frame duration, the video will be presented in "slow motion", and rendering a 1-minute clip can take 2, 3 or 5 minutes. I do not know what project you need video decoding for, but you should have serious reasons before writing code that calls the Media Foundation functions and interfaces directly.
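On the pacing point: Media Foundation timestamps are expressed in 100-nanosecond units, and the per-frame presentation time follows directly from the MF_MT_FRAME_RATE numerator/denominator pair. A sketch of that computation, which is essentially what a session's presentation clock does for you (a hypothetical helper, not an MF API):

```cpp
#include <cstdint>

// Presentation time of frame n in 100-ns ticks, given the frame rate as
// the numerator/denominator pair stored in MF_MT_FRAME_RATE. One frame
// lasts 10,000,000 * denominator / numerator ticks.
static int64_t FramePresentationTime(int64_t frameIndex,
                                     uint32_t fpsNumerator, uint32_t fpsDenominator)
{
    return frameIndex * 10000000LL * fpsDenominator / fpsNumerator;
}
```

A renderer that waits until this time before presenting each decoded frame avoids both the "1-minute clip in a couple of seconds" and the slow-motion failure modes described above.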
Regards.
Source: https://stackoverflow.com/questions/37461426/windows-media-foundation-using-imftransform-to-decode-mp4-movie-frames-to-2d-tex