Channel: DirectShow – Fooling Around

Reference signal source for DirectShow


Every so often a task needs reference video or video/audio footage with specific properties: resolution, frame rate, frame-accurate content that identifies each specific frame, motion in view, an amount of motion that is “hard” for an encoder tuned for natural video, or specific video-to-audio synchronization.

There is of course some content available, and sometimes it’s really nice:

Bipbopbipbop video on Youtube

However, once in a while you need 59.94 fps rather than 60; another time you want 50 fps so that millisecond times align nicely and every second contains an equal number of frames; then you need a specific aspect ratio override; and then you would prefer a longer clip to a short one.

I converted one of my reference signal sources into DirectShow filters, which can produce an infinite signal or be used to generate a file of a specific format with specific properties.

The filters are Reference Video Source and Reference Audio Source, regular filters registered in a separate category (not virtual video/audio source devices – yet?), available for instantiation programmatically or in GraphStudioNext/GraphEdit.

DirectShowReferenceSource filters in GraphStudio

The filters come in both 32- and 64-bit versions, with hardcoded default properties (yet?): 1280×720@50 32-bit RGB for video and 16-bit PCM mono at 48 kHz for audio. Programmatically, however, the filters can be tuned flexibly using an IAMStreamConfig::SetFormat call (see the sketch after the list below):

  • Video:
    • Any resolution
    • 32-bit RGB, top-to-bottom only (the filter internally uses Direct2D/WIC to generate the images)
    • Any positive frame rate
    • Aspect ratio can be overridden using VIDEOINFOHEADER2 format, e.g. to force SD video to be 4:3
  • Audio:
    • Any sample rate
    • 16-bit PCM or 32-bit IEEE floating point format
    • Mono
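
For illustration, here is a minimal sketch of such a format call through IAMStreamConfig::SetFormat on the video source’s output pin. It is an assumption-laden example: the helper function, parameter names and the specific values (50 fps, 4:3 override) are mine and not part of the filter’s published samples.

#include <dshow.h>
#include <dvdmedia.h> // VIDEOINFOHEADER2, FORMAT_VideoInfo2
#include <atlbase.h>
#pragma comment(lib, "strmiids.lib")

HRESULT ConfigureReferenceVideo(IPin* pOutputPin, LONG nWidth, LONG nHeight, REFERENCE_TIME nFrameDuration)
{
    CComQIPtr<IAMStreamConfig> pStreamConfig(pOutputPin);
    if(!pStreamConfig)
        return E_NOINTERFACE;

    // 32-bit RGB, top-to-bottom (negative biHeight), VIDEOINFOHEADER2 to allow an aspect ratio override
    VIDEOINFOHEADER2 VideoInfo { };
    VideoInfo.AvgTimePerFrame = nFrameDuration; // e.g. 200000 (100 ns units) for 50 fps
    VideoInfo.dwPictAspectRatioX = 4; // optional: force 4:3 display aspect ratio
    VideoInfo.dwPictAspectRatioY = 3;
    VideoInfo.bmiHeader.biSize = sizeof VideoInfo.bmiHeader;
    VideoInfo.bmiHeader.biWidth = nWidth;
    VideoInfo.bmiHeader.biHeight = -nHeight;
    VideoInfo.bmiHeader.biPlanes = 1;
    VideoInfo.bmiHeader.biBitCount = 32;
    VideoInfo.bmiHeader.biCompression = BI_RGB;
    VideoInfo.bmiHeader.biSizeImage = nWidth * nHeight * 4;

    AM_MEDIA_TYPE MediaType { };
    MediaType.majortype = MEDIATYPE_Video;
    MediaType.subtype = MEDIASUBTYPE_RGB32;
    MediaType.bFixedSizeSamples = TRUE;
    MediaType.lSampleSize = VideoInfo.bmiHeader.biSizeImage;
    MediaType.formattype = FORMAT_VideoInfo2;
    MediaType.cbFormat = sizeof VideoInfo;
    MediaType.pbFormat = reinterpret_cast<BYTE*>(&VideoInfo);

    return pStreamConfig->SetFormat(&MediaType); // SetFormat copies the media type
}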

The video filter generates an image sequence with properties burnt in: frame number, 100 ns time, time with frame number within the second, and a circle with a sector filled to reflect the current sub-second time. There is “Uh”/“Oh” text inside the circle at sharp second boundaries, and the background color continuously transitions between colors.

The audio filter beeps during the first 100 ms of every second, with a different tone for every fifth and every tenth second.

DirectShowReferenceSource filters running in GraphStudio

Both filters support the IAMStreamControl interface, and the IAMStreamControl::StopAt method in particular, which allows limiting the signal duration and can be used for time-accurate file creation.
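
A minimal sketch of such a StopAt call, assuming pOutputPin is the source filter’s output pin; this is not the sample project’s code, just the documented interface usage, with an illustrative helper name.

#include <dshow.h>
#include <atlbase.h>

HRESULT LimitSignalDuration(IPin* pOutputPin, REFERENCE_TIME nStopTime)
{
    CComQIPtr<IAMStreamControl> pStreamControl(pOutputPin);
    if(!pStreamControl)
        return E_NOINTERFACE;
    // nStopTime is in 100 ns units of stream time, e.g. 10 * 10000000i64 for ten seconds;
    // FALSE: no extra sample past the stop position, 0: no notification cookie
    return pStreamControl->StopAt(&nStopTime, FALSE, 0);
}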

This comes with a sample project that demonstrates generation of an ASF file with specific properties and duration. The output file generated by the sample is Output.asf.

The ASF file format and WM ASF Writer are chosen for code brevity and to reference a stock multiplexer. The downside, of course, is that the multiplexer re-scales the video to the profile’s resolution and frame rate. Those interested in generating their own content would rather use their favorite H.264 and AAC encoders with an MP4 or MKV multiplexer. A nicer output would then look like Output.mp4.

A good thing about publishing these filters is that while preparing the test project I hit a thread safety bug in the GDCL MP4 multiplexer, which is presumably present in all or most versions of mp4mux.dll out there: if the filter graph is stopped while video is still streaming, before end-of-stream is reached on the video leg (which is often the case because the upstream connection would be an H.264 encoder whose internal queue of frames is drained on worker threads while the stop request is being processed), the multiplexer may generate a memory access violation by trying to use a NULL track object that is already gone.

Download links

 


Logitech C930e camera and Media Foundation


Logitech’s C930e camera is the first one to be compliant with UVC 1.5 specification:

First 1080p HD webcam to support H.264 with Scalable Video Coding and UVC 1.5 encoding technology. […] The result is a smoother video stream in applications like Skype for Business and Microsoft® Lync® 2013.

More marketing information is available at Logitech. More interesting is what the new capabilities look like from the API side. In addition to the well-known Motion JPEG (FourCC MJPG) and YUY2 video, the camera delivers H.264 (FourCC H264) video.

Logitech C930e Webcam

Lync (Skype for Business) is presumably modified to accept that, and it communicates with the camera using the Media Foundation API.

The camera’s H.264 capabilities are accessible through both APIs, DirectShow and Media Foundation, and there is apparently a mess with driver versions and operating system versions as well. The best results are achieved with the stock driver from Microsoft, without installing the Logitech driver; this advice is in good standing: “The only way I was able to get that stream under Windows 8.x was by NOT USING LOGITECH DRIVERS. This is a UVC 1.5 compatible camera and it will be configured automatically by the OS. With that driver (from Microsoft), use pin 1 (not 0) and you will get a ton of H264 formats.”

A printout of DirectShow capabilities obtained with DirectShowCaptureCapabilities is available here (note the KS_H264VIDEOINFO structure). This time it is about what the camera looks like from Media Foundation.

As a Media Source, the camera exposes a few attributes and a great number of media types (216 + 476), apparently more than through DirectShow (a sketch of producing a dump like the one below follows the printout):

    • MF_DEVSOURCE_ATTRIBUTE_MEDIA_TYPE: 76 69 64 73 00 00 10 00 80 00 00 AA 00 38 9B 71 59 55 59 32 00 00 10 00 80 00 00 AA 00 38 9B 71
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_SYMBOLIC_LINK: \\?\usb#vid_046d&pid_0843&mi_00#6&2314864d&0&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\global (Type `VT_LPWSTR`)
    • MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME: Logitech Webcam C930e (Type `VT_LPWSTR`)
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_CATEGORY: KSCATEGORY_VIDEO_CAMERA
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE: MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_HW_SOURCE: 4 (Type `VT_UI4`)
  • Characteristics: MFMEDIASOURCE_IS_LIVE | MFMEDIASOURCE_CAN_PAUSE
  • Stream 0: Default Selected, Identifier 0x0, Major Type MFMediaType_Video, 216 Media Types
  • Stream 1: Identifier 0x1, Major Type MFMediaType_Video, 476 Media Types
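
A minimal sketch of how a dump like the one above can be produced; it assumes MFStartup has already been called, trims error handling, and the helper name is mine.

#include <mfapi.h>
#include <mfidl.h>
#include <atlbase.h>
#include <stdio.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfuuid.lib")

void DumpVideoCaptureMediaTypes()
{
    CComPtr<IMFAttributes> pAttributes;
    MFCreateAttributes(&pAttributes, 1);
    pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
    IMFActivate** ppActivates = NULL;
    UINT32 nActivateCount = 0;
    MFEnumDeviceSources(pAttributes, &ppActivates, &nActivateCount);
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
    {
        CComPtr<IMFMediaSource> pMediaSource;
        ppActivates[nIndex]->ActivateObject(IID_PPV_ARGS(&pMediaSource));
        CComPtr<IMFPresentationDescriptor> pPresentationDescriptor;
        pMediaSource->CreatePresentationDescriptor(&pPresentationDescriptor);
        DWORD nStreamCount = 0;
        pPresentationDescriptor->GetStreamDescriptorCount(&nStreamCount);
        for(DWORD nStreamIndex = 0; nStreamIndex < nStreamCount; nStreamIndex++)
        {
            BOOL bSelected;
            CComPtr<IMFStreamDescriptor> pStreamDescriptor;
            pPresentationDescriptor->GetStreamDescriptorByIndex(nStreamIndex, &bSelected, &pStreamDescriptor);
            CComPtr<IMFMediaTypeHandler> pMediaTypeHandler;
            pStreamDescriptor->GetMediaTypeHandler(&pMediaTypeHandler);
            DWORD nTypeCount = 0;
            pMediaTypeHandler->GetMediaTypeCount(&nTypeCount); // 216 and 476 for the C930e above
            wprintf(L"Stream %u: %u media types\n", nStreamIndex, nTypeCount);
            for(DWORD nTypeIndex = 0; nTypeIndex < nTypeCount; nTypeIndex++)
            {
                CComPtr<IMFMediaType> pMediaType;
                pMediaTypeHandler->GetMediaTypeByIndex(nTypeIndex, &pMediaType);
                // ... read MF_MT_SUBTYPE, MF_MT_FRAME_SIZE and the other attributes here
            }
        }
        pMediaSource->Shutdown();
        ppActivates[nIndex]->Release();
    }
    CoTaskMemFree(ppActivates);
}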

The H.264 formats are marked with the subtypes MFVideoFormat_H264 and MFVideoFormat_H264_ES. A raw printout is downloadable:

It is specifically interesting which attributes are there, since with Media Foundation this is a tricky thing to find out quickly. The keys/identifiers are listed below.

Common

  • MF_MT_ALL_SAMPLES_INDEPENDENT
  • MF_MT_AM_FORMAT_TYPE
  • MF_MT_AVG_BITRATE
  • MF_MT_FIXED_SIZE_SAMPLES
  • MF_MT_FRAME_RATE
  • MF_MT_FRAME_RATE_RANGE_MAX
  • MF_MT_FRAME_RATE_RANGE_MIN
  • MF_MT_FRAME_SIZE
  • MF_MT_INTERLACE_MODE
  • MF_MT_MAJOR_TYPE
  • MF_MT_PIXEL_ASPECT_RATIO
  • MF_MT_SUBTYPE

MFVideoFormat_H264, MFVideoFormat_H264_ES

  • MF_MT_COMPRESSED
  • MF_MT_H264_CAPABILITIES
  • MF_MT_H264_MAX_CODEC_CONFIG_DELAY
  • MF_MT_H264_MAX_MB_PER_SEC
  • MF_MT_H264_RESOLUTION_SCALING
  • MF_MT_H264_SIMULCAST_SUPPORT
  • MF_MT_H264_SUPPORTED_RATE_CONTROL_MODES
  • MF_MT_H264_SUPPORTED_SLICE_MODES
  • MF_MT_H264_SUPPORTED_SYNC_FRAME_TYPES
  • MF_MT_H264_SUPPORTED_USAGES
  • MF_MT_H264_SVC_CAPABILITIES
  • MF_MT_VIDEO_LEVEL
  • MF_MT_VIDEO_PROFILE

MFVideoFormat_MJPG

  • MF_MT_SAMPLE_SIZE
  • MF_MT_VIDEO_CHROMA_SITING
  • MF_MT_VIDEO_LIGHTING
  • MF_MT_VIDEO_NOMINAL_RANGE
  • MF_MT_VIDEO_PRIMARIES
  • MF_MT_YUV_MATRIX

MFVideoFormat_YUY2

  • MF_MT_DEFAULT_STRIDE
  • MF_MT_SAMPLE_SIZE
  • MF_MT_VIDEO_CHROMA_SITING
  • MF_MT_VIDEO_LIGHTING
  • MF_MT_VIDEO_NOMINAL_RANGE
  • MF_MT_VIDEO_PRIMARIES
  • MF_MT_YUV_MATRIX

Blackmagic Design’s “Decklink Video Capture” filters


Pulling this out from Blackmagic Design Forum thread:

Generally, the recommended interface to the capture cards is the DeckLink API.

A DirectShow interface is available, but provides a subset of the functionality available from the complete DeckLink API.

Please note that the older, user-space DirectShow filters (DeckLink Video Capture) are deprecated in favour of the WDM filters (Blackmagic WDM Capture).

The WDM filters added support for 4K modes in Desktop Video 10.5+.

So the “Decklink Video Capture” filters that wrap the DeckLink SDK and provide a convenient DirectShow interface are at their end of life.

Certainly, the most efficient and flexible way to interface with Blackmagic Design hardware is to use their SDK (which is good and easy to use), however it does not give immediate connectivity to Windows APIs. The user mode filters were a good wrapper and provided typical capture and playback functionality. They had their own issues (e.g. no VideoInfo2 support – interlaced formats treated as progressive, and no support for progressive formats that collide with interlaced ones), and some reported the 64-bit versions to be not quite stable.

The WDM filters have been around for some time; notably, they offer a 32-bit audio capture option which the other filters did not have. From what I remember they lack other capabilities available through the SDK (update – e.g. no timecode support).

Apparently the WDM filters do not offer a playback option via DirectShow. This is not even mentioning the unfortunate Media Foundation – even though “Blackmagic WDM Render” is somehow around and, with a certain amount of luck, is listed through MFTEnum:

    Blackmagic WDM Render #3
        MFT_ENUM_HARDWARE_URL_Attribute: \\?\decklink#avstream#5&2db0fd5&1&0000#{65e8773e-8f56-11d0-a3b9-00a0c9223196}\decklinkrender1 (Type VT_LPWSTR)
        MFT_INPUT_TYPES_Attributes: 
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Audio MFAudioFormat_PCM
            MFMediaType_Audio MFAudioFormat_PCM
            
        MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
        MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE

For more or less serious DirectShow development, the best way was and still is to wrap the DeckLink SDK with a custom filter and get all the options the SDK provides.

Windows 10 AVI Splitter bug


There were a few reports that Windows 10 is unable to play certain AVI files which played fine in earlier versions of Windows.

OK, the problem does exist. More to say, the problem exists in the Windows component that implements the AVI Splitter DirectShow filter. One of the reporters mentioned he had a problem with a DV AVI file. I built one and it indeed showed the problem:

AVI Splitter bug in GraphStudioNext

Playback stops at the same frame every time the filter graph is run. The error is 0x8004020D VFW_E_BUFFER_OVERFLOW “The buffer is not big enough”, coming from the AVI Splitter’s worker thread. The buffers on the memory allocators look appropriate, so the bug looks related to AVI Splitter implementation details, namely the CBaseMSRWorker class that reads from the file and delivers frames downstream.

AVI Splitter bug call stack

The problem exists in the 32- and 64-bit versions, but not in Media Foundation. With a certain amount of luck Microsoft will fix the problem on their side.

Not so good H.264 media type


MainConcept’s MP4 Demultiplexer in Annex B mode looks, well… slightly excessively broken.

MainConcept MP4 Demultiplexer Properties

  1. An H.264 media type with start codes (H264 FourCC, but here they use the legacy subtype informally known as MEDIASUBTYPE_H264_bis) does not require parameter sets as part of the MPEG2VIDEOINFO structure. If they nevertheless decide to provide the NAL units, these have to be length-prefixed, without start codes. MainConcept does it the Annex B way – not good.
  2. Zero BITMAPINFOHEADER::biSize?
  3. BITMAPINFOHEADER::biBitCount of 24 is hardly correct, but it is not fatal.
  4. Additionally, they set up a memory allocator with a default capacity of 64K and then stream larger samples…

Oh.

Needless to say, this sort of connection simply has no chance to succeed:

Trying to Connect MainConcept MP4 Demultiplexer and Microsoft H.264 Decoder

CLSID_VideoInputDeviceCategory and Media Foundation


Media Foundation as a video capture API is inflexible. At Microsoft – besides the standard Media Foundation problems of backward compatibility, availability of developer tools and overall awkwardness – they decided to no longer offer video capture extensibility with Media Foundation. Be happy with MFEnumDeviceSources and don’t go anywhere else. They explain that they already provide support for devices backed by kernel streaming drivers:

Starting in Windows 7, Media Foundation automatically supports audio and video capture devices. For video, the device must provide a kernel streaming (KS) minidriver in the video capture category. Media Foundation uses the PnP path to enumerate the device. For audio, Media Foundation uses the Windows Multimedia Device (MMDevice) API to enumerate audio endpoint devices. If the device meets these criteria, there is no need to implement a custom media source.

The next paragraph there is sly:

However, you might want to implement a custom media source for some other type of device or other live data source. There are only a few differences between a live source and other media sources.

Indeed, you can implement a custom media source. However, you cannot implement the backing object (a Media Foundation Transform – see below) that the standard media source uses, and you cannot make your own video source discoverable by applications, so a custom video source never becomes a new option for video-capture-enabled applications using Media Foundation.

Over the years developers have been eagerly interested in various aspects of video capture on the Windows platform using VFW and then DirectShow, including specifically implementing a virtual camera device, for which Microsoft provided the Push Source Filters Sample, later extended into the popular VCam sample that “publishes” a video source device and makes it available to applications enumerating video capture hardware. The latest API, Media Foundation, blocked the opportunity to provide a custom video source.

The interesting thing though is that there is no fundamental problem in allowing such extensibility: just a few pieces are missing.

For starters, MFTEnum enumerates objects in, well, DirectShow’s CLSID_VideoInputDeviceCategory category. This is not documented, but it shows how tightly Media Foundation and DirectShow (and the related kernel drivers) are connected (a sketch reproducing this enumeration follows the printout below).

Category: CLSID_VideoInputDeviceCategory {860BB310-5D01-11D0-BD3B-00A0C911CE86}

Logitech Webcam C930e #0
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\usb#vid_046d&pid_0843&mi_00#6&2314864d&0&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
    MFT_OUTPUT_TYPES_Attributes: 
        MFMediaType_Video MFVideoFormat_YUY2
        MFMediaType_Video MFVideoFormat_MJPG
    MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE

Blackmagic WDM Capture #1
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\decklink#avstream#5&2db0fd5&1&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\decklinkcapture1 (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
    MFT_OUTPUT_TYPES_Attributes: 
        MFMediaType_Video MFVideoFormat_UYVY
        MFMediaType_Video MFVideoFormat_v210
        MFMediaType_Video FourCC HDYC
        MFMediaType_Audio MFAudioFormat_PCM
    MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE
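
A sketch reproducing the printout above: MFTEnumEx is asked for hardware transforms in DirectShow’s CLSID_VideoInputDeviceCategory. This usage is undocumented, as noted, and the helper name is mine; MFStartup is assumed to have been called.

#include <dshow.h> // CLSID_VideoInputDeviceCategory
#include <mfapi.h>
#include <mftransform.h>
#include <stdio.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "strmiids.lib")

void ListVideoCaptureMfts()
{
    IMFActivate** ppActivates = NULL;
    UINT32 nActivateCount = 0;
    if(FAILED(MFTEnumEx(CLSID_VideoInputDeviceCategory, MFT_ENUM_FLAG_HARDWARE, NULL, NULL, &ppActivates, &nActivateCount)))
        return;
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
    {
        WCHAR* pszUrl = NULL;
        UINT32 nLength = 0;
        // The hardware symbolic link, e.g. \\?\usb#vid_046d&pid_0843... for the camera above
        if(SUCCEEDED(ppActivates[nIndex]->GetAllocatedString(MFT_ENUM_HARDWARE_URL_Attribute, &pszUrl, &nLength)))
        {
            wprintf(L"%s\n", pszUrl);
            CoTaskMemFree(pszUrl);
        }
        ppActivates[nIndex]->Release();
    }
    CoTaskMemFree(ppActivates);
}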

Any questions? What the MFEnumDeviceSources API does is enumerate this category and build device COM objects on top of the existing MFTs. Using an MFT as the video source building block is actually a smart move. This should of course have been done in DirectShow many years ago, with DMOs instead of MFTs.

DirectX Media Objects (DMOs) got a compact and powerful form factor. A video or audio source implementation could nicely fit into a “zero input, one output” DMO and then be used by standard objects on top of it – similar to the DirectShow DMO Wrapper Filter, but for source filters. This was never done in DirectShow, unfortunately. In Media Foundation, DMOs got their obese sibling: the Media Foundation Transform, which is pretty much the same, just bloated.

This time the Media Foundation guys implemented their base block, the MFT, over video capture hardware items, which APIs like MFEnumDeviceSources and MFCreateDeviceSource pick up and use in their backyard.

Frontend code activating a media source goes right to the inner MFT to enumerate formats, calling its IMFTransform::GetOutputAvailableType through the standard Media Foundation implementation for video device sources, mfcore‘s CDeviceSource class.

MyTransform::GetOutputAvailableType(unsigned long nOutputStreamIdentifier, unsigned long nTypeIndex, IMFMediaType * * ppMediaType) Line 1033 C++
mfcore.dll!CDeviceSource::GetDeviceStreamType(unsigned long) Unknown
mfcore.dll!CDeviceSource::CreateStreams(void) Unknown
mfcore.dll!CDeviceSource::CDeviceSource(struct IMFTransform *,struct _GUID,struct IMFAttributes *,long *) Unknown
mfcore.dll!CDeviceSource::CreateInstance(struct IMFTransform *,struct _GUID,struct IMFAttributes *,struct IMFMediaSource * *) Unknown
mfcore.dll!MFCreateDeviceSource() Unknown

Capture of frames takes place on a real-time work queue (RTWorkQ) worker thread via IMFTransform::ProcessOutput:

MyTransform::ProcessOutput(unsigned long nFlags, unsigned long nBufferCount, MFTOUTPUTDATABUFFER * pBuffers, unsigned long * pnStatus) Line 1281 C++
mfcore.dll!CDeviceSource::OnMFTEventReceived(struct IMFAsyncResult *) Unknown
mfcore.dll!CDeviceSource::OnMFTEventReceivedAsyncCallback::Invoke(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CSerialWorkQueue::QueueItem::ExecuteWorkItem(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CBaseWorkQueue::HandleConcurrentMMCSSEnter(class CRealTimeState *) Unknown
ntdll.dll!TppWorkpExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

That is, the base building block for video capture in Media Foundation is the MFT. Excellent! So do they allow registering your own MFT to provide applications with a custom video device? Not really. The operation of CDeviceSource and Microsoft’s implementation of the MFT (“Device Proxy MFT”) is based on intimate assumptions between the two and is not documented. When/if this goes public, we will start implementing virtual cameras the same way we did with good old DirectShow.

Encoding H.264 video using hardware MFTs


Some time ago there were pictures explaining the performance and other properties of a software H.264 encoder (x264). This time it is the turn of hardware H.264 encoders – two of them, side by side. Both encoders are nothing new: Intel® Quick Sync Video H.264 Encoder and NVIDIA H.264 Encoder have been around for a while. Some would say it is already time for H.265 encoders.

Either way, on my test machine both encoders are available without additionally installed software (that is, no need for the Intel Media SDK, Nvidia NVENC, redistributable files etc.). Out of the box, Windows 10 offers the stock software-only encoder and the hardware encoders in the form factor of a Media Foundation Transform (MFT); a sketch of discovering them follows.
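
A minimal sketch of how the two encoders above show up: MFTEnumEx lists hardware MFTs in the video encoder category that output H.264. The helper name is mine and MFStartup is assumed to have been called.

#include <mfapi.h>
#include <mftransform.h>
#include <stdio.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

void ListHardwareH264Encoders()
{
    MFT_REGISTER_TYPE_INFO OutputType { MFMediaType_Video, MFVideoFormat_H264 };
    IMFActivate** ppActivates = NULL;
    UINT32 nActivateCount = 0;
    if(FAILED(MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE, NULL, &OutputType, &ppActivates, &nActivateCount)))
        return;
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
    {
        WCHAR* pszFriendlyName = NULL;
        UINT32 nLength = 0;
        // e.g. "Intel® Quick Sync Video H.264 Encoder MFT", "NVIDIA H.264 Encoder MFT"
        if(SUCCEEDED(ppActivates[nIndex]->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &pszFriendlyName, &nLength)))
        {
            wprintf(L"%s\n", pszFriendlyName);
            CoTaskMemFree(pszFriendlyName);
        }
        ppActivates[nIndex]->Release();
    }
    CoTaskMemFree(ppActivates);
}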

Environment:

  • OS: Windows 10 Pro
  • CPU: Intel i7-4790
  • Video Adapter 1: Intel HD Graphics 4600 (on-board, not connected to monitors)
  • Video Adapter 2: NVIDIA GeForce GTX 750

It is not convenient or fun to do things with Media Foundation, but the good news is that Media Foundation components are well separable. A wrapper that converts MFTs into DirectShow filters makes them available to DirectShow, where it is already much easier to do various test runs. The pictures below show metrics for encoder defaults (bitrate, profile and many other options create a great number of encoding modes). Still, the pictures do show that both encoders are well usable for many scenarios, including HD processing, simultaneous data processing etc.

Video Encoder MFT Wrapper in GraphStudioNext

Test runs are as simple as taking a reference video source signal with different properties, pushing it through the encoder filter and either writing to a file (to inspect the footage) or to the Null Renderer Filter to measure performance.

Intel® Quick Sync Video H.264 Encoder produces files like these: 720×480.mp4, 2556×1440.mp4, which are of decent quality (considering the low bitrate and the “hard to handle” background changes). NVIDIA H.264 Encoder produces somewhat better output, supposedly by choosing a higher bitrate. Either way, both encoders have a number of ways to fine-tune the encoding process: not just bitrate, profile, GOP length and B-frame settings, but even more sophisticated parameters (a sketch of setting them via ICodecAPI follows the property dumps below).

Intel® Quick Sync Video H.264 Encoder MFT

CODECAPI_AVEncCommonRateControlMode: VT_UI4 0, default VT_UI4 0, modifiable // eAVEncCommonRateControlMode_CBR = 0
CODECAPI_AVEncCommonQuality: minimal VT_UI4 0, maximal VT_EMPTY, step VT_EMPTY
CODECAPI_AVEncCommonBufferSize: VT_UI4 3131961357, default VT_UI4 0, modifiable
CODECAPI_AVEncCommonMaxBitRate: default VT_UI4 0
CODECAPI_AVEncCommonMeanBitRate: VT_UI4 3131961357, default VT_UI4 2222000, modifiable
CODECAPI_AVEncCommonQualityVsSpeed: VT_UI4 50, default VT_UI4 50, modifiable
CODECAPI_AVEncH264CABACEnable: modifiable
CODECAPI_AVEncMPVDefaultBPictureCount: VT_UI4 0, default VT_UI4 0, modifiable
CODECAPI_AVEncMPVGOPSize: VT_UI4 128, default VT_UI4 128, modifiable
CODECAPI_AVEncVideoEncodeQP: 
CODECAPI_AVEncVideoForceKeyFrame: VT_UI4 0, default VT_UI4 0, modifiable
CODECAPI_AVLowLatencyMode: VT_BOOL 0, default VT_BOOL 0, modifiable
CODECAPI_AVEncVideoLTRBufferControl: VT_UI4 65536, values { VT_UI4 65536, VT_UI4 65537, VT_UI4 65538, VT_UI4 65539, VT_UI4 65540, VT_UI4 65541, VT_UI4 65542, VT_UI4 65543, VT_UI4 65544, VT_UI4 65545, VT_UI4 65546, VT_UI4 65547, VT_UI4 65548, VT_UI4 65549, VT_UI4 65550, VT_UI4 65551, VT_UI4 65552 }, modifiable
CODECAPI_AVEncVideoMarkLTRFrame: 
CODECAPI_AVEncVideoUseLTRFrame: 
CODECAPI_AVEncVideoEncodeFrameTypeQP: default VT_UI8 111670853658, minimal VT_UI8 0, maximal VT_UI8 219046674483, step VT_UI8 1
CODECAPI_AVEncSliceControlMode: VT_UI4 0, default VT_UI4 2, minimal VT_UI4 2, maximal VT_UI4 2, step VT_UI4 0, modifiable
CODECAPI_AVEncSliceControlSize: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 8160, step VT_UI4 1, modifiable
CODECAPI_AVEncVideoMaxNumRefFrame: minimal VT_UI4 0, maximal VT_UI4 16, step VT_UI4 1, modifiable
CODECAPI_AVEncVideoTemporalLayerCount: default VT_UI4 1, minimal VT_UI4 1, maximal VT_UI4 3, step VT_UI4 1, modifiable
CODECAPI_AVEncMPVDefaultBPictureCount: VT_UI4 0, default VT_UI4 0, modifiable

NVIDIA H.264 Encoder MFT

CODECAPI_AVEncCommonRateControlMode: VT_UI4 0
CODECAPI_AVEncCommonQuality: VT_UI4 65
CODECAPI_AVEncCommonBufferSize: VT_UI4 8923353
CODECAPI_AVEncCommonMaxBitRate: VT_UI4 8923353
CODECAPI_AVEncCommonMeanBitRate: VT_UI4 2974451
CODECAPI_AVEncCommonQualityVsSpeed: VT_UI4 33
CODECAPI_AVEncH264CABACEnable: VT_BOOL -1
CODECAPI_AVEncMPVGOPSize: VT_UI4 50
CODECAPI_AVEncVideoEncodeQP: VT_UI8 26
CODECAPI_AVEncVideoForceKeyFrame: 
CODECAPI_AVEncVideoMinQP: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 51, step VT_UI4 1
CODECAPI_AVLowLatencyMode: VT_BOOL 0
CODECAPI_AVEncVideoLTRBufferControl: VT_UI4 0, values { VT_I4 65537, VT_I4 65538 }
CODECAPI_AVEncVideoMarkLTRFrame: 
CODECAPI_AVEncVideoUseLTRFrame: 
CODECAPI_AVEncVideoEncodeFrameTypeQP: VT_UI8 111670853658
CODECAPI_AVEncSliceControlMode: VT_UI4 2, minimal VT_UI4 0, maximal VT_UI4 2, step VT_UI4 1
CODECAPI_AVEncSliceControlSize: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 3, step VT_UI4 1
CODECAPI_AVEncVideoMaxNumRefFrame: VT_UI4 1, minimal VT_UI4 0, maximal VT_UI4 16, step VT_UI4 1
CODECAPI_AVEncVideoMeanAbsoluteDifference: VT_UI4 0
CODECAPI_AVEncVideoMaxQP: VT_UI4 51, minimal VT_UI4 0, maximal VT_UI4 51, step VT_UI4 1
CODECAPI_AVEncVideoROIEnabled: VT_UI4 0
CODECAPI_AVEncVideoTemporalLayerCount: minimal VT_UI4 1, maximal VT_UI4 3, step VT_UI4 1
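
Parameters like the ones dumped above are typically read and written through ICodecAPI on the encoder MFT. Below is a minimal sketch of switching to CBR and setting a mean bitrate; the helper name is mine and error handling is reduced to pass-through.

#include <initguid.h> // instantiate the CODECAPI_* GUIDs declared in codecapi.h
#include <strmif.h> // ICodecAPI
#include <mftransform.h>
#include <codecapi.h>
#include <atlbase.h>

HRESULT SetCbrBitRate(IMFTransform* pTransform, ULONG nBitRate)
{
    CComQIPtr<ICodecAPI> pCodecApi(pTransform);
    if(!pCodecApi)
        return E_NOINTERFACE;
    VARIANT vMode;
    VariantInit(&vMode);
    vMode.vt = VT_UI4;
    vMode.ulVal = eAVEncCommonRateControlMode_CBR; // 0, per the dump above
    const HRESULT nResult = pCodecApi->SetValue(&CODECAPI_AVEncCommonRateControlMode, &vMode);
    if(FAILED(nResult))
        return nResult;
    VARIANT vBitRate;
    VariantInit(&vBitRate);
    vBitRate.vt = VT_UI4;
    vBitRate.ulVal = nBitRate; // bits per second, e.g. 4000000
    return pCodecApi->SetValue(&CODECAPI_AVEncCommonMeanBitRate, &vBitRate);
}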

An important property of a hardware encoder is that even though it does consume some CPU time, most of the complexity is offloaded to the video hardware. In all single-stream test runs the eight-core CPU was loaded at no more than 30%, including the time required to synthesize the image using WIC and Direct2D and to convert it to YUV format on the CPU. That is, offloading video encoding to the GPU is a convenient way to free the CPU for real-time video processing applications.

I was mostly interested in how well the encoders can process real-time data, especially when applied to recording lengthy sessions. Both encoders appear to be fast enough to handle 1920×1080 HD video at frame rates of 60 and higher. The test encoded at the highest rate possible, and the 100% mark on the charts corresponds to the situation where it took one second to synthesize and encode one second of video, regardless of the effective CPU/GPU load. That is, values less than 100% indicate the ability to encode the video content in real time right away.

Intel and NVidia Hardware H.264 Encoders Side by Side

Basically, the numbers show that both encoders are fast enough to reliably encode 1080p60 stream.

Looking at it from another standpoint – the ability to process two or more H.264 encoding sessions at once – the NVidia encoder has an important limitation of two sessions per system (supposedly related thread; for this or another reason a test run with three streams fails).

Intel and NVidia H.264 Encoders in Concurrent Encoding

Neither encoder is quite suitable for reliably encoding two 1080p60 streams simultaneously (though perhaps some fine-tuning of the encoding mode might make things faster). However, both look fine for encoding one 1080p stream plus a lower resolution stream. Clearly, Intel’s encoder can be used to encode multiple low resolution streams in parallel, or to mix real-time encoding with background encoding (provided the background encoding is throttled to let the real-time stream run fast enough). If real-time encoding is not necessary, both encoders can do the job as well; with Nvidia the application needs to make sure that only two sessions run simultaneously, while Intel’s encoder can be used in a more flexible way.

Also, Nvidia’s encoder is slightly faster, while Intel’s allows 3+ concurrently encoded streams and also accepts RGB input directly without converting to YUV.

There is also Intel® Hardware H265 Encoder MFT available for H.265 encoding, but this is going to be another story some time later.

Follow up: mixed parallel H.264 encoding, Intel® Quick Sync Video H.264 Encoder MFT + NVIDIA H.264 Encoder MFT


A scenario dropped from the previous post is mixed simultaneous encoding using both hardware encoders. The rationale: the Intel QSV encoder might exist as a “free” capability of the motherboard (provided with the onboard video adapter), while the other one might be available with a video adapter intentionally plugged in (possibly for other reasons, such as powering a dual monitor system etc.).

From this standpoint, it is interesting whether one can benefit from using both encoders.

Intel QSV Filter Graph

Nvidia NVEVC Filter Graph

Two filter graphs are set to produce 60 seconds of 1080p60 video as fast as possible and are started simultaneously. The chart below shows completion times, side by side with those for runs of one and two sessions of each encoder separately.

Completion Times: Intel® Quick Sync Video H.264 Encoder MFT + NVIDIA H.264 Encoder MFT

Informational: in single-stream runs the CPU load was around 30%, in two-session runs around 50%, of which the part that synthesizes and converts the video to an MFT-compatible input format took 5-6% of CPU time overall. Or, computed against 60 seconds of CPU time of the eight-core CPU, the synthesis-and-conversion itself consumed <4% of CPU time for one stream and <7% for dual-stream runs.

 


DirectShowFileMediaSamples Update: Command Line Mode


It appears that the tool was never properly introduced before (it was just mentioned in the general software list). The application takes a media file on input and applies the respective DirectShow demultiplexer to list individual media samples.

DirectShowFileMediaSamples UI

  • for MP4 files the application attempts to use GDCL MPEG-4 Demultiplexer first
  • it is possible to filter a specific track/stream
  • ability to copy data to clipboard or save to file
  • drag and drop a file to get it processed

Now the tool has command line mode too:

DirectShowFileMediaSamples-Win32.exe input-path [/no-video] [/no-audio] [output-path]

  • /no-video – excludes video tracks
  • /no-audio – excludes audio tracks

The default output path is the input path with the extension changed to .TSV. If DirectShowSpy is installed, the file also contains the filter graph information used (esp. media types).

For example,

D:\>DirectShowFileMediaSamples-Win32.exe "F:\Media\Ленинград — Экспонат.mp4"

Typical command line use: troubleshooting export/transcoding sessions where, on completion, you need textual information about the export to verify the time accuracy of individual samples: start and stop times, gaps etc.

Interactively one can also achieve the same goal using GraphStudioNext‘s built-in Analyzer Filter.

Download links

Calling convention violator broke streaming loop pretty far away


A really nasty problem coming from the MainConcept AVC/H.264 SDK Encoder was destroying the media streaming pipeline. The SDK is somewhat old (9.7.9.5738) and the problem might be already fixed, or might not. The problem is a good example of how a small bug can become a big pain.

The problem was coming up in 64-bit Release builds only. Win32 build? OK. Debug build where you can step things through? No problem.

The bug materialized in the GDCL MP4 Demultiplexer filter streaming (the Demultiplexer filter in the pipeline below) generating media samples with incorrect time stamps.

Pipeline

The initial start and stop times are okay, and further ones come out as _I64_MIN (incorrect):

Incorrect time stamps (screenshot)

The problem appears to be related to SSE optimization and the x64 calling convention. This explains why only the 64-bit Release build suffers from the issue. The MS compiler decided to use the XMM7 register for the dRate variable in this code fragment:

REFERENCE_TIME tStart, tStop;
double dRate;
m_pParser->GetSeekingParams(&tStart, &tStop, &dRate);

[...]

for(; ; )
{
    [...]

    tSampleStart = REFERENCE_TIME(tSampleStart / dRate);
    tSampleEnd = REFERENCE_TIME(tSampleEnd / dRate);

dRate is the only floating point thing here, and it is clear why the compiler optimized the variable into a register: there is no other floating point activity around.

However, sample delivery goes pretty deep into other functions and modules, reaching the MainConcept H.264 encoder. One of its functions violates the x64 calling convention and does not preserve the XMM6+ register values. OOPS! Everything appears to work right, but after media sample delivery the dRate value is destroyed and further media samples receive incorrect time stamps.

It is not really a problem of the MP4 demultiplexer, of course; however, media sample delivery might involve a long delivery chain where any violator breaks the streaming loop. At the same time, it is not a big expense to de-optimize the floating point math in the demultiplexer for those few time stamp adjustment operations. A volatile specifier defeats the compiler optimization and makes the loop resistant to SSE2 register violators:

// HOTFIX: Volatile specifier is not really necessary here but it fixes a nasty problem with MainConcept AVC SDK violating x64 calling convention;
//         MS compiler might choose to keep dRate in XMM6 register and the value would be destroyed by the violating call leading to incorrect 
//         further streaming (wrong time stamps)
volatile DOUBLE dRate;
m_pParser->GetSeekingParams(&tStart, &tStop, (DOUBLE*) &dRate);

This makes this build of the H.264 encoding SDK unstable, and the problem is hopefully already fixed. The SDK indeed gave other troubles on specific architectures, leading to undefined behavior.

How to create Virtual Webcam in Windows 10?


(this is a re-post from StackOverflow)

A virtual webcam is typically a software-only implementation that applications discover as if it were a device with physical representation. The mentioned applications use APIs to work with web cameras, and the ability to extend those APIs and add your own video source is the way to create a virtual web camera.

In Windows there are a few APIs to consume video sources: Video for Windows, DirectShow, Media Foundation (in chronological order).

Video for Windows is not really extensible and is limited in capabilities overall. It will see a virtual device if you provide a kernel mode driver for a virtual camera.

DirectShow is the API used by most video-capture-enabled Windows applications, and it is present in all Windows versions including Windows 10 (except Windows RT). It is perfectly extensible, and in most cases the term “virtual webcam” refers to a DirectShow virtual webcam. The methods to create a DirectShow virtual webcam discussed in many StackOverflow questions remain perfectly valid for Windows 10, for applications that implement video capture using DirectShow:

DirectShow samples were removed from Windows SDK but you can still find them in older releases:

If you provide a kernel mode driver for video camera device (your virtual webcam through custom kernel driver), DirectShow would also see it just like other video APIs.

Media Foundation is the supposed successor of DirectShow, but its video capture extensibility simply does not exist. Microsoft decided not to allow custom video sources that applications would discover the same way as web cameras. Due to Media Foundation’s complexity, overhead and overall unfriendliness, it is used by a modest number of applications. To implement a virtual webcam for a Media Foundation application you again, as with Video for Windows, have to implement a kernel mode driver.

“… you will never get the same high quality video experience that you find with DirectShow”


Microsoft’s James Daily wrote back in 2011 (an incredible response in a public forum from an MS guy – considering that the DirectShow branch of the same forum did not see anything close for 10+ years) about how the technologies relate to one another:

Hey, I’m the team’s DShow expert. Trevor asked me to take a look at your post and give my two cents. From looking at the DShow code that you are using in your winforms application I just want you to be aware that by including quartz.dll as a dependency you are using the DirectShow 8 OLE automation objects. These objects have been deprecated for years and are certainly not recommended [this might perhaps be not accurate enough because generally stuff in quartz.dll is not deprecated, it’s rather orphaned and yet waits it deprecation like related stuff from qedit.dll; however the overall attitude is about right – RR]. At this time Microsoft does not have a supported solution for calling DirectShow code from C# (or any managed language). Please see the “note” at the top of the page at the link below for documented confirmation of this. Because the technology is not supported from the winforms environment it is not possible for us to suggest a supported workaround from managed code.

That said it should be possible to facilitate the functionality that you are looking for by creating a custom EVR presenter. By using a custom presenter you can get direct access to the D3D surface. You can then use the standard D3D constructs to draw directly to the same D3D surface that the EVR is using to blit the video. There are two things to keep in mind about this solution. First you must code this solution in unmanaged C++. Again this is due to the fact that DirectShow is not supported from managed code. Second, this solution is extremely complex and difficult to implement even for the most experienced DirectShow / D3D expert. Because of these two factors it is recommended that you take a serious look at the MediaElement in WPF.

As you know the WPF environment is constructed from the ground up to offer developers a very rich “graphics first” environment. The MediaElement in particular was designed to allow you to mix video with various other UI components seamlessly. This solution will give you the flicker free, “draw over video” solution that you are looking for. The best part is you can do all of this in C#. The bad part of this solution is that the MediaElement is not designed for displaying time sensitive media content. In other words, the MediaElement is notorious for dropping and delaying the display of video frames. There are ways to minimize this such as using SD rather than HD content, use a video accelerated codec, etc. However, you will never get the same high quality video experience that you find with DirectShow.

I hope this will help you understand the current shortcomings of the technologies that you have chosen and help you to focus your efforts on a fully supported and viable solution. If you need any additional clarification please let us know.

and then also:

Unfortunately you can’t really tell the WPF MediaElement to never drop frames. The term we use for this class of issues is “disparate clocks”. In this case WPF is updating the screen at a certain rate (clock 1). The MediaElement (based on WMP) is cranking out video frames at a slightly different rate (clock 2). Given the underlying technologies there is currently no way to synchronize the two clocks and force them to “tick” at the same rate. Since the display will only be updated according to the WPF clock, multiple frames of video may be sent from the MediaElement to WPF between clock ticks. Because of this the MediaElement may appear to drop frames. This is a very common problem in multimedia development and there is no simple solution.

So if you absolutely need frame accuracy in your application then using the MediaElement probably won’t work for you. That said, there are some things that you can do to improve the chances of your content dropping as few frames as possible. Modify your content so that it uses either the h.264 or VC1 codec. Require your users to have modern video HW capable of advanced video acceleration. Use the MPEG 4 or ASF file container. When encoding your content set your frame rate at or below 25 frames per second. Set the resolution of your content to something like 720×480. Set the bitrate to VBR constrained and set an upper limit of between 500 Kbps and 2.5 Mbps.

If you use the guidelines above you will minimize the number of frames that get dropped but you will never be able to completely eliminate them. Also keep in mind that the same frames may not be dropped. For example: if you play video1.asf the first time you might drop frames 200 and 375. On the next run of the same file you may drop frames 143, 678 and 901. This is due to the relatively nondeterministic nature of the Windows OS.

I hope this helps.

Another commenter responded rather angrily:

…fail to include any mention of the DirectShow.NET library. Why? And shame on them for failing to do so. This library helps you use DirectShow in a managed context. There are plenty of code samples to be found….

The answer to this, however, was given in the same thread above a couple of times and explains that the responses are limited by existing policy:

I cannot comment on 3rd party libraries.

Because the technology is not supported from the winforms environment it is not possible for us to suggest a supported workaround…

Video for Windows API and Multiple Cameras


A StackOverflow question (already deleted) asked about the use of indices when referencing Video for Windows (VFW) capture devices, such as in the capGetDriverDescription API and others. Video capture with Video for Windows allowed the use of up to 10 devices (did anyone have that many at the time?). The numbering was API specific, and at the latest stage the documentation described it as:

Plug-and-Play capture drivers are enumerated first, followed by capture drivers listed in the registry, which are then followed by capture drivers listed in SYSTEM.INI.

Even though it is a legacy API, really simple and limited in capabilities, it still exists in all Windows versions, there is still some code running against it, and due to the complexity of modern APIs some people still use it in VB.NET and C# projects (a minimal enumeration sketch follows).
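
A minimal sketch of the index-based enumeration this API offers (ANSI variant); on a modern system it normally finds only index 0, the Microsoft WDM Image Capture (Win32) wrapper discussed below. The helper name is mine.

#include <windows.h>
#include <vfw.h>
#include <stdio.h>
#pragma comment(lib, "vfw32.lib")

void EnumerateVfwCaptureDrivers()
{
    for(UINT nIndex = 0; nIndex < 10; nIndex++) // VFW supports driver indices 0..9
    {
        CHAR pszName[256], pszVersion[256];
        if(!capGetDriverDescriptionA(nIndex, pszName, sizeof pszName, pszVersion, sizeof pszVersion))
            continue; // no capture driver registered under this index
        printf("%u: %s (%s)\n", nIndex, pszName, pszVersion);
    }
}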

There is, however, a trap if someone attempts to use multiple cameras with VFW. VFW drivers have not been developed for ages. Let us see what VirtualDub says about the dates and how ancient they are:

The newer type of video capture driver in Windows uses the Windows Driver Model (WDM), which was introduced in Windows 98 and 2000. The Microsoft DirectShow API is the primary API to use these drivers. Because the DirectShow API supports a larger variety of commands and settings than VFW, the functionality set of a WDM driver is significantly improved.

DirectShow is a much more complex API than VFW, however, and WDM-model drivers historically have been a lot less stable than their VFW counterparts. It is not unusual to see problems such as capture applications that cannot be closed, because their program execution is stuck in the capture driver. WDM is the proscribed driver model going forward, however, so the situation should improve over time.

All new drivers have been WDM drivers for 15+ years. To provide backward compatibility between VFW and WDM, Microsoft came out with the Microsoft WDM Image Capture (Win32) driver. Windows versions up to Windows 10 include it, and it is the only VFW driver in the system. In turn, it manages one of the existing WDM-driver devices of choice and exposes video capture functionality to VFW applications. If there are two or more WDM drivers, the VFW driver offers a choice between the devices.

VFW Capture Source Dialog

The screenshot displays a long-standing bug in this driver: it offers a choice of all registered DirectShow video capture devices (it enumerates the CLSID_VideoInputDeviceCategory category), while in reality it can only work with WDM devices and no others (more on this below).

VirtualDub has a mention of this driver as well:

If you have a Windows Driver Model (WDM) driver installed, you may also have an entry in the device list called Microsoft WDM Image Capture (Win32) (VFW). This entry comes from a Microsoft driver called VFWWDM32 and is a wrapper that allows WDM-model drivers to be used through the older Video for Windows (VFW) API. The WDM driver that is adapted can be selected through the Video Source driver dialog.

There are unfortunately some quirks in the way this adapter works, and some video capture devices will work erratically or not at all through this wrapper. Device settings not accessible through VFW will also still not be available when using it. If possible, use the capture device directly in DirectShow mode rather than using the VFWWDM32 driver.

This works pretty nicely with the VFW API and applications. Even though they are all ancient and were deprecated years ago, the system still has a bridge to newer devices, and applications can leverage their functionality. The problem is that there is only one VFW driver, and its index is zero. If you need two cameras you’re busted.

VFWWDM32 itself does not use any system-exclusive resources, and there is no reason why different instances of it could not be configured with different WDM devices. However, VFWWDM32 is a simple old wrapper, either thread unsafe or implemented as a singleton. People complain that operation with two cameras is impossible or unstable. It is still possible to run two different processes (such as, for example, VirtualDub) with two completely different VFWWDM32s which do not interfere because of the process boundary, and they run fine. The WDM device is selected interactively using capDlgVideoSource; developers have had a hard time doing the selection programmatically.

The interesting part is how VFWWDM32 does video capture using WDM. It is a corner cut in development: instead of building a simple DirectShow graph with a Source –> Renderer, or Source –> Sample Grabber –> Renderer topology, where the wrapper would easily support all DirectShow video devices, including virtual ones, they decided to implement it this way:

VFWWDM Filter Graph

One-filter graph, where the filter is the WDM Video Capture Filter for the device in question.

  • the graph is of the CLSID_FilterGraphPrivateThread type – *FINALLY* it is now known what this undocumented flavor of DirectShow filter graph is used for
  • the source filter’s output pins are not terminated, not connected to anything else
  • the graph is never run and produces VFW output in stopped state

Instead, VFWWDM32 uses some private, undocumented communication with the WDM filter internals to run the device and receive frames.

Bottom line: VFW is now a backward compatibility layer on top of DirectShow. DirectShow and Media Foundation both use WDM drivers to access video capture devices. An artificial constraint caused by the simplistic implementation of the VFWWDM driver is a limit of one video camera per process at a time.

DirectShowSpy: Restore default system behavior


There was a problem reported for a registered and then relocated DirectShowSpy, which might be causing issues: Deleting faulty DirectShowSpy registry key.

Some users that use a 3rd party tool called DirectShowSpy may encounter errors when logging in to XSplit.

This can be caused by a faulty registry key that is introduced when DirectShowSpy is registered to intercept Filter Graph initialization — Filter Graph is used by XSplit. The faulty DirectShowSpy registry key is usually caused by the DirectShowSpy program being relocated after registration.

To work around this situation, XSplit detects the presence of the HKEY_CLASSES_ROOT\CLSID\{E436EBB3-524F-11CE-9F53-0020AF0BA770}\TreatAs registry key when it fails to initialize Filter Graph, and exits when the key is found. In this case, the user must manually correct the DirectShowSpy registration or delete the registry key. Only after either is done can XSplit be restarted.

The description of the problem is good, and the solution is good but incomplete.

DirectShowSpy intercepts a few COM classes, not just one, and removing a single registry value is only a partial fix.

DirectShowSpy.dll exports an UnregisterTreatAsClasses function to accurately restore the operation of the system classes. It does registry permission magic and updates all COM classes involved. The default unregistration behavior (DllUnregisterServer, regsvr32 /u) is to restore the original classes only when they are currently overridden by DirectShowSpy. That is, if the DLL is moved (deleted), the broken registrations are retained in the registry during the unregistration process.

UnregisterTreatAsClasses resolves this problem by forcing recovery of the original classes no matter who is overriding them at the moment.

C:\>rundll32 DirectShowSpy-Win32.dll,UnregisterTreatAsClasses
C:\>rundll32 DirectShowSpy-x64.dll,UnregisterTreatAsClasses

Intel® RealSense™ Camera in DirectShow/Media Foundation


There is an interesting submission of video capture device capabilities for “The Short-Range Intel® RealSense™ Camera F200”. Another blog reader earlier mentioned they have a good stock of the devices and plans to take advantage of the new technology.

Intel Realsense camera ad from Intel website

It sounds like the new cameras offer new opportunities for applications in user interaction, with the ability to conveniently enhance user experience with things like gestures etc.

This is what the camera looks like on the software side:

  • Intel(R) RealSense(TM) 3D Camera Virtual Driver
  • Intel(R) RealSense(TM) 3D Camera (Front F200) RGB
  • Intel(R) RealSense(TM) 3D Camera (Front F200) Depth

Presumably, there are synchronized video and depth sources. It might well be that the SDK offers other presentations of the data (snapshots of combined data and a combined stream?).

So what is it all about in terms of how it looks to a video capture application and the APIs? The video sensor offers standard video caps: a YUV 4:2:2 video stream at 60 fps at resolutions up to 960×540, and higher resolutions up to 1920×1080 at 30 fps. This exceeds USB 2.0 bandwidth, so this is either a USB 3.0 device or there is hardware compression with internal software decompression. The video device does not offer compressed video feed capabilities.

There is another video source named “Depth”. It offers a YUY2 feed as well as other options with fancy FourCCs (ILER, IRNI, IVNI, IZNI, RVNI, ZVNI?), which presumably deliver depth information at 640×480@60. The respective SDK supposedly has the formats documented.

At 60 frames per second and supposedly low latency, the data should be a good source of real-time information for tracking gestures and reconstructing the short-range 3D scene in front of the camera.

Original DirectShow and Media Foundation capability files:

Additional in-depth information about the technology:


Reference Signal Source: RGB32/ARGB32 Subtypes, Media Foundation Media Source for Video


An update for Reference signal source for DirectShow DLLs:

  • the source handles RGB subtypes more accurately and allows specifying whether you want MEDIASUBTYPE_RGB32 or MEDIASUBTYPE_ARGB32
  • additionally, the DLL implements a Microsoft Media Foundation Media Source for the video stream

A more detailed description follows.

RGB32 and ARGB32 are very close and share the same byte structure; given the minimal support for alpha channel in video, the difference is mostly about counterpart support in other applications, for example and specifically hardware-assisted H.264 encoders which take the alpha-enabled variant.

The IVideoSourceFilter::SetMediaType method takes a vCompression argument which defines the subtype. The RegisterSources sample code shows how the method is used when exposing the reference signal as a video capture device.

A similar IVideoMediaSource::SetMediaType method is applicable to the Media Foundation counterpart (see below).

Both implementations only offer the given subtype as the default, but at the same time both accept the other variant as well if an application or peer connection tries to re-agree the media type. The same applies to changing resolution etc. The sources are flexible enough to take a different video format if anyone requests it.

The other big new thing is a Media Foundation Media Source which generates the reference signal as well. There is no option to set it up as a virtual camera because the API does not offer extensibility of that kind, however the source can be used to generate test content via Media Foundation, and the code remains pretty simple. I am publishing the MfGenerate code snippet, which demonstrates the steps necessary to create an MP4 file with video with the desired properties.

A frame from generated 4096x2304 content in Windows 10 player

As Media Foundation offers H.265 (HEVC) and fragmented MP4 options, they can also be easily used with the source to generate test footage.

The code does the following steps (a condensed sketch follows the list):

  1. Creates a media source (the commented out lines show alternate steps to create a media source from a file)
  2. Creates a source reader from the media source
  3. Builds an H.264 media type from the raw video media type
  4. Creates and configures a sink writer, which is instructed to do its magic of setting up the H.264 encoder (a side note: the code produces 4096×2304 video, however this is only possible once the hardware encoder is enabled; the software encoder was rejecting the media type)
  5. Implements a loop reading frames until they run out, feeding them into the encoder/writer
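
A heavily condensed sketch of the same flow – this is not the published MfGenerate snippet itself; the bitrate, the helper name and the choice to copy only frame size/rate from the raw type are assumptions, and error handling is omitted.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

HRESULT WriteH264File(IMFMediaSource* pMediaSource, LPCWSTR pszOutputPath)
{
    CComPtr<IMFSourceReader> pSourceReader;
    MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader);
    CComPtr<IMFMediaType> pRawMediaType;
    pSourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pRawMediaType);

    // Build an H.264 media type from the raw video media type
    UINT32 nWidth, nHeight, nRateNumerator, nRateDenominator;
    MFGetAttributeSize(pRawMediaType, MF_MT_FRAME_SIZE, &nWidth, &nHeight);
    MFGetAttributeRatio(pRawMediaType, MF_MT_FRAME_RATE, &nRateNumerator, &nRateDenominator);
    CComPtr<IMFMediaType> pEncodedMediaType;
    MFCreateMediaType(&pEncodedMediaType);
    pEncodedMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pEncodedMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    MFSetAttributeSize(pEncodedMediaType, MF_MT_FRAME_SIZE, nWidth, nHeight);
    MFSetAttributeRatio(pEncodedMediaType, MF_MT_FRAME_RATE, nRateNumerator, nRateDenominator);
    pEncodedMediaType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    pEncodedMediaType->SetUINT32(MF_MT_AVG_BITRATE, 8000000); // arbitrary choice

    // The sink writer sets up the H.264 encoder; enabling hardware transforms is what makes
    // the 4096x2304 media type acceptable, as noted in the side note above
    CComPtr<IMFAttributes> pWriterAttributes;
    MFCreateAttributes(&pWriterAttributes, 1);
    pWriterAttributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
    CComPtr<IMFSinkWriter> pSinkWriter;
    MFCreateSinkWriterFromURL(pszOutputPath, NULL, pWriterAttributes, &pSinkWriter);
    DWORD nStreamIndex;
    pSinkWriter->AddStream(pEncodedMediaType, &nStreamIndex);
    pSinkWriter->SetInputMediaType(nStreamIndex, pRawMediaType, NULL);
    pSinkWriter->BeginWriting();

    // Read frames until they run out, feeding them into the encoder/writer
    for(; ; )
    {
        DWORD nStreamFlags = 0;
        LONGLONG nTime = 0;
        CComPtr<IMFSample> pSample;
        pSourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, NULL, &nStreamFlags, &nTime, &pSample);
        if(nStreamFlags & MF_SOURCE_READER_ENDOFSTREAM)
            break;
        if(pSample)
            pSinkWriter->WriteSample(nStreamIndex, pSample);
    }
    return pSinkWriter->Finalize();
}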

High level APIs are simple (similar to DirectShow), which is untrue for the internals (similar to DirectShow; even more so).

Media Foundation source is video only for now.

The MF media source is supposed to be seekable (not really tested; not really testable with topoedit) and allows a zero duration to produce an infinite feed. The duration is not necessarily taken from the property; it can also be specified by overwriting a presentation descriptor attribute. The video format can also be set up through the stream descriptor’s media type handler.

Download links

Update – Connecting MF Media Source to MFCaptureD3D Sample application

To quickly connect the MF media source to the Windows SDK MFCaptureD3D Sample application, add #import and a few code lines replacing the source around CPreview::SetDevice, as shown in the image below:

MFCaptureD3D update for custom media source

Little known DirectShow VMR-7 snapshot problem


There is so little information about this problem out there (not really a bug, rather a miscalculation) because it only comes up with customized use of the Video Mixing Renderer Filter 7; there is no problem with straightforward use.

In windowless mode the renderer accepts media samples and displays them as configured. The IVMRWindowlessControl::GetCurrentImage method is available to grab the currently presented image and obtain a copy of what is displayed at the moment – the snapshot. The renderer does the favor of converting it to RGB, and the interface method is widely misused as a way to access an uncompressed video frame, esp. in a format compatible with other APIs or for saving to a bitmap (a related earlier post: How To: Save image to BMP file from IBasicVideo or VMR windowless interface). A minimal usage sketch follows.
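
A minimal sketch of the grab-and-free pattern for this method; the helper name is mine and error handling is minimal.

#include <dshow.h>

HRESULT SaveSnapshot(IVMRWindowlessControl* pWindowlessControl)
{
    BYTE* pnDib = NULL;
    const HRESULT nResult = pWindowlessControl->GetCurrentImage(&pnDib);
    if(FAILED(nResult))
        return nResult;
    // The buffer starts with a BITMAPINFOHEADER followed by the RGB bits
    const BITMAPINFOHEADER* pHeader = reinterpret_cast<const BITMAPINFOHEADER*>(pnDib);
    // ... write out a pHeader->biWidth x abs(pHeader->biHeight) image here; note that the
    // method does not tell which media sample (time stamp) the image corresponds to, which
    // is the subject of this post
    CoTaskMemFree(pnDib); // the caller owns the returned memory
    return S_OK;
}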

One of the problems with the method is that it reads back from video memory, which is – in some configurations – an extremely expensive operation and is simply unacceptable because of its overall impact.

This time, however, I am posting another issue. By default VMR-7 offers a memory allocator of one media sample. It accepts a new frame and then blits it into the video device. Simple. With higher resolutions and higher frame rates, and with VMR-7 being a legacy API working through compatibility layers, we get into a situation where this presentation method becomes a bottleneck. We cannot pre-load the next video frame before getting back from the presentation call. For 60 frames/second video this means that any congestion 17 milliseconds long can make us miss the chance to present the next video frame of the stream. Visual artifacts of this kind are perceptible.

An efficient solution to this problem is to increase the number of buffers in the video renderer’s memory allocator and then fill the buffers asynchronously. This does work well: we fill the buffers well in advance, so the costly operation does not have to complete within the frame presentation time. The pushing media pipeline pre-loads video buffers efficiently, and the video renderer simply grabs a prepared frame out of the queue and presents it. Terrific!

The video renderer’s input is thus a queue of media samples. It keeps popping and presenting them, matching their time stamps against the presentation clock and waiting the respective time. Now let us have a look at the snapshot method signature:

HRESULT GetCurrentImage(
  [out] BYTE **lpDib
);

We have an image, and that is good; the problem now is that it is not clear which sample from the queue this image corresponds to. VMR-7 does not report the associated time stamp, even though it has this information. It could have already accepted a frame and returned control while the presentation is only scheduled, so the caller cannot derive the time stamp even from the fact that the renderer filter completed the delivery call.

Video Mixing Renderer 9 is presumably subject to the same problem.

In contrast, the EVR’s IMFVideoDisplayControl::GetCurrentImage call already has:

HRESULT GetCurrentImage(
  [in, out] BITMAPINFOHEADER *pBih,
  [out]     BYTE             **pDib,
  [out]     DWORD            *pcbDib,
  [in, out] LONGLONG         *pTimeStamp
);

That is, at some point someone asked the right question: “So we have the image, where is time stamp?”.

Presumably, a VMR-7 custom allocator/presenter can work around this problem, as the presenter processes the time stamp information and can report what stock VMR-7 does not.
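
A sketch of what such a presenter could do (class, member and helper names below are hypothetical; VMRPRESENTATIONINFO and its rtStart field are the stock structure passed to IVMRImagePresenter::PresentImage):

STDMETHODIMP CCustomPresenter::PresentImage(DWORD_PTR dwUserId, VMRPRESENTATIONINFO* pPresentationInfo)
{
    CheckPointer(pPresentationInfo, E_POINTER);
    {
        CAutoLock Lock(&m_CriticalSection);
        // Remember the time stamp of the frame being presented
        m_LastPresentationTime = pPresentationInfo->rtStart;
    }
    return InternalPresentImage(dwUserId, pPresentationInfo); // hypothetical helper doing the actual blit
}

// A custom snapshot method can then report the image together with its time stamp,
// which is exactly what stock IVMRWindowlessControl::GetCurrentImage does not do
HRESULT CCustomPresenter::GetCurrentImage(BYTE** ppnDibData, REFERENCE_TIME* pTime)
{
    CAutoLock Lock(&m_CriticalSection);
    *pTime = m_LastPresentationTime;
    return CopyLastSurface(ppnDibData); // hypothetical read-back of the last presented surface
}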

Understanding Your DirectShow Filter Graph

Many questions in DirectShow development are caused by a lack of understanding of what topology the code has effectively built. Intelligent Connect and the RenderXxx methods help add and connect filters, and in the end the developer does not have the faintest idea what the pipeline looks like.

The DirectShow API provides methods to enumerate filters, pins and connections, and to obtain detailed information about the filter graph. The API is well documented. The Windows SDK ships with GraphEdit, which helps build graphs interactively. The ability to publish a graph on the ROT and review it from GraphEdit is nothing short of powerful. And then we have GraphStudioNext, which makes everything even more convenient.
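
For reference, publishing a graph on the ROT uses the well-known helper adapted from the SDK documentation, roughly as follows:

// Registers the filter graph in the Running Object Table so that GraphEdit or
// GraphStudioNext can attach to it ("File, Connect to Remote Graph")
HRESULT AddToRot(IUnknown* pGraphUnknown, DWORD* pnRegister)
{
    CComPtr<IRunningObjectTable> pRunningObjectTable;
    HRESULT nResult = GetRunningObjectTable(0, &pRunningObjectTable);
    if(FAILED(nResult))
        return nResult;
    WCHAR pszItemName[256];
    swprintf_s(pszItemName, L"FilterGraph %08x pid %08x",
        (UINT) (UINT_PTR) pGraphUnknown, GetCurrentProcessId());
    CComPtr<IMoniker> pMoniker;
    nResult = CreateItemMoniker(L"!", pszItemName, &pMoniker);
    if(FAILED(nResult))
        return nResult;
    return pRunningObjectTable->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE, pGraphUnknown, pMoniker, pnRegister);
}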

Still, this does not seem to be sufficient, as many new questions and misunderstandings show that developers have false assumptions about the graphs their applications use.

DirectShowSpy goes one step further with its debugging options. With DirectShowSpy one can embed a reviewing UI right into the developed application and either generate a detailed textual description of filters, connections and media types, or pass the filter graph to GraphEdit/GraphStudioNext for interactive review with visualized topology. No excuses are left for misunderstanding built topologies.

The steps below explain in detail how to visualize your application’s DirectShow filter graph and generate a textual report on the graph details.

1. For starters, one needs to install DirectShowSpy on the target system. Standard installation is described in the original post.

  • It is necessary that DirectShowSpy of the correct/matching bitness is installed: 32-bit applications use 32-bit DirectShowSpy and 64-bit applications use 64-bit DirectShowSpy. .NET applications built as “Any CPU” are effectively either 32- or 64-bit processes and need a matching spy as well.
  • To cut a long story short, simply download DirectShow*.* from Toolbox and use DirectShowSpy-Win32-reg-ui.bat or DirectShowSpy-x64-reg-ui.bat to pop up the registration UI. You need local administrator privileges for the registration step (otherwise the spy is still usable through COM, but that is beyond the scope of this post).

2. DirectShowSpy’s FilterGraphHelper object (already mentioned earlier) offers the DoPropertyFrameModal method to pop up the diagnostic UI. The helper needs prior initialization with either a graph, filter or pin interface. C++ code snippet:

#import "libid:B9EC374B-834B-4DA9-BFB5-C1872CE736FF" raw_interfaces_only // AlaxInfoDirectShowSpy
// ...
CComPtr<IFilterGraph2> pFilterGraph;
// ...
CComPtr<AlaxInfoDirectShowSpy::IFilterGraphHelper> pFilterGraphHelper;
ATLENSURE_SUCCEEDED(pFilterGraphHelper.CoCreateInstance(__uuidof(AlaxInfoDirectShowSpy::FilterGraphHelper)));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->put_FilterGraph(pFilterGraph));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->DoPropertyFrameModal(NULL));

C# code snippet:

IFilterGraph2 graph = new FilterGraph() as IFilterGraph2;
// ...
FilterGraphHelper helper = new FilterGraphHelper();
helper.FilterGraph = graph;
helper.DoPropertyFrameModal(0);

Downloadable sample projects (FilterGraphHelperDialog for C# and FilterGraphHelperDialog2 for C++) are available in the Subversion repository or on Trac.

3. The DoPropertyFrameModal method opens a window (its argument is an optional parent window handle) with details about the graph, including copyable diagnostic text, filters and their property pages, all gathered in a single window.

FilterGraphHelper.DoPropertyFrameModal UI

NOTE: With the root tree element “Filters” selected, the right-side pane contains the text that provides the filter graph description (see image above)!

4. Additionally, it is possible to launch GraphEdit/GraphStudioNext with a hotkey and open the graph – through the ROT – for visual review.

FilterGraphHelper.DoPropertyFrameModal UI (Actions)

Remote Graph in GraphStudioNext

This requires that the Windows SDK proppage.dll is available. It is normally registered with the Windows SDK; otherwise it can be copied from the SDK onto the target system and COM-registered using regsvr32, or copied into the DirectShowSpy folder, in which case the DirectShowSpy-Win32-reg-ui.bat file (see item 1 above) will see it and offer an additional property page for registration.

5. When no longer needed, DirectShowSpy can be removed from the system using the batch file mentioned in item 1 above.

Whatever debugging you do with a DirectShow filter graph, you need a complete understanding of the filter graph you are dealing with. If you want to provide additional information with a DirectShow-related question, copy/pasted diagnostic information should be attached to the question so that others understand exactly what you are dealing with.

Comment on Video Capture Issues with Windows 10 Anniversary Update

There is a comment from MSFT’s Mike M on the MSDN Forums about the recent issue with compressed video capture. I am pulling it out completely as a quote below:

I’d like to start off by providing you guys a little more context on the behavior you’re encountering.

One of the main reasons that Windows is decoding MJPEG for your applications is because of performance. With the Anniversary Update to Windows 10, it is now possible for multiple applications to access the camera in ways that weren’t possible before. It was important for us to enable concurrent camera access, so Windows Hello, Microsoft Hololens and other products and features could reliably assume that the camera would be available at any given time, regardless of what other applications may be accessing it. One of the reasons this led to the MJPEG decoding is because we wanted to prevent multiple applications from decoding the same stream at the same time, which would be a duplicated effort and thus an unnecessary performance hit. This can be even more noticeable or perhaps trigger error cases on in-market devices with a hardware decoder which may be limited on how many decodes can take place simultaneously. We wanted to prevent applications from unknowingly degrading the user experience due to a platform change.

The reasoning for H.264 being decoded can get a little more complicated (and I’m just learning the details myself as I talk to other members of the team), but the basics revolve around how H.264 allows for encoding parameters to be changed on the camera directly, and how in a situation where multiple applications are making use of this control path, they could interfere with each other. Regarding Roman’s concerns about Lync: both Lync and Skype are partner teams, and we stay in touch throughout the development process, so the camera functionality in those applications will continue to work.

So yes, MJPEG and H.264 being decoded / filtered out is the result of a set of features we needed to implement, and this behavior was planned, designed, tested, and flighted out to our partners and Windows Insiders around the end of January of this year.  We worked with partners to make sure their applications continued to function throughout this change, but we have done a poor job communicating this change out to you guys. We dropped the ball on that front, so I’d like to offer my apologies to you all. We’re working on getting better documentation out, to help answer any questions you may have. Of course, you can always reach out to us via these forums for specific issues, as we monitor them regularly, or file feedback using the Feedback Hub. We’re constantly collecting feedback on this and other issues, so we can better understand the impact on our application developers and customers. If you’re having issues adapting your application code to the NV12 / YUY2 media types, we’d like to support you through the changes you may need to make. To get you started, please refer to the documentation links in my previous post. If there are reasons why working with this format isn’t feasible for your project, please let me know, and I’ll present them to the rest of the team, to try and find the best solution for your case.

Dacuda and Stephan B, I’m curious about your specific situations, since you report that this change is breaking functionality for your customers. Are your customers using custom camera hardware? Is the set of supported cameras restricted by your applications? How do your applications deal with devices like the Surface Pro 4, Surface Book, or Dell Venue Pro, which wouldn’t offer the media types your applications are relying on?

I’d like to wrap up this wall of text by letting you know that your feedback here and through other channels is greatly appreciated and something that’s on our radar. We’re trying to look into what other options we can offer you to be able to improve on this for your (and our) customers, so stay tuned! I invite you to please subscribe to this thread (use the “Alert me” link at the top), and I’ll keep you guys updated on what we find. Thanks!

Basically, it is bad news for those who consume compressed video from capture devices – the breaking change is intentional. Something is offered in exchange, and I hope someone will present the platform changes in a friendly, readable document. In particular, Microsoft seems to be adding a VP8/9 video decoder and encoder in this new platform version (more on that later, perhaps).

Video Capture in Windows 10 Anniversary Update Again: MJPG is still here but hidden by new Frame Server thing

The picture with the video capture issues now looks more or less clear.

As explained by Mike M here, the breaking changes in Windows 10 Anniversary Update are caused by an intentional redesign of the platform that enables shared access to video capture devices.

… it is now possible for multiple applications to access the camera in ways that weren’t possible before. It was important for us to enable concurrent camera access, so Windows Hello, Microsoft Hololens and other products and features could reliably assume that the camera would be available at any given time, regardless of what other applications may be accessing it.

Originally, video capture applications were highly performance sensitive due to the insufficient horsepower of computers overall, and sharing of video capture sessions between applications was not on the agenda. Then Microsoft hibernated for over a decade and did not update the platform to follow software and hardware trends. Those needing the camera in two or more applications had to use third-party camera splitting software. The time has come to include video sharing in the platform and… that washed away support for compressed video formats. If the camera is shared, who is going to decode the video into a presentable format? The guys at Microsoft decided that they will; that is, it is now a “decode, then share between applications” scenario.

When an application runs a video capture session, Windows 10 Anniversary Update now runs the actual session in a service process. A new Windows Camera Frame Server service is responsible for acquiring, decoding and distributing the video.

Windows Camera Frame Server Service

Applications access the FrameServer service with the help of FSClient.dll, which connects to the shared service that runs the actual session.

I am not sure how the sharing works exactly, but I was unable to start two TopoEdit instances doing video capture from the same camera. Presumably, the default behavior still imitates exclusive use of the hardware, and possibly priority clients (like the mentioned Windows Hello) have new ways to take over the video capture device on demand, or we will see new functionality with a respective SDK/documentation update.

Applications now – as one assumes from the description – are left with the only option of talking to the FrameServer service rather than to the video capture source directly. Along the way, formats like MJPG and H264 are lost.

As recent comments indicate that this is a well planned and scheduled scenario, it looks unlikely that things are going to change. It was decided that there is no exclusive-mode video capture, just shared. Developers have to wait for a possible change of attitude and something similar to WASAPI’s exclusive low-latency mode for those specific applications that need it.

So yes, MJPEG and H.264 being decoded / filtered out is the result of a set of features we needed to implement, and this behavior was planned, designed, tested, and flighted out to our partners and Windows Insiders around the end of January of this year. We worked with partners to make sure their applications continued to function throughout this change, but we have done a poor job communicating this change out to you guys. We dropped the ball on that front, so I’d like to offer my apologies to you all.

A small relief is that they restructured the platform rather than dropping support for MJPG and H264 outright. Okay, there is no formal access to compressed streams using the standard API, but a stab at doing it the undocumented way shows that all the gear remains in place.

A small proof-of-concept DirectShow video source filter that talks to the Logitech C930e camera, bypassing the newly introduced layer, confirms that streams like 1920×1080@30 MJPG are still supported by the camera and are operational. That is, it is still possible to stream MJPG and H264 from USB web cameras, specifically in modes exceeding the standard USB 2.0 bandwidth limit for raw video, and without software compression:

DirectShowLogitechC930eVideoSource

Logitech C930e Live

This of course again takes exclusive control over the camera and prevents sharing of the video feed as the update intended. However, the video itself is where it was.

Logitech C930 Running 1920x1080@30 MJPG

There is no public source and/or details on this filter because it is sensitive to undocumented behavior of the Media Foundation platform. Just as a demo, the DLLs are there: Win32, x64 (limited to the Logitech Webcam C930e’s highest MJPG mode, but basically the method could work for any MJPG camera, and for the C930e’s H264 too).

That is, if your application is broken by Windows 10 Anniversary Update because you simply assumed availability of specific modes, then there is a chance that updating the application to make it compatible with the new platform design with the FrameServer service could fix it. If you intentionally consumed compressed video for quality, rate and performance reasons, then you are in trouble and no real solution from Microsoft is expected soon. Perhaps the best option is to not upgrade to the Anniversary Update.
