Building a rich and extensible media platform
Engaging with rich media—whether watching a movie, video chatting, or playing music—is one of the most prevalent and enjoyable things we do on our PCs today. I’d like to talk a little bit about the work we’ve done in Windows 8 to make a rich variety of multimedia activities possible, and to extend those capabilities to third party developers through an extensible media platform.
We had three goals in mind when designing the Windows 8 media platform:
- Maximize performance. We wanted media playback to be fast and responsive, enabling the full power of the hardware while maximizing battery life on each PC.
- Simplify development and extensibility. We wanted to provide a platform that could be easily extended and tailored for a given application, setting the stage for innovative custom media apps on Windows.
- Enable a breadth of scenarios. A high performance, high efficiency, extensible platform can then enable a wide range of music, video, communications, and other multimedia apps.
With these three goals in mind, we set out to reimagine the media experience on the Windows platform.
Faster, more responsive media experiences
Performance is a key aspect of any user experience, but it is especially critical in multimedia scenarios. Videos need to play in real time, voice communication needs to feel instantaneous, and all of these tasks need to minimize the drain on your battery.
We measure performance by the time, computing resources, and memory that a given task takes on a system. We aimed to minimize all of those metrics. Our goals for media performance were focused on audio and video playback, transcoding, encoding, and capture.
Efficient video decoding
To improve battery life and reduce power consumption across all media scenarios, we continue to work with partners in the silicon chip industry to enable new and faster experiences. With Windows 8 running on a Windows 8 certified PC, video decoding for common media formats will be offloaded to a dedicated hardware subsystem for media. This allows us to significantly lower CPU usage, resulting in smoother video playback and longer battery life, as the dedicated media hardware is much more efficient than the CPU at media decoding. This improves all scenarios that require video decoding, including playback, transcoding, encoding, and capture.
The figure below shows a comparison of the average CPU utilization between Windows 7 and Windows 8 during playback of 720p VC1/H.264 video clips and webcam capture preview.
In addition to video offload, the improvements to webcam capture are made possible by the move from a DirectShow Capture API to the new, far more optimized Windows 8 Media Foundation Capture API. We’ve also improved software encoders for H.264 and VC-1 content so that encoding using the CPU (when it makes sense) is both fast and power-efficient.
Maximizing battery life during audio playback
Another example of the media performance improvements we’ve made in Windows 8 is in maximizing battery life (or just reducing power consumption) during audio playback. In addition to enabling offload of the audio pipeline (similar to the offload of video described above), we’ve radically improved the audio playback pipeline to be more efficient during steady-state playback. By batching up large chunks of audio data and doing all the processing for that chunk at one time, the CPU can stay asleep for over 100 times longer (over 1 second vs. 10ms), which can result in dramatically increased battery life during audio playback.
Of course, this approach isn’t perfect for all scenarios since the increased buffering introduces additional delay. In the communications section below, we’ll talk more about these tradeoffs and how the media stack adapts to optimize for each scenario.
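The arithmetic behind the "over 100 times longer" figure can be sketched in a few lines. This is only an illustration of the numbers quoted above, not the actual audio engine: the function name is made up, and 1000 ms is used as a stand-in for "over 1 second".

```python
def wakeups_per_second(period_ms: float) -> float:
    """How many times per second the CPU must wake to process audio
    when one batch of audio covers `period_ms` milliseconds."""
    return 1000.0 / period_ms


# Figures taken from the text; 1000 ms stands in for "over 1 second".
SHORT_PERIOD_MS = 10.0
LONG_PERIOD_MS = 1000.0

short_rate = wakeups_per_second(SHORT_PERIOD_MS)
long_rate = wakeups_per_second(LONG_PERIOD_MS)
sleep_ratio = LONG_PERIOD_MS / SHORT_PERIOD_MS

print(f"{SHORT_PERIOD_MS:.0f} ms batches: {short_rate:.0f} wakeups per second")
print(f"{LONG_PERIOD_MS:.0f} ms batches: {long_rate:.0f} wakeup per second")
print(f"CPU can sleep {sleep_ratio:.0f}x longer between wakeups")
```

The same numbers show the cost: audio queued in a 1000 ms batch can wait up to roughly a second before it is heard, which is why this mode suits music playback but not a live call.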
Audio and video offloading are just a couple of examples of the ways we’ve optimized the media stack in Windows 8 to provide lower CPU utilization, lower memory utilization, and better battery life for Desktop and Metro style apps.
Supporting a rich set of media scenarios
Performance is a critical aspect of the platform, but it is only as important as the features that shine because of it. In Windows 8, those features include support for modern video formats, low-latency communication streams, and a seamless connection to external media devices.
One of the challenges in developing a single media platform that serves different scenarios is that the platform has competing goals. For example, communication scenarios require low latency, whereas audio/video encoding and playback benefit in quality and performance from buffering, which results in higher latency. In the next several sections, we’ll touch on these challenges in the context of some of the scenarios we’ve worked to enable in Windows 8, including:
- Communications (e.g. Skype, Lync, etc.)
- Video playback and modern format support
- Auto-orientation of video
- Playback of premium content
- Seamless audio transitions
- Bringing the media experience to additional screens
- Emerging media capabilities
Simplifying development and extensibility
One common theme across these experiences is the extensibility that we’ve incorporated into the multimedia platform. Because users have a wide range of use cases, media formats, codecs, protection mechanisms, and processing, we provided our developers with the ability to customize and tailor their offerings to create great apps and websites on Windows.
As we discuss some of the media scenarios in the next several sections, we’ll also cover some of the work we’ve done to make those scenarios extensible by developers and third-party partners. Let’s dive deeper into the scenarios we’ve targeted for Windows 8.
Communications
Real-time communication on PCs, especially on mobile devices, has seen huge growth over the last decade. Windows users are using services like Skype and Lync to make several billion minutes of voice and video calls per day. TeleGeography estimates that international Skype-to-Skype calls (including video calls) grew 48 percent in 2011, to 145 billion minutes. We’ve made a significant investment in improving the experience of video and audio calling on all Windows 8 PCs. To achieve this goal, we focused our efforts in two areas:
- Enable built-in low-latency media capture and rendering. Low latency is essential for communications apps, so Windows builds support for low-latency media capture and playback into the OS.
- Support HD cameras to enhance the video communication experience. High-definition video makes your communication experience more real and enjoyable, so Windows supports HD camera devices.
Enabling low latency
When you communicate with another person, you expect near-instant responses. For this reason, communications systems generally try to minimize the end-to-end delay (also referred to as latency). In audio and video systems designed for playback, buffering is often used both to protect against glitches caused by processing spikes or network traffic and to reduce power consumption. However, this buffering introduces a delay into the audio and video, which is perceived as latency by the audience. In engineering Windows 8, we designed the media platform to support both playback-optimized and communication-optimized scenarios. The media infrastructure can switch between a playback mode (high buffering, more tolerant of varying conditions) and a communications-optimized mode (low delay).
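To make the glitch-protection side of this tradeoff concrete, here is a toy simulation of a renderer draining a buffer while the producer is stalled by a processing spike or a network hiccup. The function name, the 1 ms tick, and the two buffer depths are assumptions chosen for illustration; they are not values or APIs from the Windows media stack.

```python
def survives_stall(buffered_ms: int, stall_ms: int) -> bool:
    """Return True if rendering continues without a glitch during a stall.

    Simulated in 1 ms ticks: while the producer is stalled it delivers
    nothing, and the renderer consumes 1 ms of buffered media per tick.
    """
    level = buffered_ms
    for _ in range(stall_ms):
        if level == 0:
            return False  # buffer ran dry before the stall ended: glitch
        level -= 1
    return True


# Hypothetical buffer depths for the two modes (illustrative only).
MODES = {"playback": 1000, "communications": 10}
STALL_MS = 50  # an assumed 50 ms processing spike

for mode, depth_ms in MODES.items():
    result = "no glitch" if survives_stall(depth_ms, STALL_MS) else "glitch"
    print(f"{mode}: {depth_ms} ms buffered, {STALL_MS} ms stall -> {result}")
```

In this sketch the deep buffer rides out the stall but delays every sample by up to its depth, while the shallow buffer keeps delay low and depends on the rest of the pipeline staying on schedule.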
According to the TIA/EIA 920 standard, the one-way audio latency that can be attributed to just the media processing pipeline cannot exceed 100ms in order to achieve a usable real-time communication experience. With this metric in mind, we designed a test environment to measure the end-to-end latency of the pipeline, shown in the following diagram:
In the case of video communication, the end-to-end or “glass-to-glass” pipeline latency is measured as the time it takes for a video frame to be captured by the camera device, encoded to a supported video format, streamed over the network loopback interfaces, decoded, and finally rendered by the display.
Looking at the figure below, you can see the result obtained for capturing and rendering PCM audio when the media pipeline is in low latency mode. The first set of spikes corresponds to the original spoken words at the transmitter and the second set shows those words at the receiver. The delay between the two is 65ms, well below the 100ms goal.
The next chart shows a comparison of the pipeline latency of playback and communication-optimized mode when a video frame is captured, encoded (in H.264 format), streamed, decoded, and then displayed at various resolutions. The goal of 145ms overall latency (as deemed by TIA/EIA 920 for usable real-time video calling) is shown by the green line on the chart.
In playback mode, the average latency of the pipeline is about 575ms. This delay is necessary for a smooth playback experience when consuming video, but unacceptable for real-time video communication. In low latency mode, on the other hand, the measured latency is well under the target goal at each of the measured video resolutions.
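The comparison against the targets can be written out as a small check. The sketch below uses only the figures quoted in this section (65ms measured audio latency against the 100ms limit, and the roughly 575ms playback-mode video latency against the 145ms goal); the per-resolution low latency video measurements are not given in the text, so they are left out. The function and variable names are made up for the example.

```python
AUDIO_TARGET_MS = 100  # one-way audio pipeline limit cited in the text
VIDEO_TARGET_MS = 145  # video pipeline goal cited in the text


def headroom_ms(measured_ms: int, target_ms: int) -> int:
    """Positive result: under the target by that margin. Negative: over it."""
    return target_ms - measured_ms


measurements = [
    ("audio, low latency mode", 65, AUDIO_TARGET_MS),
    ("video, playback mode (average)", 575, VIDEO_TARGET_MS),
]

for label, measured, target in measurements:
    margin = headroom_ms(measured, target)
    verdict = "meets" if margin >= 0 else "misses"
    print(f"{label}: {measured} ms {verdict} the {target} ms target "
          f"by {abs(margin)} ms")
```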
Supporting HD video calling
Another example of the work we have done to improve communication on Windows 8 PCs is through OS support for HD cameras. New class drivers will work transparently with applications to provide support for HD video features. In addition, all of the hardware acceleration for video decoding discussed previously will be utilized for communication scenarios.
Windows 8 will offer a consistent, high-quality, hardware-accelerated, power-efficient media communication experience on PCs designed for Windows 8. We have made significant investments in the media platform to improve pipeline latency, and with added support for H.264 cameras, users will be able to communicate with friends and family in high-fidelity HD video.
Video and audio support for Metro style apps
Our main goal for native media format support for Metro style apps was to ensure users and app developers could count on a consistently great playback experience across a wide variety of PC form factors, with modern formats used in mainstream scenarios such as:
- HTML5-based entertainment on the web
- Home movies captured using popular smartphones, point-and-shoot cameras, or AVC-HD cameras
- Streaming music, movies, and TV shows from popular services
The tables below show the video and audio formats that have built-in support for Metro style apps. Formats recommended for use by Metro style apps are a reflection of deep partnerships with hardware manufacturers for predictable hardware acceleration across PC form factors and predictable end-to-end scenario performance beyond playback such as capture, streaming, and transcoding.
Windows 8 has excellent support for MPEG-4, most typically consisting of H.264 video and AAC audio. Several popular codecs, including DivX and Xvid, implement the MPEG-4 Part 2 standard, so many of these files play great in Metro style apps. The same is true for modern MOV files, such as videos captured on iOS devices, whose container format is closely related to the MPEG-4 Part 12 standard. Fragmented MPEG-4 and 2K/4K resolutions are now supported. We have previously talked about MPEG-2 and DVD playback, which is available in Windows 8 Media Center.
During the development of Windows 7 we talked quite a bit about native codec support in Windows and the formats available through extensibility. Since then, the environment around codecs has consistently moved towards a smaller set of well-defined and broadly supported formats, particularly H.264 for video. Due to factors such as intellectual property and hardware support, this makes a great deal of sense. Even browsers are making this transition with HTML5. But we also recognize that some individuals have preferred formats for a variety of reasons, and we wanted to make sure Windows 8 app developers could choose to use the formats they prefer. Formats popular among the enthusiast community or with specific developers, such as FLAC, MKV, and OGG, can have their own codecs packaged as part of a Metro style app, since the Windows 8 media platform is highly extensible.
Auto-orientation of video
With the proliferation of video recording in traditional cameras, smartphones, and tablets, users can capture video while holding their device in either portrait or landscape mode – there is no “right-side-up” any longer, thanks to modern touch-based interfaces. Many of us have experienced the frustration of recording a video and realizing the camera was sideways or upside down only after viewing it on the PC. Since the video scan pattern is fixed, videos may not be oriented properly when viewed.
To overcome this problem, cameras are beginning to author orientation metadata in mainstream file formats such as MP4 and ASF when saving recorded video to storage.
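As a rough illustration of how a player might use such metadata, the sketch below rotates a decoded frame according to a rotation value read from the file. It assumes the metadata expresses a clockwise rotation of 0, 90, 180, or 270 degrees needed to display the frame upright; the function name and that convention are assumptions for the example, not the actual MP4 or ASF field definitions.

```python
def rotate_frame(frame, degrees):
    """Rotate a frame (a list of pixel rows) clockwise by `degrees`.

    `degrees` is a hypothetical orientation value from file metadata
    and must be 0, 90, 180, or 270.
    """
    if degrees not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {degrees}")
    rotated = [list(row) for row in frame]  # work on a copy
    for _ in range(degrees // 90):
        # Reversing the row order and then transposing turns the
        # frame 90 degrees clockwise.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated


# A tiny 3x2 "frame" whose numbers stand in for pixels.
frame = [
    [1, 2, 3],
    [4, 5, 6],
]

for angle in (0, 90, 180, 270):
    print(angle, rotate_frame(frame, angle))
```

Note that a 90 or 270 degree rotation swaps width and height, so a player also has to recompute its layout before displaying the corrected frame.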