Please provide a short (approximately 100 word) summary of the following web Content, written in the voice of the original author. If there is anything controversial please highlight the controversy. If there is something surprising, unique, or clever, please highlight that as well. Content: Title: Real-Time Video Processing with WebCodecs and Streams Site: WebRTC used to be about capturing some media and sending it from Point A to Point B. Machine Learning has changed this. Now it is common to use ML to analyze and manipulate media in real time for things like virtual backgrounds, augmented reality, noise suppression, intelligent cropping, and much more. To better accommodate this growing trend, the web platform has been exposing its underlying platform to give developers more access. The result is not only more control within existing APIs, but also a bunch of new APIs like Insertable Streams, WebCodecs, Streams, WebGPU, and WebNN. So how do all these new APIs work together? That is exactly what W3C specialists, François Daoust and Dominique Hazaël-Massieux (Dom) decided to find out. In case you forgot, W3C is the World Wide Web Consortium that standardizes the Web. François and Dom are long-time standards guys with a deep history of helping to make the web what it is today. This is the first of a two-part series of articles that explores the future of real-time video processing with WebCodecs and Streams. This first section provides a review of the steps and pitfalls in a multi-step video processing pipeline using existing and the newest web APIs. Part two will explore the actual processing of video frames. I am thrilled about the depth and insights these guides provide on these cutting-edge approaches – enjoy! {“editor”, “ chad hart “} About Processing Pipelines In simple WebRTC video conferencing scenarios, audio and video streams captured on one device are sent to another device, possibly going through some intermediary server. The capture of raw audio and video streams from microphones and cameras relies on getUserMedia . Raw media streams then need to be encoded for transport and sent over to the receiving side. Received streams must be decoded before they can be rendered. The resulting video pipeline is illustrated below. Web applications do not see these separate encode/send and receive/decode steps in practice – they are entangled in the core WebRTC API and under the control of the browser. If you want to add the ability to do something like remove users’ backgrounds, the most scalable and privacy respective option is to do it client-side before the video stream is sent to the network. This operation needs access to the raw pixels of the video stream. Said differently, it needs to take place between the capture step and encode steps. Similarly, on the receiving side, you may want to give users options like adjusting colors and contrast, which also require raw pixel access between the decode and render steps. As illustrated below, this adds an extra process steps to the resulting video pipeline. This made Dominique Hazaël-Massieux and me wonder how web applications can build such media processing pipelines. The main problem is raw frames from a video stream cannot casually be exposed to web applications. Raw frames are: large – several MB per frame, plentiful – 25 frames per second or more, not easily exposable – GPU to CPU read-back often needed, and browsers need to deal with a variety of pixel formats (RGBA, YUV, etc.) and color spaces under the hoods. As such, whenever possible, web technologies that manipulate video streams on the web ( HTMLMediaElement , WebRTC , getUserMedia , Media Source Extensions ) treat them as opaque objects and hide the underlying pixels from applications. This makes it difficult for web applications to create a media processing pipeline in practice. Fortunately, the VideoFrame interface in WebCodecs may help, especially if you couple this with the MediaStreamTrackProcessor object defined in MediaStreamTrack Insertable Media Processing using Streams that creates a bridge between WebRTC and WebCodecs. WebCodecs lets you access and process raw pixels of media frames. Actual processing can use one of many technologies, starting with good ol’ JavaScript and including WebAssembly , WebGPU , or the Web Neural Network API (WebNN). After processing, you could get back to WebRTC land through the same bridge. That said, WebCodecs can also put you in control of the encode/decode steps in the pipeline through its VideoEncoder and VideoDecoder interfaces. These can give you full control over all individual steps in the pipeline: For transporting the processed image somewhere while keeping latency low, you could consider WebTransport or WebRTC’s RTCDataChannel . For rendering, you could render directly to a canvas through drawImage , using WebGPU, or via an