This content originally appeared on web.dev and was authored by Thomas Steiner
Insertable streams for `MediaStreamTrack` is part of the capabilities project and is currently in development. This post will be updated as the implementation progresses.
Background #
In the context of the Media Capture and Streams API, the `MediaStreamTrack` interface represents a single media track within a stream; typically, these are audio or video tracks, but other track types may exist. `MediaStream` objects consist of zero or more `MediaStreamTrack` objects, representing various audio or video tracks. Each `MediaStreamTrack` may have one or more channels. The channel represents the smallest unit of a media stream, such as an audio signal associated with a given speaker, like left or right in a stereo audio track.
What is insertable streams for `MediaStreamTrack`? #
The core idea behind insertable streams for `MediaStreamTrack` is to expose the content of a `MediaStreamTrack` as a collection of streams (as defined by the WHATWG Streams API). These streams can be manipulated to introduce new components.

Granting developers access to the video (or audio) stream directly allows them to apply modifications directly to the stream. In contrast, realizing the same video manipulation task with traditional methods requires developers to use intermediaries such as `<canvas>` elements. (For details of this type of process, see, for example, video + canvas = magic.)
Use cases #
Use cases for insertable streams for `MediaStreamTrack` include, but are not limited to:
- Video conferencing gadgets like "funny hats" or virtual backgrounds.
- Voice processing like software vocoders.
Current status #
| Step | Status |
| --- | --- |
| 1. Create explainer | Complete |
| 2. Create initial draft of specification | In progress |
| 3. Gather feedback & iterate on design | In progress |
| 4. Origin trial | In progress |
| 5. Launch | Not started |
How to use insertable streams for `MediaStreamTrack` #
Enabling support during the origin trial phase #
Starting in Chrome 90, insertable streams for `MediaStreamTrack` is available as part of the WebCodecs origin trial in Chrome. The origin trial is expected to end in Chrome 91 (July 14, 2021). If necessary, a separate origin trial will continue for insertable streams for `MediaStreamTrack`.

Origin trials allow you to try new features and give feedback on their usability, practicality, and effectiveness to the web standards community. For more information, see the Origin Trials Guide for Web Developers. To sign up for this or another origin trial, visit the registration page.
Register for the origin trial #
- Request a token for your origin.
- Add the token to your pages. There are two ways to do that:
  - Add an `origin-trial` `<meta>` tag to the head of each page. For example, this may look something like:

    ```html
    <meta http-equiv="origin-trial" content="TOKEN_GOES_HERE">
    ```

  - If you can configure your server, you can also add the token using an `Origin-Trial` HTTP header. The resulting response header should look something like:

    ```text
    Origin-Trial: TOKEN_GOES_HERE
    ```
Enabling via chrome://flags #
To experiment with insertable streams for `MediaStreamTrack` locally, without an origin trial token, enable the `#enable-experimental-web-platform-features` flag in `chrome://flags`.
Feature detection #
You can feature-detect insertable streams for `MediaStreamTrack` support as follows.

```js
if ('MediaStreamTrackProcessor' in window && 'MediaStreamTrackGenerator' in window) {
  // Insertable streams for `MediaStreamTrack` is supported.
}
```
Core concepts #
Insertable streams for `MediaStreamTrack` builds on concepts previously proposed by WebCodecs and conceptually splits the `MediaStreamTrack` into two components:

- The `MediaStreamTrackProcessor`, which consumes a `MediaStreamTrack` object's source and generates a stream of media frames, specifically `VideoFrame` or `AudioFrame` objects. You can think of this as a track sink that is capable of exposing the unencoded frames from the track as a `ReadableStream`. It also exposes a control channel for signals going in the opposite direction.
- The `MediaStreamTrackGenerator`, which consumes a stream of media frames and exposes a `MediaStreamTrack` interface. It can be provided to any sink, just like a track from `getUserMedia()`. It takes media frames as input. In addition, it provides access to control signals that are generated by the sink.
The `MediaStreamTrackProcessor` #

A `MediaStreamTrackProcessor` object exposes two properties:

- `readable`: Allows reading the frames from the `MediaStreamTrack`. If the track is a video track, chunks read from `readable` will be `VideoFrame` objects. If the track is an audio track, chunks read from `readable` will be `AudioFrame` objects.
- `writableControl`: Allows sending control signals to the track. Control signals are objects of type `MediaStreamTrackSignal`.
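To illustrate the `readable` side, here is a minimal sketch of the frame-reading loop. Since `MediaStreamTrackProcessor` only exists in supporting browsers, a hand-made `ReadableStream` of mock frame objects stands in for `processor.readable`; real chunks would be `VideoFrame` or `AudioFrame` objects:

```js
// Stand-in for trackProcessor.readable: a stream of mock frames. In a
// supporting browser you would use
// new MediaStreamTrackProcessor({ track }).readable instead.
function makeMockReadable() {
  return new ReadableStream({
    start(controller) {
      controller.enqueue({ timestamp: 0, close() {} });
      controller.enqueue({ timestamp: 33333, close() {} });
      controller.close();
    },
  });
}

// Generic consumption loop: read frames until the stream ends.
async function consumeFrames(readable) {
  const reader = readable.getReader();
  const timestamps = [];
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    timestamps.push(frame.timestamp);
    frame.close(); // always close frames to release their media resources
  }
  return timestamps;
}

consumeFrames(makeMockReadable()).then((ts) => console.log(ts)); // [ 0, 33333 ]
```

The same loop works unchanged against a real processor's `readable`; only the source of the stream differs.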
The `MediaStreamTrackGenerator` #

A `MediaStreamTrackGenerator` object likewise exposes two properties:

- `writable`: A `WritableStream` that allows writing media frames to the `MediaStreamTrackGenerator`, which is itself a `MediaStreamTrack`. If the `kind` attribute is `"audio"`, the stream accepts `AudioFrame` objects and fails with any other type. If `kind` is `"video"`, the stream accepts `VideoFrame` objects and fails with any other type. When a frame is written to `writable`, the frame's `close()` method is automatically invoked, so that its media resources are no longer accessible from JavaScript.
- `readableControl`: A `ReadableStream` that allows reading control signals sent from any sinks connected to the `MediaStreamTrackGenerator`. Control signals are objects of type `MediaStreamTrackSignal`.
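The writing side can be sketched the same way. Here a plain `WritableStream` stands in for `trackGenerator.writable` and emulates the close-on-write behavior described above; in a browser you would use `new MediaStreamTrackGenerator({ kind: 'video' }).writable` instead:

```js
// Stand-in for trackGenerator.writable: records each frame's timestamp and
// closes the frame, mirroring the generator's automatic close() call.
function makeMockWritable(received) {
  return new WritableStream({
    write(frame) {
      received.push(frame.timestamp);
      frame.close(); // a real generator closes frames after writing
    },
  });
}

// Generic production loop: write frames, then close the writer.
async function writeFrames(writable, frames) {
  const writer = writable.getWriter();
  for (const frame of frames) {
    await writer.write(frame);
  }
  await writer.close();
}

const frames = [0, 33333].map((timestamp) => ({
  timestamp,
  closed: false,
  close() { this.closed = true; },
}));
const received = [];
writeFrames(makeMockWritable(received), frames).then(() => {
  console.log(received, frames.every((f) => f.closed)); // [ 0, 33333 ] true
});
```

After the pipeline finishes, every written frame has been closed, which is why a transform must not touch a frame again once it has been handed to `writable`.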
In the `MediaStream` model, apart from media, which flows from sources to sinks, there are also control signals that flow in the opposite direction (i.e., from sinks to sources via the track). A `MediaStreamTrackProcessor` is a sink and it allows sending control signals to its track and source via its `writableControl` property. A `MediaStreamTrackGenerator` is a track for which a custom source can be implemented by writing media frames to its `writable` field. Such a source can receive control signals sent by sinks via its `readableControl` property.
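The control path can be sketched with an identity `TransformStream` standing in for the sink-to-source channel: its `writable` side plays the role of `writableControl`, its `readable` side the signals arriving at the source. The `{ signalType: 'request-frame' }` shape follows the proposal's `MediaStreamTrackSignal` dictionary and should be treated as an assumption:

```js
// Identity pipe standing in for the sink→source control channel.
const controlChannel = new TransformStream();

// "Sink" side: send a control signal, as one would via writableControl.
async function requestFrame(writableControl) {
  const writer = writableControl.getWriter();
  await writer.write({ signalType: 'request-frame' });
  writer.releaseLock();
}

// "Source" side: receive the next signal, as a custom source would
// read it from readableControl.
async function nextSignal(readableControl) {
  const reader = readableControl.getReader();
  const { value } = await reader.read();
  reader.releaseLock();
  return value;
}

requestFrame(controlChannel.writable)
  .then(() => nextSignal(controlChannel.readable))
  .then((signal) => console.log(signal.signalType)); // request-frame
```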
Bringing it all together #
The core idea is to create a processing chain as follows:
Platform Track → Processor → Transform → Generator → Platform Sinks
For a barcode scanner application, this chain would look as in the code sample below.
```js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = stream.getVideoTracks()[0];
const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' });

const transformer = new TransformStream({
  async transform(videoFrame, controller) {
    const barcodes = await detectBarcodes(videoFrame);
    const newFrame = highlightBarcodes(videoFrame, barcodes);
    videoFrame.close();
    controller.enqueue(newFrame);
  },
});

trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable);
trackGenerator.readableControl.pipeTo(trackProcessor.writableControl);
```
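The endpoints of that chain need a browser, but the Processor → Transform → Generator shape itself can be exercised with plain web streams (available in modern browsers and Node 18+), using mock frames and trivial stand-ins for the hypothetical `detectBarcodes`/`highlightBarcodes` helpers:

```js
// Mock frame with a close() flag; real chunks would be VideoFrame objects.
function makeFrame(timestamp) {
  return { timestamp, closed: false, close() { this.closed = true; } };
}

// Trivial stand-ins for the barcode helpers from the sample above.
async function detectBarcodes(frame) {
  return [`code@${frame.timestamp}`];
}
function highlightBarcodes(frame, barcodes) {
  return { timestamp: frame.timestamp, barcodes, close() {} };
}

async function runPipeline(frames) {
  const source = new ReadableStream({
    start(controller) {
      frames.forEach((frame) => controller.enqueue(frame));
      controller.close();
    },
  });
  // Same transform as in the sample: annotate, close the original, enqueue.
  const transformer = new TransformStream({
    async transform(videoFrame, controller) {
      const barcodes = await detectBarcodes(videoFrame);
      const newFrame = highlightBarcodes(videoFrame, barcodes);
      videoFrame.close();
      controller.enqueue(newFrame);
    },
  });
  const output = [];
  await source
    .pipeThrough(transformer)
    .pipeTo(new WritableStream({ write(frame) { output.push(frame); } }));
  return output;
}

runPipeline([makeFrame(0), makeFrame(33333)]).then((out) => {
  console.log(out.map((f) => f.barcodes)); // [ [ 'code@0' ], [ 'code@33333' ] ]
});
```

Swapping the mock source and sink for `trackProcessor.readable` and `trackGenerator.writable` recovers the real pipeline; the transform in the middle is identical.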
This article barely scratches the surface of what is possible, and going into the details is beyond the scope of this publication. For more examples, see the extended video processing demo and the audio processing demo. You can find the source code for both demos on GitHub.
Demo #
You can see the QR code scanner demo from the section above in action on a desktop or mobile browser. Hold a QR code in front of the camera and the app will detect it and highlight it. You can see the application's source code on Glitch.
Security and Privacy considerations #
The security of this API relies on existing mechanisms in the web platform. As data is exposed using the `VideoFrame` and `AudioFrame` interfaces, the rules of those interfaces to deal with origin-tainted data apply. For example, data from cross-origin resources cannot be accessed due to existing restrictions on accessing such resources (e.g., it is not possible to access the pixels of a cross-origin image or video element). In addition, access to media data from cameras, microphones, or screens is subject to user authorization. The media data this API exposes is already available through other APIs. In addition to the media data, this API exposes some control signals such as requests for new frames. These signals are intended as hints and do not pose a significant security risk.
Feedback #
The Chromium team wants to hear about your experiences with insertable streams for `MediaStreamTrack`.
Tell us about the API design #
Is there something about the API that does not work like you expected? Or are there missing methods or properties that you need to implement your idea? Do you have a question or comment on the security model? File a spec issue on the corresponding GitHub repo, or add your thoughts to an existing issue.
Report a problem with the implementation #
Did you find a bug with Chromium's implementation? Or is the implementation different from the spec? File a bug at new.crbug.com. Be sure to include as much detail as you can, simple instructions for reproducing, and enter `Blink>MediaStream` in the Components box. Glitch works great for sharing quick and easy repros.
Show support for the API #
Are you planning to use insertable streams for `MediaStreamTrack`? Your public support helps the Chromium team prioritize features and shows other browser vendors how critical it is to support them. Send a tweet to @ChromiumDev using the hashtag #InsertableStreams and let us know where and how you are using it.
Helpful links #
Acknowledgements #
The insertable streams for `MediaStreamTrack` spec was written by Harald Alvestrand and Guido Urdaneta. This article was reviewed by Harald Alvestrand, Joe Medley, Ben Wagner, Huib Kleinhout, and François Beaufort. Hero image by Chris Montgomery on Unsplash.
Thomas Steiner | Sciencx (2021-05-04T00:00:00+00:00) Insertable streams for MediaStreamTrack. Retrieved from https://www.scien.cx/2021/05/04/insertable-streams-for-mediastreamtrack/