
WebRTC is not the future of low latency live streaming... at least not outside of video conferencing. It's incredibly complex as a specification, and has limitations and numerous issues that cap how scalable it can be. Conversely, HTTP segment-based formats like HLS and DASH have limits due to their design (and that of HTTP) which cap how low latency can actually go.

Where's the future? A likely candidate may come out of the "Media over QUIC" work in the IETF, which already has several straw man protocols in real world use (Meta's RUSH, Twitch's WARP). It'll be a few more years before we see a real successor, but whatever it is will likely be able to supersede both WebRTC and HLS/DASH where QUIC and/or WebTransport is available.



I am heavily biased toward WebRTC. Here is my take on it though!

> It's incredibly complex as a specification

What is complex about it? I can go and read the IETF drafts, webrtcforthecurious.com, https://github.com/adalkiran/webrtc-nuts-and-bolts and multiple implementations.

QUIC/WebTransport seems simple because it doesn't address all the things WebRTC does.

> has limitations and numerous issues that set limits in how scalable it can be

https://phenixrts.com/en-us/ does 500k viewers. I don't think anything about WebRTC makes it unscalable.

-----

IMO the future is WebRTC.

* Diverse users make the ecosystem rich. WebRTC supports conferencing, embedded, P2P/NAT traversal, remote control... Every group of users has made the ecosystem a little better.

* Client code is minimal. For most users they just need to exchange Session Descriptions and they are done. You then have additional APIs if you need to change behaviors. Other streaming protocols expect you to put lots of code client side. If you want to target lots of platforms that is a pretty big burden.

* Lots of implementations. C, C++, Python, Go, Typescript

* The new thing needs to be substantially better. I don't know where the threshold is, but it isn't enough to just be a little better than WebRTC to replace it.


> QUIC/WebTransport seems simple because it doesn't address all the things WebRTC does.

Partially agree here, but the design of QUIC(/WebTransport/TCPLS) make some of the features in WebRTC unnecessary:

1. No need for STUN/TURN/ICE. With QUIC you can have the NATed party make an outbound request to a non-NATed party, then use QUIC streams to send/receive RTP between the sender and receiver.

2. QUIC comes with encryption so you don't need to mess with DTLS/SRTP

3. Scaling QUIC channels is much more similar to scaling a stateless service than scaling something heavily stateful like a videobridge and should be easier to manage with modern orchestration tools.

4. For simple, 1:1 cases, QUIC needs a lot less signaling overhead than a WebRTC implementation. For other VC configurations, a streaming layer on QUIC will probably need to implement some form of signaling and will end up looking just like WebRTC signaling.
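Point 1 above can be illustrated with plain UDP sockets standing in for QUIC (a sketch of the principle only, not actual QUIC; the payloads and the `SUBSCRIBE` message are made up for the example). The key idea is that the NATed side sends first, which behind a real NAT opens a mapping the other side's replies can traverse:

```python
# Sketch of "the NATed side connects out first" using plain UDP sockets
# (standing in for QUIC). Payloads and the SUBSCRIBE message are illustrative.
import socket

# "Public" endpoint: binds a known address and waits.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # ephemeral port for the demo
server_addr = server.getsockname()

# "NATed" endpoint: sends first; behind a real NAT this outbound packet
# would create the mapping that lets the server's replies come back.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"SUBSCRIBE", server_addr)

data, client_addr = server.recvfrom(1024)

# The server can now push media packets back along the same path,
# with no STUN/TURN/ICE machinery involved.
server.sendto(b"media-packet-0", client_addr)
reply, _ = client.recvfrom(1024)
print(reply.decode())  # media-packet-0
```

Real QUIC adds connection IDs, encryption, and streams on top, but the NAT-traversal direction trick is the same.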

---

I just wish WebRTC weren't so prescriptive about DTLS/SRTP. I'm often fiddling around with VC and video feeds on private networks (for example IPsec, or an encrypted VPN like ZeroTier), and having to opt into the whole CA system there makes it a bit of a pain. There's also the fact that having the browser read from a video or voice source isn't always very low-latency even when the DTLS/SRTP traffic moves as fast as the network allows, which hurts glass-to-glass latency, though as you indicated there are non-browser ways to use WebRTC and many language frameworks.

All-in-all small complaints for a good technology stack though.


ICE is needed when both parties are NATed; if one party were not NATed, we wouldn't need ICE in WebRTC either.

Agree on 2.

On 3: the videobridge needs state on who is on the session and who to forward to. That requirement doesn't go away with QUIC, unless you're thinking that the video streams are some kind of named resource or object.

I think what most people gripe about is SDP and its prescription of negotiation and encoding. I agree that capability negotiation can be vastly simplified, given that some of the capabilities can be inferred later in the session.


1. Fair on ICE. But if only one party is NATed, then you don't need STUN (or TURN if there's CGNAT involved.)

3. Yeah I was thinking about named resources/objects. If you could generate them predictably, QUIC+RTP can simplify a lot of things.


Re: 1. You'll probably still need it if the NATed party is recvonly, i.e. it expects to just receive data without explicitly sending any first.


I would presume that the receiver will still make an outbound request to the QUIC endpoint to at least bring up the connection/stream, which should be enough to populate the path in NAT tables, no? It shouldn't be any more wasteful than the regular process, which receives packets out-of-band but still needs a signaling channel. This just does in-band signaling: you bring up the connection, perform signaling, then open a new QUIC stream to receive data.


You're right. I thought your comment was saying that in current implementations (not in the context of QUIC), the NATed peer wouldn't need STUN anyway, as of today. I had lost the context of it referring to a hypothetical implementation over QUIC.


> I just wish WebRTC wasn't so prescriptive of DTLS/SRTP.

There was a webrtc-webtransport spec, but it got renamed/retasked to p2p-webtransport[1] about a year ago[2]. Feels like a pretty strong indicator of WebRTC being deconstructed, but who's to say this goes anywhere. We'd also need WebCodecs.

It's somewhat scary & also somewhat exciting to think of the one good, working, browser-supported standard being ripped into pieces (p2p-webtransport, webcodecs, more) & being user-implemented. Having the browser & servers share a well-known target is great but also perhaps confining. If we leave it up to each site/library to DIY their solution and figure out how to balance the p2p feeds, it'll be a long long time before the Rest of the World (other than the very big few) has reasonable tech again. WebRTC is quite capable & a nice even playing field, with lots of well-known rules to enable creative interoperation. We'd be throwing away a lot. I'd hoped for webrtc-webtransport, to at least keep some order & regularity, but that seems out at the moment. But webrtc-nv is still ultra-formative; anything could happen.

The rest of the transport stack is also undergoing massive seismic shifts. I feel like we're in for a lot of years of running QUIC or HTTP3 over WebRTC DataChannels and over WebTransport, so we can explore the new capabilities without having to ram each & every change through the browser implementers. It feels like a less visible but far more massive Extensible Web Manifesto moment ("Browser vendors should provide new low-level capabilities that expose the possibilities of the underlying platform as closely as possible."), only at sub-HTML levels[3]. The browsers refused to let us play with HTTP Push, never let appdevs know realtime resources had been pushed at the browser, so we're still debating terrible WebSocket vs SSE choices. I think of gRPC-web & what an abomination that is, how sad & pointless that effort is; all because the browser is a mere glimmer of the underlying transport. I feel like a lot of experimentation & exploration is going to happen if we start exploring QUIC or HTTP3 over WebTransport. Attempts to reimagine alternatives to WebRTC are also possible if we had specs like p2p-webtransport, or just did QUIC over DataChannels. Running modern protocols in the client, not the browser, seems like a semi-cursed future, but necessary, at least for a while, while we don't yet know what we could do. The browsers are super laggy, slow to expose capabilities.

[1] https://github.com/w3c/p2p-webtransport

[2] https://github.com/w3c/p2p-webtransport/commit/63370be4bb61a...

[3] https://github.com/extensibleweb/manifesto


Having attempted to use WebRTC as a generic video transport, I can say that WebRTC has insurmountable problems. The two biggest issues are:

1) Lack of client-side buffering. This is a benefit in real-time communication, but it limits your maximum bitrate to your maximum download speed. It's also incredibly sensitive to network blips.

2) Extremely expensive. To keep bitrate down, video codecs only send key frames every so often. When a new client starts consuming a video stream they need to notify the sender that a new key frame is needed. For a video call, this is fine because the sender is already transcoding their stream so inserting a key frame isn’t a big deal. For a static video, needing to transcode the entire thing in real time with dynamic key frames is expensive and unnecessary.


Webrtc protocol doesn’t dictate 1 or 2. Although browsers do implement some of their own assumptions for this. By default the client side buffer can be orders of 100s of milliseconds. this is as you pointed out tuned for real-time or live applications.

If you’re doing something like YouTube/Netflix and want to avoid going to a lower definition of the stream, that too can be tuned, albeit you’d want to use simulcast and implement your own player (to feed the video and audio frames for decoding at the pace you dictate).


None of these problems are specific to WebRTC. You'll run into them in a WebRTC implementation, you'll run into them with QUIC, and even with ffmpeg on the CLI you'll need to specify buffer sizes. As you mention, these are both problems with livestreaming in general, and the more you buffer, the less "live" your stream becomes. If you're interested in transmitting static videos, why not go with HLS, or even just make the static file available for direct download over HTTP instead of a live technology?


IIRC the buffer sizes in ffmpeg are more about ensuring that the calculated bitrate is accurate than about ensuring smooth streaming (although you do need your bitrate enforced to guarantee smooth streaming).


IIRC (it's been a bit since I've configured this), you can specify both codec buffers and buffers for streaming to smooth out issues reading from the codec output. I could be wrong though.


1.) Why can't you buffer on the client side for WebRTC? That sounds like a client issue (what library were you using?) not the protocol.

2.) I use the same tactic as HLS. Generate your video with a reasonable (~2 seconds) keyframe interval. When a new client connects start sending at the keyframe.
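The "start sending at the keyframe" tactic can be sketched as a small sender-side buffer that always retains everything since the most recent keyframe, so a late joiner can decode immediately (class and field names here are hypothetical, not from any real library):

```python
# Hypothetical sketch: the sender buffers the current GOP so that a
# newly connected client can start from a decodable keyframe.
from dataclasses import dataclass

@dataclass
class Frame:
    keyframe: bool
    data: bytes

class GopBuffer:
    """Keeps every frame since the most recent keyframe."""
    def __init__(self):
        self.frames = []

    def push(self, frame: Frame):
        if frame.keyframe:
            self.frames = [frame]   # a new GOP starts here
        else:
            self.frames.append(frame)

    def join(self):
        """What a newly connected client should receive first."""
        return list(self.frames)

buf = GopBuffer()
buf.push(Frame(True, b"I0"))
buf.push(Frame(False, b"P1"))
buf.push(Frame(True, b"I2"))   # ~2s later: next keyframe resets the GOP
buf.push(Frame(False, b"P3"))

backlog = buf.join()
print([f.data for f in backlog])  # [b'I2', b'P3']
```

With a fixed ~2 second keyframe interval, worst-case join latency is bounded by the GOP length, and no keyframe-on-demand signaling back to the encoder is needed.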


1) The point of WebRTC is that it’s real-time. If you buffer then it’s not real-time.

2) Adding key frames increases the bitrate greatly which exacerbates problem 1.


1) I don't think WebRTC has one specific point. Lots of users came together with their use cases and it was designed by consensus. WebRTC can (and does) have toggles around latency/buffering.

2.) I am not aware of a way you can have no keyframes but still be decodable at any time. I have just done it 'HLS style' or WebRTC 1:1. Curious if anyone else has different solutions.


1) WebRTC and RTP both have RT in their name. RT stands for real-time. If I recall correctly, the only buffer WebRTC has is the jitter buffer, which is used for packet ordering, not for ensuring that enough has buffered to handle bitrate spikes.

2) Yes, you either need a high keyframe interval or some type of out-of-band signaling framework to generate keyframes. WebRTC uses RTCP. A good question is why does WebRTC feel RTCP is necessary at all? Why not generate a keyframe every N seconds like you do with HLS and remove the complexity of RTCP entirely? The answer is that many clients cannot handle the bitrate at real-time speeds.
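The jitter buffer mentioned in point 1 is essentially a reorder-by-sequence-number structure. A minimal sketch (real implementations also handle sequence wraparound, loss timeouts, and playout delay, which this deliberately omits):

```python
# Minimal jitter-buffer sketch: reorder packets by RTP-style sequence
# number and release them strictly in order.
import heapq

class JitterBuffer:
    def __init__(self):
        self.heap = []
        self.next_seq = 0

    def push(self, seq: int, payload: bytes):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        """Return payloads that are next in sequence, in order."""
        out = []
        while self.heap and self.heap[0][0] == self.next_seq:
            _, payload = heapq.heappop(self.heap)
            out.append(payload)
            self.next_seq += 1
        return out

jb = JitterBuffer()
jb.push(1, b"b")             # arrives out of order
assert jb.pop_ready() == []  # still waiting for seq 0
jb.push(0, b"a")
print(jb.pop_ready())        # [b'a', b'b']
```

As the comment says, this smooths packet ordering but does nothing to absorb sustained bitrate spikes; that would need a playout delay on top.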


1) That is a specific implementation, and has nothing to do with the protocol, which certainly doesn't define a "jitter buffer". People routinely use RTMP--which also has RT in the name--to transfer content to streaming services with massive buffers at every step in the pipeline.


Most common browser implementations use an open GOP. That means an IFrame is inserted when needed: on scene change, or when there's high motion.

Only naive implementations would burst an IFrame onto the network; most pace them. And if needed, you could split your IFrame across several frame intervals and decode them without creating a bitrate burst.

Actually a lot of WebRTC implementations use a 1s or 2s GOP length. Again, it depends on how much control you have over your pipeline. Browser implementations do make some assumptions about the use case.


That is not what open GOP means. Open GOP means pictures can reference IDR frames other than the most recent one in decode order, and is a pain in the ass for various reasons, but is technically more efficient. You're referring to a dynamic GOP.


I don't know much about webrtc but I do have some security cameras, frigate and home assistant all working together with rtmp streams.

There are some webrtc solution for getting those streams into home assistant with low latency but they are... I don't know the word. They aren't difficult to set up because the instructions are very simple, however, they don't work when I follow them and, from reading forums, that's not uncommon. I have _no_ idea why it doesn't work.

I don't really understand why I can't spin up a docker container that will take my rtmp streams and convert them to webrtc then hook that into home assistant.

I've gathered that webrtc just doesn't work that way but why can't it?


Heh, welcome to the world of livestreaming media. The reason why it's hard to create this kind of simple "stream in, stream out" abstraction is because most IP Voice/Video stacks are architected very differently than stateless net protocols that are popular today. IP streaming generally works by:

1. A signaling layer that helps setup the connection metadata (a layer where the sender can say they're the sender, that they'll be sending data to port n, that the data will be encoded using codec foo, etc)

2. Media streams that are opened based on the metadata transferred over the signaling layer that are usually just streams of encoded packets being pushed over the wire as fast as the media source and the network allows.

Most IP Media stacks (RTSP, RTMP, WebRTC, SIP, XMPP, Matrix, etc) follow this same pattern. This is different than "modern" protocols like HTTP where signaling is bound together with data using framing (e.g. HTTP headers for signaling vs the HTTP request/response body for data.) This design makes IP media stacks especially fragile to NAT connectivity issues and especially hard to proxy. There are typically good reasons this is done (due to latency, non-blocking reads, head-of-line blocking, etc) but these "good reasons" are becoming less good as innovations in lower networking layers (like QUIC or TCPLS) create conditions that make it much easier to organize IP Media in a manner more similar to HTTP. Hopefully one day you'll just be able to take IP Media streams and "convert" or "proxy" them from one format to another.
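The two-layer pattern above can be shown in miniature: an out-of-band "signaling" message advertises where and how media will flow, then packets are pushed to that address separately. This is a toy on localhost with made-up field names, not any real protocol's wire format:

```python
# Toy illustration of the signaling/media split: a JSON "signaling"
# message advertises the media port and codec, then media packets flow
# to that port on a separate socket. Field names are invented.
import json, socket

# Receiver opens a media socket and advertises it via signaling.
media_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
media_sock.bind(("127.0.0.1", 0))
offer = json.dumps({
    "role": "receiver",
    "media_port": media_sock.getsockname()[1],
    "codec": "opus",
})

# Sender parses the signaling message and pushes media to the
# advertised port, on a completely separate connection.
params = json.loads(offer)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"\x80" + b"fake-rtp", ("127.0.0.1", params["media_port"]))

packet, _ = media_sock.recvfrom(1500)
print(params["codec"], len(packet))  # opus 9
```

Because the media path is negotiated out-of-band, any NAT or proxy between the two sockets can silently break it, which is exactly the fragility described above.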


All the listed protocols came after HTTP. RTSP, SIP borrowed heavily (albeit badly in retrospect) from HTTP.

I do not have all the historical context (early 90s), but for WebRTC, the idea was to not define any new protocol(s) or do a clean-slate design, but rather to just agree on the flavors of the various protocols, and then to universally implement those. We already had SDP, RTSP, RTP, SAP, etc. The idea was to cobble together the existing protocols into something everyone could agree on (the young companies, the old companies, etc.).

We ended up defining variations to the flavors that we already had and for the most part everything turned out okay (maybe the SDP plan wars did not end up where we wanted it, but… it was a good enough compromise).

For realtime media, if we are able to combine the "locator:identifier" roles, we will be able to make media and signaling work in-band.


I know they came later, so I'm still confused why RTSP and SIP weren't implemented atop HTTP. I realize that RTSP and SIP can push server to client, but there are ways around that, though perhaps long polling and WebSockets weren't conceivable when RTSP and SIP were invented. I mean, in a pinch, I have an HTTP server serving a folder where SDP files are generated, and I've written clients that just look for a well-known SDP file and use that to consume an RTP stream. It's a ghetto form of "signaling" that I love using when doing experiments (not suitable for production for various reasons obvious to you, I imagine.)
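A client for that kind of static-SDP setup only needs to pull out a few fields. Here's a minimal sketch that parses a typical RTP/H264 SDP body (the SDP text is a representative example, not from a real session):

```python
# Minimal parse of a static SDP file: extract just the connection
# address, media type/port, and codec. The SDP body is a typical
# RTP/H264 example written for this sketch.
SDP = """\
v=0
o=- 0 0 IN IP4 127.0.0.1
s=demo
c=IN IP4 127.0.0.1
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

def parse_sdp(text: str):
    info = {}
    for line in text.splitlines():
        if line.startswith("c="):                 # connection address
            info["address"] = line.split()[-1]
        elif line.startswith("m="):               # media type and port
            fields = line.split()
            info["media"], info["port"] = fields[0][2:], int(fields[1])
        elif line.startswith("a=rtpmap:"):        # payload type -> codec
            info["codec"] = line.split()[-1].split("/")[0]
    return info

print(parse_sdp(SDP))
# {'address': '127.0.0.1', 'media': 'video', 'port': 5004, 'codec': 'H264'}
```

Fetch the file over plain HTTP, parse it like this, and point your RTP receiver at the address/port; that's the whole "ghetto signaling" loop.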

I'm not saying WebRTC had poor design decisions or anything. I think it was very smart for WebRTC to reuse SDP, RTP, etc. so the same libraries and pipelines could keep working with minimal changes. It also means very little new learning for folks familiar with the rest of the stack.

> For realtime media, if we are able combine “locator:identifier” issue, we will be able to make media and signaling work inband.

+1000. I think RTSP+TCP is a decent way to do in-band signaling and media, and RTMP defines strict ways to send both anyway.


To me, the whole typical IP multimedia stack screams telco. They prefer to remove and reattach headers when crossing interfaces, separate the control and data planes, and rely on synchronization for session integrity. Great when there's a phone line to HQ and a heavily metered satellite link for doing a live feed, I guess...


WebRTC is used by phenixrts as the delivery from server to client. The promise of WebRTC was P2P direct connections for video/data transport, and server/client for coordination and fallback.

https://phenixrts.com/en-us/faqs.html

> The scalability of Phenix’s platform does not come from the protocol itself, but from the systems built and deployed to accept WebRTC connections and deliver content through them. Our platform is built to scale out horizontally. In order to serve millions of concurrent users subscribing to the same stream in a short period of time, resources need to be provisioned timely or be available upfront.

https://webrtc.org/

> With WebRTC, you can add real-time communication capabilities to your application that works on top of an open standard. It supports video, voice, and generic data to be sent between peers...


I agree that RTP over QUIC [1] is closer to what we'd build today if we were starting from scratch than WebRTC is. (Partly benefiting from the lessons learned getting to WebRTC 1.0, of course.)

It's worth noting that QUIC is also a very complex specification and is only going to get more complex as it continues through the standardization process. In parallel, there's ongoing work on the next generation of the WebRTC spec. [2] (WebRTC-NV also adds complexity. Nothing ever gets simpler.)

My guess is that we're at least three years away from being able to use anything other than HLS and WebRTC in production. And pessimistically, because I've worked on video for a long time and seen over and over that new stuff always takes _forever_ to bake and get adoption, maybe that's going to be more like 10 years.

[1] https://github.com/mengelbart/rtp-over-quic-draft [2] https://www.w3.org/TR/webrtc-nv-use-cases/


Media over QUIC is interesting. For RTP or peer-to-peer QUIC, there is more work to be done. But you will end up engineering many of the same things as the WebRTC suite of protocols (ICE, STUN, TURN, multiplexing, etc.).

QUIC and WebTransport can definitely already do DASH/HLS without some of the protocol complexity by using QUIC streams (but to use QUIC's underlying features, DASH/HLS need to change as well).

Some of us wrote a position statement in 2017, see https://datatracker.ietf.org/doc/html/draft-rtpfolks-quic-rt.... There are new documents around media ingest being proposed currently.


It's been almost 3 years since I first presented on WebTransport + WebCodecs: https://youtu.be/VD5GBLBiSxo

Live streaming was a motivating example for both of those, as you can tell from the video. And both of them grew out of our efforts to make WebRTC better for live streaming.


Agreed and thank you for the contribution. Alas, the work is now embroiled in breaking it apart into smaller parts.


> "Where's the future? A likely candidate may come out of the "Media over QUIC" work in the IETF"

The "future" is going to be a goddamned UDP socket sending compressed media streams across the web. We've reached peak abstraction. We need to come back to first principles, instead of piling on more crap on-top of the browser.


Corporate firewalls will be blocking QUIC until the end of time. Anyone implementing streaming over QUIC will have to implement an HTTP/2 fallback, probably WebRTC, but maybe we will get something new.


In the case of QUIC, it is likely that the streaming would be over HTTP/3, i.e. HTTP over QUIC. They may fall back to H1 or H2, but typically, over a long enough time, firewall rules become more relaxed.


I'd like to see native RTSP support in browsers. WebRTC builds on the same underlying protocols (RTP, SDP); I see no reason why RTSP is ignored.



