
Heh, welcome to the world of livestreaming media. The reason it's hard to create this kind of simple "stream in, stream out" abstraction is that most IP voice/video stacks are architected very differently from the stateless network protocols that are popular today. IP streaming generally involves:

1. A signaling layer that helps set up the connection metadata (a layer where the sender can say that they're the sender, that they'll be sending data to port n, that the data will be encoded using codec foo, etc.)

2. Media streams that are opened based on the metadata exchanged over the signaling layer, which are usually just streams of encoded packets pushed over the wire as fast as the media source and the network allow.
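You can see this split in SDP itself, which exists purely to carry the step-1 metadata. A minimal (illustrative, made-up addresses) session description looks like:

```
v=0
o=- 0 0 IN IP4 203.0.113.1
s=Example session
c=IN IP4 203.0.113.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
```

The `c=` and `m=` lines are the "I'll be sending PCMU audio as RTP to port 49170 at this address" part; the actual media stream that gets opened afterward is a completely separate flow of packets the SDP never touches.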

Most IP media stacks (RTSP, RTMP, WebRTC, SIP, XMPP, Matrix, etc.) follow this same pattern. This differs from "modern" protocols like HTTP, where signaling is bound together with data using framing (e.g. HTTP headers for signaling vs. the HTTP request/response body for data). This design makes IP media stacks especially fragile in the face of NAT connectivity issues and especially hard to proxy. There are typically good reasons for the split (latency, non-blocking reads, head-of-line blocking, etc.), but those "good reasons" are becoming less good as innovations in lower networking layers (like QUIC or TCPLS) make it much easier to organize IP media in a manner more similar to HTTP. Hopefully one day you'll just be able to take IP media streams and "convert" or "proxy" them from one format to another.



All the listed protocols came after HTTP; RTSP and SIP borrowed heavily (albeit badly, in retrospect) from it.

I do not have all the historical context (early 90s), but for WebRTC the idea was not to define any new protocol(s) or do a clean-slate design, but rather to agree on the flavors of the various protocols and then to universally implement those. We already had SDP, RTSP, RTP, SAP, etc., and the idea was to cobble the existing protocols together into something everyone could agree on (the young companies, the old companies, etc.).

We ended up defining variations on the flavors that we already had, and for the most part everything turned out okay (maybe the SDP plan wars did not end up where we wanted them, but… it was a good enough compromise).

For realtime media, if we are able to resolve the “locator:identifier” issue, we will be able to make media and signaling work in-band.


I know they came later, so I'm still confused why RTSP and SIP weren't implemented atop HTTP. I realize that RTSP and SIP can push server to client, but there's ways around that, though perhaps long polling and Websockets weren't conceivable when RTSP and SIP were invented. I mean, in a pinch, I have an HTTP server serving a folder where SDP files are generated, and I've written clients that just look for a well-known SDP file and use that to consume an RDP stream. It's a ghetto form of "signaling" that I love using when doing experiments (not suitable for production for various reasons obvious to you I imagine.)
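A minimal sketch of that HTTP-served-SDP hack, assuming a hypothetical well-known URL and only the SDP fields a bare-bones client needs:

```python
# Sketch of "poll an SDP file over HTTP" signaling. The URL and the
# parse_sdp helper are illustrative assumptions, not a real library API.
from urllib.request import urlopen

def parse_sdp(text: str) -> dict:
    """Pull the connection address and media port/codec info out of a
    minimal SDP blob."""
    session = {}
    for line in text.splitlines():
        if line.startswith("c="):
            # c=IN IP4 <address> -- take the last field as the address
            session["address"] = line.split()[-1]
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> [<fmt> ...]
            _, port, proto, *fmts = line[2:].split()
            session.update(media_port=int(port), proto=proto, formats=fmts)
    return session

# A client would poll the well-known location (hypothetical URL), e.g.:
#   sdp = urlopen("http://example.com/streams/cam1.sdp").read().decode()
#   session = parse_sdp(sdp)
# and then open an RTP socket toward session["address"]:session["media_port"].
```

The HTTP fetch is the entire "signaling layer" here; everything after it is just consuming the described RTP stream directly.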

I'm not saying WebRTC had poor design decisions or anything. I think it was very smart for WebRTC to reuse SDP, RTP, etc. so the same libraries and pipelines could keep working with minimal changes. It also means very little new learning for folks familiar with the rest of the stack.

> For realtime media, if we are able to resolve the “locator:identifier” issue, we will be able to make media and signaling work in-band.

+1000. I think RTSP+TCP is a decent way to do in-band signaling and media, and RTMP defines strict ways to send both anyway.
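The RTSP+TCP case works because RFC 2326 defines interleaved binary framing: media packets share the signaling TCP connection, each prefixed with '$', a one-byte channel id, and a two-byte big-endian length, while anything not starting with '$' is plain RTSP text. A minimal parser sketch (function name and return shape are my own):

```python
import struct

def read_interleaved_frame(buf: bytes):
    """Parse one RTSP interleaved frame (RFC 2326, section 10.12).

    Frames carrying RTP/RTCP over the RTSP TCP connection look like:
      '$' | 1-byte channel id | 2-byte big-endian length | payload
    Returns (channel, payload, remaining_bytes), or None if the buffer
    starts with RTSP signaling text or the frame is not fully buffered.
    """
    if not buf or buf[0:1] != b"$":
        return None  # plain-text RTSP signaling, not media
    if len(buf) < 4:
        return None  # need more bytes for the 4-byte header
    channel, length = struct.unpack("!BH", buf[1:4])
    if len(buf) < 4 + length:
        return None  # payload not fully buffered yet
    return channel, buf[4:4 + length], buf[4 + length:]
```

A demultiplexer loop would call this on its receive buffer: media frames go to the RTP handler for their channel, and non-'$' bytes fall through to the RTSP message parser, giving you signaling and media in one stream.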


To me, the whole typical IP multimedia stack screams telco. Telcos prefer to remove and reattach headers at every interface, separate the control and data planes, and rely on synchronization for session integrity. Great when there's a phone line to HQ and a heavily metered satellite link to do a live broadcast, I guess...



