For CTOs and product leaders evaluating sub-second video for their product. Vendor pages quote "sub-second WebRTC latency." Our field measurements across four production MediaMTX deployments tell a more nuanced story — and the gap between marketing numbers and real glass-to-glass latency has direct cost implications you should price into your roadmap.
Why this matters before you read the numbers
Every product team building real-time video — telehealth, security operations centres, livestream commerce, drone control, robotic teleoperation — eventually lands on the same architectural question: can we get under one second of glass-to-glass latency without paying per-minute rates to a managed WebRTC vendor?
The answer over the last 18 months has shifted to yes, with caveats, largely because of two things:
- WHIP and WHEP standardized WebRTC signalling. WHIP (WebRTC-HTTP Ingestion Protocol) was published as RFC 9725 in early 2025, and the corresponding WHEP egress protocol is moving through the same IETF standards track. Both replace bespoke WebRTC signalling with a single HTTP POST, so any compliant client and any compliant server can interoperate without custom glue code (a minimal publish sketch follows this list).
- MediaMTX matured into a credible self-hosted alternative. The open-source project (formerly rtsp-simple-server) now natively ingests via WHIP, egresses via WHEP, and bridges to RTSP, RTMP, HLS, and SRT in a single binary. That removes the hardest part of self-hosting WebRTC, which historically meant assembling Janus, coturn, and a custom signalling server.
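To make the "single HTTP POST" point concrete, here is a minimal browser-side WHIP publish sketch. It uses only standard browser WebRTC and fetch APIs; the endpoint URL is a placeholder, and a production client would also handle trickle ICE and teardown via the resource URL the server returns in the Location header.

```typescript
// Minimal WHIP publish: capture camera + mic, send one SDP offer via HTTP POST,
// apply the SDP answer. The endpoint URL passed in is a placeholder.
async function publishWhip(endpoint: string): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();

  // Send-only transceivers carrying the local capture devices.
  const media = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  for (const track of media.getTracks()) {
    pc.addTransceiver(track, { direction: "sendonly" });
  }

  await pc.setLocalDescription(await pc.createOffer());

  // Simplest non-trickle approach: wait until the offer contains all ICE
  // candidates (WHIP also allows trickle ICE via PATCH on the resource URL).
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") return resolve();
    pc.addEventListener("icegatheringstatechange", () => {
      if (pc.iceGatheringState === "complete") resolve();
    });
  });

  // The single HTTP POST that replaces bespoke signalling.
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  if (!res.ok) throw new Error(`WHIP publish failed: HTTP ${res.status}`);

  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}

// Usage (placeholder URL): publishWhip("https://media.example.com/mystream/whip");
```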
The catch — and the reason we ran our own measurements — is that headline latency claims rarely survive contact with real networks, real cameras, and real browsers. "Sub-second" can mean 200 ms or 950 ms, and the operational and commercial implications of those two numbers are very different.
What we measured, and how
Across four AdaptNXT deployments shipped between Q3 2025 and Q1 2026, we captured glass-to-glass latency — meaning the time from a physical event in front of the camera to the same event being visible on the viewer's screen. This is the only latency number that matters to end users; protocol-level numbers (RTT, jitter buffer depth) are diagnostic, not experiential.
Our methodology, kept consistent across all four sites:
- A high-refresh display showing a millisecond-precision counter is placed in the camera's field of view.
- A second device captures both the source counter and the rendered viewer screen in the same frame using a 240 fps phone camera.
- Glass-to-glass latency is the per-frame delta between the two counters, collected for 200 captured frames per test and reported below as P50 and P95 (a small reduction sketch follows this list).
- Tests were run during representative production traffic (not idle networks), at the times of day the system is actually used.
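As a small illustration of how those per-frame readings become the figures reported below, here is the reduction step in isolation. The counter values are assumed to have already been transcribed from the filmed frames; the names are ours, for illustration only.

```typescript
// One sample per filmed frame: the live counter shown to the camera and the
// (older) counter visible on the viewer's screen, both in milliseconds.
interface FrameSample {
  sourceMs: number;
  viewerMs: number;
}

function percentile(sortedMs: number[], p: number): number {
  // Nearest-rank percentile over an ascending-sorted array.
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.min(sortedMs.length - 1, Math.max(0, rank - 1))];
}

function glassToGlassStats(samples: FrameSample[]) {
  // Glass-to-glass latency for a frame is simply the delta between counters.
  const deltas = samples.map((s) => s.sourceMs - s.viewerMs).sort((a, b) => a - b);
  return { p50: percentile(deltas, 50), p95: percentile(deltas, 95), frames: deltas.length };
}
```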
We are deliberately not reporting protocol-internal numbers (e.g. "MediaMTX added 14 ms"). Those are interesting for engineers tuning a pipeline; they are noise for a CTO deciding on an architecture.
The four deployments
| # | Use case | Camera/source | Viewer | Network path |
|---|---|---|---|---|
| 1 | Live commerce host stream | Smartphone (RTMP) → MediaMTX | Browser, mobile 4G | India → AWS Mumbai → India |
| 2 | Remote equipment supervisor | IP camera (RTSP) → MediaMTX | Desktop browser | Industrial → on-prem → LAN |
| 3 | Telepresence / two-way video | Browser (WHIP) → MediaMTX | Browser (WHEP) | India → AWS Singapore → US |
| 4 | Edge AI camera review | RK3568, GStreamer (WHIP) → MediaMTX | Browser, same LAN | LAN-only, no internet hop |
The deployments were chosen to span the realistic range: a globally distributed consumer use case, a constrained enterprise use case, a cross-region collaboration use case, and an edge-AI loopback that represents the floor of what the technology can do.
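Deployments 3 and 4 terminate in a browser WHEP player, which mirrors the publish sketch above: one POST, one SDP answer, and the received track attached to a video element. As before, the URL is a placeholder and error handling is minimal.

```typescript
// Minimal WHEP playback into an existing <video> element.
async function playWhep(endpoint: string, video: HTMLVideoElement): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.addTransceiver("audio", { direction: "recvonly" });

  // Render the remote stream as soon as the first track arrives.
  pc.ontrack = (ev) => {
    video.srcObject = ev.streams[0];
  };

  // As in the publish sketch, a robust client waits for ICE gathering or
  // trickles candidates via PATCH; omitted here for brevity.
  await pc.setLocalDescription(await pc.createOffer());

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  if (!res.ok) throw new Error(`WHEP request failed: HTTP ${res.status}`);

  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}

// Usage (placeholder URL):
// playWhep("https://media.example.com/mystream/whep", document.querySelector("video")!);
```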
The numbers
Glass-to-glass latency by deployment, in milliseconds (P50 and P95):
| # | Deployment | P50 latency | P95 latency | Stable? |
|---|---|---|---|---|
| 1 | Live commerce (smartphone → 4G) | 740 ms | 1,180 ms | YES |
| 2 | Industrial RTSP → corporate viewer | 310 ms | 480 ms | YES |
| 3 | Cross-region (India ↔ US) | 520 ms | 890 ms | CONDITIONAL |
| 4 | Edge AI loopback (LAN, RK3568) | 180 ms | 240 ms | YES |
A note on these numbers. These are AdaptNXT field measurements, not lab benchmarks. Each figure is computed over roughly three weeks of representative post-launch traffic, not a best-case demo. We report medians and P95s rather than minimums because P95 is what determines whether your support team gets calls.
The headline observation: a well-tuned MediaMTX WHIP/WHEP pipeline reliably hit sub-second median latency in three of four deployments, with the LAN-only edge case landing under 200 ms. The cross-region case (Deployment 3) sat comfortably under one second on median but degraded above one second in the P95 — material if your product depends on conversational turn-taking.
What actually drove the variance
Three factors dominated, in this order of impact:
1. Network path, not the protocol. Across all four deployments, the gap between MediaMTX-internal latency (consistently 30–80 ms) and total glass-to-glass latency was almost entirely network transit time plus the publisher-side encoder pipeline. This is consistent with the broader WebRTC literature and matches MediaMTX's own documentation, which notes that the protocol itself introduces minimal latency and that most of the budget is consumed by client-side buffering. Implication: your hosting region selection matters more than your media server tuning.
2. Encoder choice on the publisher side. Hardware-accelerated encoders (NVENC on x86, V4L2 H.264 on RK3568, Apple VideoToolbox on iOS) consistently shaved 80–150 ms off publisher latency versus software x264. There is also a known interaction between certain OBS-WHIP encoder configurations and MediaMTX's WHEP egress that produces stutter — solvable, but worth budgeting for during integration. Implication: encoder selection is a first-order architecture decision, not an optimization to defer (see the codec-preference sketch after this list).
3. Viewer device and browser. Mobile browsers added 100–250 ms over desktop Chrome on the same network, almost entirely in the jitter buffer. iOS Safari was the most variable. Implication: if your product is mobile-first, your latency budget is tighter than the desktop demo will suggest.
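Browsers do not expose a direct "use the hardware encoder" switch, but on the publisher side you can bias negotiation toward a codec the device typically encodes in hardware (often H.264 on phones). The sketch below uses the standard setCodecPreferences API; whether the browser actually selects a hardware encoder remains device-dependent, so treat this as a nudge rather than a guarantee.

```typescript
// Prefer H.264 on a send-only video transceiver before creating the offer.
// This only reorders the negotiated codec list; the browser still decides
// whether the chosen codec runs on a hardware or software encoder.
function preferH264(transceiver: RTCRtpTransceiver): void {
  const caps = RTCRtpSender.getCapabilities("video");
  if (!caps) return; // capability query unsupported: keep browser defaults

  const h264 = caps.codecs.filter((c) => c.mimeType.toLowerCase() === "video/h264");
  const rest = caps.codecs.filter((c) => c.mimeType.toLowerCase() !== "video/h264");
  if (h264.length === 0) return;

  transceiver.setCodecPreferences([...h264, ...rest]);
}

// Usage inside the WHIP publish sketch, before createOffer():
// const tx = pc.addTransceiver(videoTrack, { direction: "sendonly" });
// preferH264(tx);
```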
What this means commercially
We work with product CTOs who are typically choosing between three paths:
- Managed WebRTC vendor (LiveKit Cloud, Daily, Agora). Predictable latency, predictable bill, opaque infrastructure. Per-minute cost scales with usage.
- Self-hosted MediaMTX on your own cloud account. Lower per-minute cost above a usage threshold, you own the data path, but you also own incident response.
- Hybrid. Self-hosted for the steady-state baseline, managed vendor for spike capacity or geographies you don't want to operate in.
Based on what we have actually shipped, the honest framing for a CTO is:
| If your product needs… | Self-hosted MediaMTX is… |
|---|---|
| < 500 ms median, single region, < 100 concurrent viewers | Strongly competitive — and cheaper above ~50k viewer-minutes/month |
| < 500 ms median, multi-region, thousands of concurrent viewers | Possible but operationally heavy — you are now running a global SFU fleet |
| < 300 ms for full-duplex conversation | Use a managed vendor unless you have a streaming infra team |
| Sub-second one-way fan-out for commerce or monitoring | Strong fit — this is where we see the best ROI |
| Edge-AI loopback or LAN-only use cases | Best in class — no managed cloud vendor beats a LAN-local MediaMTX |
The crossover point we see in our deployments is roughly 50,000 viewer-minutes per month for one-way fan-out use cases. Below that, the engineering cost of self-hosting outweighs the per-minute savings. Above it, the unit economics of self-hosted MediaMTX become hard to argue with.
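A back-of-envelope way to find that crossover for your own numbers is sketched below. All three inputs are placeholders you should replace with your actual vendor quote and infrastructure estimates; the example values are purely illustrative and chosen only to land near the ~50k figure above.

```typescript
// Viewer-minutes per month at which self-hosting breaks even with a managed vendor.
// All inputs are placeholders; substitute your own quotes and estimates.
function breakEvenViewerMinutes(
  managedPerMinute: number,       // managed vendor rate, USD per viewer-minute
  selfHostedPerMinute: number,    // marginal egress/compute, USD per viewer-minute
  selfHostedFixedMonthly: number, // servers, TURN, and on-call amortized, USD/month
): number {
  return selfHostedFixedMonthly / (managedPerMinute - selfHostedPerMinute);
}

// Purely illustrative inputs: breakEvenViewerMinutes(0.01, 0.001, 450) === 50000
```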
Three things we wish we had known earlier
Distilled from the four deployments, written for the leader who is about to greenlight the next one:
- Budget for a 4–6 week tuning phase, not a 1-week deployment. The default MediaMTX configuration is a sensible starting point, not a production endpoint. The work is in the encoder pipeline, the TURN server placement, and the jitter buffer choices — none of which are MediaMTX-specific, all of which take real time.
- TURN server placement is the highest-leverage choice no one talks about. A poorly placed TURN relay can add 200+ ms for clients on restrictive networks. Co-locating TURN with MediaMTX is the easy win; the harder choice is operating regional TURN clusters. Plan for this in your cloud architecture from day one (a client-side configuration sketch follows this list).
- Browser support is moving but uneven. Chrome and Edge are stable. Safari behaves well on macOS but is the first thing to break on iOS. If your product is iOS-first, do a Safari-specific feasibility test before committing to a self-hosted WebRTC architecture.
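On the client side, TURN placement amounts to which relay the RTCPeerConnection is configured with; the real work is operating that relay close to the users. A minimal sketch with placeholder hostnames and credentials follows (servers can also hand ICE server details to WHIP/WHEP clients dynamically, but a static configuration is the simplest starting point).

```typescript
// Point the peer connection at a TURN relay in the viewer's region.
// Hostnames, username, and credential are placeholders for illustration.
function newPeerConnectionForRegion(region: "in" | "sg" | "us"): RTCPeerConnection {
  const turnHost = `turn-${region}.example.com`; // e.g. co-located with the media server
  return new RTCPeerConnection({
    iceServers: [
      { urls: `stun:${turnHost}:3478` },
      {
        urls: [`turn:${turnHost}:3478?transport=udp`, `turn:${turnHost}:443?transport=tcp`],
        username: "placeholder-user",
        credential: "placeholder-secret",
      },
    ],
  });
}
```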
When to use this post
If you are scoping a real-time video product and your team is debating managed vendor versus self-hosted MediaMTX, the numbers above should give you a defensible starting baseline for your own internal trade-off analysis. They are not a substitute for measuring your own pipeline — every camera, every network, and every browser combination shifts the result by 50–200 ms — but they are an honest field reference for what is achievable.
If you'd like to discuss what a feasibility benchmark for your specific use case would look like, our streaming engineering team is happy to walk through it with you. We can typically deliver a representative latency measurement for your hardware and network within two weeks.
References and further reading
- RFC 9725 — WebRTC-HTTP Ingestion Protocol (WHIP). Internet Engineering Task Force. The official specification of WHIP. https://datatracker.ietf.org/doc/rfc9725/
- WebRTC-HTTP Egress Protocol (WHEP) — IETF draft. The corresponding egress standard, currently in standards-track development. https://datatracker.ietf.org/doc/draft-ietf-wish-whep/
- MediaMTX — bluenviron/mediamtx. The open-source media server documented in this post. https://github.com/bluenviron/mediamtx
- OBS Studio WHIP streaming documentation. Reference for publisher-side encoder configuration. https://obsproject.com/kb/whip-streaming-guide
- WebRTC for the Curious — Sean DuBois. The standard public reference text for understanding WebRTC internals, freely available. https://webrtcforthecurious.com