Streaming
Streaming is the process of transmitting audio and video data in a continuous flow over a wired or wireless internet connection.
Applications
Streaming applications are software programs that allow users to play streams or to stream content over the internet. These applications are designed to facilitate the transmission and playback of video data, making it easy for users to watch audiovisual content from anywhere with an internet connection.
There are many streaming applications available.
Media Servers
Media servers are software programs that deliver video and audio content to clients who request it. The most common use of media servers is to deliver video on demand (VOD), in which the media server retrieves prerecorded video content from storage and delivers it across the Internet. Live streaming media servers deliver content as it is generated in real time or with only a slight delay.
There are many streaming media servers available, including:
- AntMedia
- Broadcast Box
- Janus
- Jitsi
- Kurento
- LiveKit
- MediaMTX
- Medooze
- Millicast
- Node Media Server
- Oven Media Engine
- Simple Realtime Server
Codecs
Codecs are devices or computer programs which encode or decode data streams or signals. Quantization is used to map input values from a large set (often a continuous set) to output values in a countable smaller set (often a finite set). The greater the quantization step, the lower the quality of the encoded video (lower peak signal-to-noise ratio, PSNR) and the lower the bitrate. Greater quantization also comes with lower computational complexity.
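As a toy illustration of this trade-off, here is a minimal sketch of a uniform quantizer (an illustrative example only, not any codec's actual quantizer):

```python
# Uniform quantizer sketch: map each input to the nearest multiple of `step`.
# A larger step collapses more inputs onto the same output value, which is
# cheaper to encode but loses more detail.
def quantize(value: float, step: float) -> float:
    return round(value / step) * step

samples = [0.1, 0.24, 0.25, 0.4, 0.9]
fine = [quantize(v, 0.05) for v in samples]   # small step: more distinct outputs
coarse = [quantize(v, 0.5) for v in samples]  # large step: fewer distinct outputs
```

With the coarse step, several different inputs collapse onto the same output, so the signal carries less information and compresses better.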
AVC/H.264
Advanced Video Coding (AVC), also known as H.264, is a video compression standard based on block-oriented, motion-compensated coding. It is the most commonly used format for the recording, compression, and distribution of video content, but it is not well suited to the high bandwidth demands of 4K streaming because of the compression ratios it can achieve. It has many profiles and levels, and not every encoder or decoder supports every profile and level.
profile-level-id
- The first byte represents `profile_idc`.
- The second byte represents `profile_iop`. Each of its bits corresponds to `constraint_set{0,1,2,3,4,5}_flag`, 6 bits in total; the last 2 bits are reserved and always 0.
- The third byte represents `level_idc`.
Constrained Baseline
Decoders conforming to the Constrained Baseline profile at a specific level shall be capable of decoding all bitstreams in which all of the following are true:
- `profile_idc` is equal to 66 or `constraint_set0_flag` is equal to 1.
- `constraint_set1_flag` is equal to 1.
- `level_idc` and `constraint_set3_flag` represent a level less than or equal to the specified level.
Examples
`0x42001f`
- The first byte `0x42` (66) corresponds to the Baseline profile.
- The third byte `0x1f` (31) corresponds to level 3.1.

`0x42e01f`
- The first byte `0x42` (66) corresponds to the Baseline profile.
- The second byte `0xe0` (bits 1110 0000) sets `constraint_set0_flag`, `constraint_set1_flag` and `constraint_set2_flag`, which matches Constrained Baseline.
- The third byte `0x1f` (31) corresponds to level 3.1.

`0x4d0032`
- The first byte `0x4d` (77) corresponds to the Main profile.
- The third byte `0x32` (50) corresponds to level 5.0.

`0x640032`
- The first byte `0x64` (100) corresponds to the High profile.
- The third byte `0x32` (50) corresponds to level 5.0.

`0x640c34`
- The first byte `0x64` (100) corresponds to the High profile.
- The second byte `0x0c` (bits 0000 1100) sets `constraint_set4_flag` and `constraint_set5_flag`, which matches Constrained High.
- The third byte `0x34` (52) corresponds to level 5.2.
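The byte layout described above can be decoded with a small sketch (a hypothetical helper; the profile-name table below only covers the common `profile_idc` values mentioned in this section):

```python
# Common profile_idc values; other values exist in the standard.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}

def parse_profile_level_id(plid: str) -> dict:
    """Split a 3-byte H.264 profile-level-id into its components."""
    raw = bytes.fromhex(plid.removeprefix("0x"))
    profile_idc, profile_iop, level_idc = raw
    # constraint_set0..5_flag are the top 6 bits of profile_iop.
    flags = [(profile_iop >> (7 - i)) & 1 for i in range(6)]
    return {
        "profile": PROFILE_NAMES.get(profile_idc, str(profile_idc)),
        "constraint_flags": flags,
        "level": level_idc / 10,
    }
```

For example, `parse_profile_level_id("0x42e01f")` reports the Baseline profile with `constraint_set0..2` set and level 3.1.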
HEVC/H.265
High Efficiency Video Coding (HEVC), also known as H.265, is a video compression standard designed as a successor to the widely used AVC. In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192x4320, including 8K UHD.
Transport
Transport protocols are standardized methods of delivering different types of media over the internet. They send chunks of content from one endpoint to another and define the method for reassembling these chunks into playable content on the other endpoint.
RTP
Real-time Transport Protocol (RTP) is a network protocol used in communication and entertainment systems that involve streaming media.
RTSP
Real Time Streaming Protocol (RTSP) is an application-level network protocol designed for controlling multimedia streaming sessions. The transmission of the streaming data itself is not a task of RTSP; most media servers use RTP in conjunction with RTCP for media stream delivery. Clients of media servers issue commands such as play, record, and pause to facilitate real-time control of the media streaming. The well-known TCP port for RTSP traffic is 554. The most common use case of RTSP is streaming from IP cameras.
RTMP
Real-Time Messaging Protocol (RTMP) is a communication protocol for streaming audio, video, and data over the Internet that works on top of TCP and uses port number 1935 by default.
HLS
HTTP Live Streaming (HLS) is an HTTP-based adaptive bitrate streaming communications protocol. It resembles DASH in that it works by breaking the overall stream into a sequence of small HTTP-based file downloads, each downloading one short chunk of an overall potentially unbounded transport stream. A list of available streams, encoded at different bit rates, is sent to the client using an extended M3U playlist.
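Such a master playlist might look like the following sketch (hypothetical URIs, bandwidths, and resolutions; the tag names come from the HLS specification):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
```

The client picks one of the variant playlists based on its measured bandwidth and can switch variants between segments.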
DASH
Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH, is an adaptive bitrate streaming technique that enables high quality streaming of media content over the Internet delivered from conventional HTTP web servers. Similar to HLS, DASH works by breaking the content into a sequence of small segments, which are served over HTTP.
SRT
Secure Reliable Transport (SRT) is an open source transport protocol that provides connection, control, and reliable transmission at the application layer, using UDP as the underlying transport. It supports packet recovery while maintaining low latency (120 ms by default) and supports encryption using AES. It has 3 working modes:
- Listener: runs a server that listens for incoming connections.
- Caller: starts a connection to a known listener.
- Rendezvous: creates a bi-directional link where the first peer to initiate the handshake is considered the caller.
NDI
Network Device Interface (NDI) is a royalty-free software standard developed by NewTek to enable video-compatible products to communicate, deliver, and receive high-definition video over a computer network in a high-quality, low-latency manner that is frame accurate and suitable for switching in a live production environment.
Topologies
Mesh
In a mesh topology each node is directly connected to every other node. Each node sends its streams to every single node and downloads the streams from every node.
For a session with N nodes the total number of connections is O(N²).

Nodes | N
---|---
Uplinks | N(N-1)
Downlinks | N(N-1)
Uplinks/node | N-1
Downlinks/node | N-1
Pros:
- Low latency.
- Low server loads.
- End-to-end encryption.
Cons:
- Poor scaling.
- High node loads.
- Connectivity problems with NATs, firewalls, etc.
MCU
In a Multipoint Conferencing Unit (MCU) topology each node is connected to the MCU server. With an MCU, each node uploads its stream once; the server decodes the streams of all the nodes, mixes them into one, and encodes the result to send it back to each node.
For a session with N nodes the total number of connections is O(N).
Nodes | N
---|---
Uplinks | N
Downlinks | N
Uplinks/node | 1
Downlinks/node | 1
Pros:
- Good scaling.
- Low node loads.
- No connectivity problems.
- Works well in low bandwidth environments.
Cons:
- High latency.
- High server loads.
SFU
In a Selective Forwarding Unit (SFU) topology each node is connected to the SFU server. With an SFU, each node uploads its stream once and the server forwards the stream to every other node.
For a session with N nodes the total number of connections is O(N²).
Nodes | N
---|---
Uplinks | N
Downlinks | N(N-1)
Uplinks/node | 1
Downlinks/node | N-1
Pros:
- Good scaling.
- Medium node loads.
- Low server loads.
- No connectivity problems.
Cons:
- No end-to-end encryption (although there are experimental approaches of header only decryption).
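The link counts of the three topologies can be compared with a short sketch that encodes the formulas from the tables above:

```python
# Total (uplinks, downlinks) for a session of n nodes in each topology,
# following the formulas in the tables above.
def mesh_links(n: int) -> tuple[int, int]:
    # Every node sends its stream to, and receives from, all other nodes.
    return n * (n - 1), n * (n - 1)

def mcu_links(n: int) -> tuple[int, int]:
    # One stream up to the mixer and one mixed stream down, per node.
    return n, n

def sfu_links(n: int) -> tuple[int, int]:
    # One uplink per node; the server forwards each stream to the other nodes.
    return n, n * (n - 1)
```

For a 4-party call this gives 12 uplinks in a mesh but only 4 with an MCU or SFU, which is why mesh scales poorly as participants join.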
Bandwidth Strategies
In video streaming, bandwidth usage directly impacts the resolution, clarity, and overall viewing experience. Higher resolutions like high definition (HD) require more bandwidth for smooth playback than lower ones like standard definition (SD).
For example:
- A videoconference with 2 streams of 1920x1080 at 30 fps, assuming a 3 Mb/s bitrate: 3/8 (1 MB/s = 8 Mb/s) × 60 (seconds) × 2 (streams) = 45 MB/min.
- A videoconference with 2 streams of 640x360 at 30 fps, assuming a 1 Mb/s bitrate: 1/8 × 60 × 2 = 15 MB/min.
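The arithmetic above generalizes to a one-line helper (a sketch; the function name is illustrative):

```python
def megabytes_per_minute(bitrate_mbps: float, streams: int) -> float:
    """Total MB consumed per minute, given a per-stream bitrate in Mb/s.

    Divide by 8 to convert Mb/s to MB/s, multiply by 60 seconds,
    then by the number of streams.
    """
    return bitrate_mbps / 8 * 60 * streams
```

This reproduces the two examples: 3 Mb/s with 2 streams gives 45 MB/min, and 1 Mb/s with 2 streams gives 15 MB/min.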
Simulcast
Simulcast allows peers to publish multiple versions of the same stream with different spatial or temporal encodings, effectively sending more data.
Spatial
With spatial scalability the lower resolution layers consume less bandwidth than the high resolution ones.
For example:
- High: 1280x720 2.5mbps
- Medium: 640x360 400kbps
- Low: 320x180 125kbps
Publishing the two extra layers adds 525 kb/s on top of the 2.5 Mb/s high layer, about 21% more bandwidth (the lower layers account for roughly 17% of the 3.025 Mb/s total).
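The overhead of the extra layers can be checked directly from the example figures:

```python
# Layer bitrates from the example above, in Mb/s.
high, medium, low = 2.5, 0.4, 0.125

total = high + medium + low            # total published bitrate: 3.025 Mb/s
extra_vs_high = (total - high) / high  # overhead relative to the high layer alone
lower_share = (medium + low) / total   # share of the total taken by the lower layers
```

So the subscriber-facing quality options triple while the publisher's upload grows by only about a fifth.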
Temporal
With temporal scalability it is possible to lower a stream's bitrate by dynamically reducing the stream's frame rate.
Streams contain mostly delta frames, which depend on previously decoded frames. If the decoder needs to apply a delta that references a frame that was dropped, it can't render the subsequent frames.
When temporal layers are used, frames from the base layer only reference other base layer frames.
For a subscriber with limited bandwidth, it is possible to send only the frames of a specific temporal layer, effectively reducing bandwidth.
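A sketch of that selection, assuming a simple two-layer alternating pattern (even-indexed frames form the base layer T0, odd-indexed frames the enhancement layer T1; real encoders choose their own reference structure):

```python
def layer_of(frame_index: int) -> int:
    """Temporal layer id under the assumed alternating T0/T1 pattern."""
    return frame_index % 2

def select_frames(frame_indices, max_layer: int) -> list[int]:
    """Forward only frames whose temporal layer is <= max_layer."""
    return [i for i in frame_indices if layer_of(i) <= max_layer]

frames = list(range(8))                # 8 consecutive frames
base_only = select_frames(frames, 0)   # T0 only: half the frame rate
full_rate = select_frames(frames, 1)   # T0 + T1: the full frame rate
```

Because T0 frames only reference other T0 frames, dropping every T1 frame halves the frame rate (and the associated bandwidth) without breaking the decode chain.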
Scalable Video Coding
Scalable Video Coding (SVC) is a video compression standard that defines encoding of a high-quality video bitstream that also contains one or more subset bitstreams (a form of layered coding). A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal.