Adaptive Playout Buffering for Audio/Video Transmission over the Internet
Abstract
Transmitting real-time audio and video over the Internet is difficult because of packet loss and jitter. Both vary with the locations of the senders and receivers, with typical packet loss rates of 0–20% and one-way delays of 5–500 ms. Delay variations occur within and across audio and video streams, complicating the synchronization process. One way to reduce jitter is to buffer audio and video packets at the receiver, so that slower packets arrive in time to be played out in the correct sequence at the appropriate times. This paper presents several adaptive playout buffer algorithms that minimize the effect of delay jitter. We evaluate their effectiveness through experiments on a real network and compare their performance in terms of delay/packet loss ratios. Although the main focus of this paper is playout buffering for audio, the synchronization between audio and video streams is also described.

1. Fixed and adaptive jitter buffering

Removing jitter involves collecting packets and holding them in the jitter buffer. This allows slower packets to arrive in time to be played out at the appropriate times. Generally, the larger the jitter buffer, the greater the added delay and the more packets are successfully played out. Unfortunately, this additional delay lowers the perceived QoS. On the other hand, if the playout delay is set too low, the network-induced delay will cause some packets to arrive too late for playout and thus be lost, which also lowers the perceived QoS. The main objective of jitter buffering is to keep the packet loss rate under 5% and to keep the end-to-end delay as small as possible.

The playout buffer delay may be kept fixed, or adaptively adjusted during the transmission. Although a fixed method, which uses a fixed buffer size, is easier to implement than an adaptive method, it can result in unsatisfactory audio or video quality.
This is because there is no optimal delay when network conditions vary with time. The fluctuating end-to-end delays experienced over the Internet may cause latency to increase to a level where it is annoying to users (when the buffer is too large), or may cause packet losses due to late arrivals (when the buffer is too small).

Adaptive techniques continuously estimate the network delays and dynamically adjust the playout delay at the beginning of each talkspurt. The playout adjustment is performed during the silent periods between talkspurts. The adjustment is applied to the first packet of the talkspurt; all other packets in the same talkspurt are scheduled to play out at fixed intervals following the playout of the first packet. This mechanism uses the same playout delay throughout a given talkspurt but permits different playout delays for different talkspurts. The variation of the playout delay may artificially lengthen or shorten silence periods, but such modification is considered acceptable in the perceived speech if the variation is reasonably limited.

2. Adaptive playout algorithms

An effective way to choose the buffering delay is to adapt it to the delay characteristic of the network. Since the current delay characteristic is not known a priori, adaptive algorithms calculate the playout time of each incoming talkspurt based on the delays experienced by already-received packets. In this section we describe seven different algorithms. We ran those algorithms on the same set of data, so we were able to compare their performance under identical network conditions. We collected experimental data at the receiving host in Dublin, Ireland, while transmitting audio and video packets from the terminal in Poland. We used the G.723.1 encoding scheme for audio and H.261 for video.
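The per-talkspurt scheduling described in Section 1 can be illustrated with a minimal sketch. It assumes that send and arrival times are expressed in milliseconds on a common time base (in practice the send time would be derived from the packet timestamp, and any clock offset would be absorbed into the measured delay). The function name `schedule_talkspurt` and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def schedule_talkspurt(
    packets: List[Tuple[float, float]],
    playout_delay_ms: float,
) -> List[Tuple[float, bool]]:
    """Schedule the packets of one talkspurt.

    packets: (send_time_ms, arrival_time_ms) pairs in sending order
             (hypothetical layout, assumed for this sketch).
    playout_delay_ms: delay chosen for the first packet of the talkspurt.

    Returns one (playout_time_ms, on_time) pair per packet.
    """
    if not packets:
        return []

    first_send, _ = packets[0]
    # The first packet fixes the playout point for the whole talkspurt.
    first_playout = first_send + playout_delay_ms

    schedule = []
    for send, arrival in packets:
        # Later packets keep their sender-side spacing from the first one.
        playout = first_playout + (send - first_send)
        # A packet that arrives after its playout time is counted as lost.
        schedule.append((playout, arrival <= playout))
    return schedule
```

For example, with packets sent at 0, 90 and 180 ms, arriving at 350, 455 and 640 ms, and a playout delay of 370 ms, the playout times are 370, 460 and 550 ms; the first two packets are played out and the third arrives too late.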
1 Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland, www.cs.ucd.ie {miroslaw.narbutt, liam.murphy}@ucd.ie

Algorithm 1

The first four algorithms were proposed by Ramjee et al. in 1994 [1]. Let $n_i$ be the total delay of audio packet $i$ introduced by the network. Estimates of both the average network delay $\hat{d}_i$ and the average delay variation $\hat{v}_i$ are calculated for each incoming packet:

$$\hat{d}_i = A \, \hat{d}_{i-1} + (1 - A) \, n_i$$

$$\hat{v}_i = A \, \hat{v}_{i-1} + (1 - A) \, |\hat{d}_i - n_i|$$

These estimates are recomputed each time a packet arrives, but are only used when a new talkspurt begins. When a new talkspurt is detected, the algorithm uses their most recent values to calculate the playout delay of the first packet in the talkspurt:

$$p = \hat{d}_i + B \, \hat{v}_i$$

Any subsequent packets of that talkspurt are played out at a rate equal to the generation rate at the sender. The constant $A$ is a fixed weighting factor that characterizes the memory properties of the estimation. To limit sensitivity to short-term packet jitter, $A$ is usually chosen to be 0.99802. $B$ is a variation coefficient that controls the delay/packet loss ratio and is usually chosen to be 4. The larger this coefficient, the more packets are played out, at the expense of longer delays.

The figures below show the calculated playout times (darker line) and the network delays (dots) of received packets. Packets whose delays are above the darker line are lost. All the others are successfully played out.

Fig. 1. Playout times calculated by Algorithm 1

Algorithm 2

The second algorithm is similar to the first one but adapts more quickly to short bursts of packets incurring long delays. The idea is to use two values of the weighting factor: a smaller one (A_BIS) for increasing trends in the delay and a larger one (A) for decreasing trends.

Fig. 2. Playout times calculated by Algorithm 2

Algorithm 3

The third algorithm attempts to be more aggressive in minimizing delays.
Instead of using a running estimate of network delays, it uses the minimum network delay of all packets received in the previous talkspurt as the average delay.

Fig. 3. Playout times calculated by Algorithm 3

Algorithm 3b

This algorithm, proposed by us, is a modification of Algorithm 3. It uses the maximum delay of all packets received in the previous talkspurt as the estimate of the average delay. This modification minimizes the packet loss.

Fig. 4. Playout times calculated by Algorithm 3b

Algorithm 4

This algorithm detects spikes, i.e. steep rises in network delays followed by a monotonic decrease back to the normal level. It has two modes of operation, depending on whether a spike has been detected.

Fig. 5. Playout times calculated by Algorithm 4

If a packet arrives with a delay that is larger than a given threshold (e.g. some multiple of the current playout delay), the algorithm switches to spike mode. The two modes differ in how the estimate of network delay is updated. In normal mode, running estimates of the average delay and its variation are maintained (as in Algorithm 1). During a spike, the delay estimate tracks the delays more closely.

Algorithm 5

In 1995 Moon et al. [2] proposed an algorithm that collects the network delays of already received packets in order to estimate the playout delay. The delays of the last K packets are recorded and the distribution of delays is updated with each incoming packet. The frequency of each delay is maintained in a histogram. When a new packet arrives, the delay of the oldest packet is removed from the histogram and the delay of the newest is added. The delay distribution is computed as a cumulative sum of the frequencies, and this is done only at the beginning of a new talkspurt. The algorithm calculates a given percentile point of the delay distribution function and uses it as the playout delay for the new talkspurt. Algorithm 5 also detects spikes. Once a spike is detected, it stops collecting packet delays.
If a new talkspurt begins during a spike, it uses the delay of the first packet of the talkspurt as the playout delay for that talkspurt.

Fig. 6. Playout times calculated by Algorithm 5

The number of recorded packet delays determines how sensitive the algorithm is to changes. If it is too small, the algorithm is likely to produce a poor estimate of the playout delay. If it is too large, the algorithm will keep track of an unnecessarily large amount of past history.

Algorithm 6

This algorithm, proposed in 1999 by Pinto and Christensen [3], is designed to target any desired loss rate. It adapts the buffering delay based on the arrival and playout times of packets received in the previous talkspurt only. The playout delay is taken directly from the ordered list of delays of the previous talkspurt: it is the minimum delay that would have been required to play out the previous talkspurt at exactly the desired packet loss. Like the two previous algorithms, Algorithm 6 operates in two modes. In spike mode it uses the delay of the first packet of a talkspurt as the playout delay for that talkspurt.

Fig. 7. Playout times calculated by Algorithm 6

3. Audio/video synchronization process

Audio and video packets are sent across the Internet using the best-effort UDP transport protocol, supported by the application-layer RTP protocol. Each RTP header contains a timestamp, a sequence number, a marker bit, and a source id to identify the different streams. All of these fields are useful during the synchronization process. For example, the sequence number is necessary to detect packet losses, the timestamp is needed for inter-stream and intra-stream synchronization, and the marker bit indicates the beginning of a talkspurt.

The idea behind the audio-video synchronization process is that the adaptive playout algorithms are performed first, and the video frames are played out dependent on the playout times of their corresponding audio packets.
This is done by storing the video frames in a video playout buffer and delaying each frame until the corresponding audio packets are played out. The correspondence between audio packets and video frames is given by their timestamps. If the video quality is not acceptable under this scheme, which gives priority to audio, additional measures may be needed to improve the video playout, e.g. adaptive adjustment of the video frame playout process.

4. Network delay measurements and algorithm comparison

To examine the performance of the playout algorithms we built a terminal based on the OpenH323 source code [4]. This terminal is H.323-compliant [5] and can interoperate with other H.323 software that uses G.711 A-law, G.711 u-law, G.723.1, GSM audio, and H.261 video compression schemes. Our terminals ran on PC workstations with the Windows NT operating system and were situated, respectively, at the Performance Laboratory of the School of Electronic Engineering at Dublin City University (Ireland), and at the Institute of Telecommunication at Poznan University of Technology (Poland). The distance between sender and receiver was 18 hops. All the interconnecting links had a bandwidth ranging from 34 to 155 megabits per second. We made our measurements in the afternoon between 3 pm and 4 pm.

For the tests we chose the encoding schemes most popular in IP telephony: G.723.1 for audio and H.261 for video. The G.723.1 encoder provides one frame of audio (24 bytes) every 30 ms. Our terminal was set for three audio frames per packet. During a 16-minute transmission we received more than 20000 audio and video packets.
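The audio packetization figures above imply a fixed spacing between consecutive packets within a talkspurt. The short sketch below works through that arithmetic. It covers the codec payload only (no RTP, UDP or IP header overhead is included), and the function and key names are hypothetical.

```python
def audio_packetization(
    frame_bytes: int = 24,       # G.723.1 frame size stated in the text
    frame_ms: int = 30,          # frame duration stated in the text
    frames_per_packet: int = 3,  # terminal setting stated in the text
) -> dict:
    """Derive per-packet figures from the per-frame codec parameters."""
    payload_bytes = frame_bytes * frames_per_packet
    packet_ms = frame_ms * frames_per_packet
    return {
        # Codec payload carried by one packet (headers not included).
        "payload_bytes": payload_bytes,
        # Amount of speech per packet, i.e. the spacing between the
        # playout times of consecutive packets within a talkspurt.
        "packet_ms": packet_ms,
        # Codec bit rate implied by the frame size and duration.
        "codec_bitrate_bps": frame_bytes * 8 * 1000 / frame_ms,
        # Packet rate while speech is being transmitted.
        "packets_per_second": 1000 / packet_ms,
    }
```

With the values given in the text, each packet carries 72 bytes of payload and 90 ms of speech, which corresponds to a codec bit rate of 6400 bit/s and roughly 11 packets per second during a talkspurt.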
Stream  Encoding scheme  Session length  Packets received  Max. network delay  Min. network delay  Avg. network delay  Standard deviation
AUDIO   G.723.1          978.9 s         6841              596 ms              327 ms              361 ms              25 ms
VIDEO   H.261            979.0 s         13802             642 ms              343 ms              368 ms              14 ms

During the transmission, we collected experimental data (the arrival times, timestamps, sequence numbers, and marker bits) for all packets received at the receiving host. In order to compare the playout algorithms, we wrote a simulator that processes this data and simulates the behaviour of the playout algorithms. The simulator ran the seven algorithms on the same set of data, so we were able to compare their performance under identical network conditions. During the simulations we controlled the delay/packet loss ratio by changing the B factor (Algorithms 1-4), varying the percentile point of the distribution function (Algorithm 5), or choosing a different desired loss rate (Algorithm 6).

Figure 8 shows the performance of the playout algorithms in terms of average buffering delay vs. percentage of packets received on time. This is the most popular metric used in the technical literature.

Fig. 8. Buffering delay vs. percentage of packets received on time

From the graphs above we can see that Algorithm 3 performs quite poorly, with fewer than 95% of packets received on time. Algorithm 1 is the best at minimizing delays, while Algorithm 3b is the best at minimizing packet loss. All of the algorithms except Algorithm 3 perform very well, reaching a packet loss ratio below 5%; the best by that measure is our Algorithm 3b. Algorithm 1 tends to minimize delays but sometimes fails by losing too many packets.

5. Conclusions and Future Work

In this paper we have investigated the performance of seven different algorithms for adaptive buffering of packets at the receiver. The main objective of those algorithms is to keep the packet loss rate under 5% and to keep the playout delay as small as possible.
We have compared those algorithms in terms of the number of packets received on time and the delays introduced by adaptive buffering. Our results indicate that the algorithm proposed by us can achieve the lowest rate of lost packets (2% or less) while adding acceptably small delays. An additional advantage of that algorithm is its low computational complexity.

For future work we want to determine the sensitivity of the algorithms to the various parameters that control the behaviour of the adaptive buffering. We will focus on the synchronization process between audio and video streams at the receiver, and study whether further adaptive adjustments of the video frame playout process are needed to obtain acceptable video quality.
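To make the comparison criteria concrete, the following sketch applies the Algorithm 1 estimator from Section 2 to a delay trace and reports the percentage of packets received on time together with the average playout delay. It is an illustration under stated assumptions, not the simulator used for the measurements: the function name `simulate_algorithm1` and the trace layout are hypothetical, the initial values of the estimates (first observed delay, zero variation) are not given in the text, and the first packet of a talkspurt is assumed to update the estimates before the playout delay is computed.

```python
from typing import List, Tuple


def simulate_algorithm1(
    trace: List[Tuple[float, bool]],
    a: float = 0.99802,  # weighting factor A from Section 2
    b: float = 4.0,      # variation coefficient B from Section 2
) -> Tuple[float, float]:
    """Run the Algorithm 1 estimator over a delay trace.

    trace: (network_delay_ms, starts_talkspurt) per packet, in order
           (hypothetical layout, assumed for this sketch).

    Returns (percent_on_time, average_playout_delay_ms).
    """
    if not trace:
        raise ValueError("trace must contain at least one packet")

    # Assumption: seed the estimates with the first observed delay.
    d_hat = trace[0][0]
    v_hat = 0.0
    playout_delay = None
    on_time = 0
    total_playout_delay = 0.0

    for delay, starts_talkspurt in trace:
        # Estimates are updated for every arriving packet.
        d_hat = a * d_hat + (1 - a) * delay
        v_hat = a * v_hat + (1 - a) * abs(d_hat - delay)

        # They are only used when a new talkspurt begins.
        if starts_talkspurt or playout_delay is None:
            playout_delay = d_hat + b * v_hat

        # Within a talkspurt the playout delay is constant, so a packet
        # is late exactly when its network delay exceeds that delay.
        if delay <= playout_delay:
            on_time += 1
        total_playout_delay += playout_delay

    n = len(trace)
    return 100.0 * on_time / n, total_playout_delay / n
```

Sweeping `b` over a range of values and plotting the two returned quantities against each other gives a delay versus on-time curve for Algorithm 1 of the kind used in the comparison; the exact definition of buffering delay behind the published curves may differ from the average playout delay computed here.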