2$ OR 100RS/ MONTH

PPT ON Lossless compression of Digital Audio

Abstract This paper provides an algorithm which has an unbelievable compression ratio for digital audio(Lossless).This is more advantages than Lossy compression such as MPEG or MP3. This paper contains a powerful lossless compression algorithm named “AudioPak” ,which is for high-fidelity mixing in professional environment. This algorithm will be very useful in Internet applications for quick downloading. It’s compression ratio is higher than the PkZip which is also an lossless compression algorithm mostly used. AudioPak is designed using the basic principles of framing ,prediction, and Entropy coding and it is implemented using only a small number of integer arithmetic operations in both Coder and Decoder side. Simulation for this algorithm is also presented in this paper. Our algorithm performs as well , or better, than most state-of-art- lossless coders. Keywords: AudioPak, Entropy coding (Ricecoding Method),Golomb,coding. INTRODUCTION While much work has been done in the area of lossy compression of speech and audio signals, relatively little has been published on lossless compression of these signals. Although lossless audio compression is not likely to become a dominating technology, it may become a useful complement to lossy compression algorithms in some applications. This is because, as we will see, lossless compression algorithms rarely obtain a compression ratio larger than 3:1, while lossy compression algorithms allow compression ratios to range upto 12:1 and higher. For lossy algorithms, the higher the compression ratio becomes, the lower the resulting final audio quality, but when the lowest possible data rate is required, lossy techniques are the only alternative. However, lossless audio coding of stereo CD quality digital audio signals sampled at 44.1 kHz and quantized to 16 bits could become an essential technology for digital music distribution over the Internet because some consumers will want to acquire the best possible quality of an audio recording for their high-fidelity stereo system. Lossy audio compression technologies such as MPEG or MP3 may not be acceptable for this application .Internet resources are limited and will likely remain constrained because the demand for service continues to increase at least as fast as the rate of growth of transmission capacity. Therefore, compression will continue to be important for Internet transmission of audio signals. It is reasonable to suggest, for example, that music distribution applications will offer highly compressed audio clips to the consumer for browsing and selection. After selection, the consumer who is accustomed to audio CD quality may require access to a losslessly compressed copy of the original a copy without any distortion due to the compression algorithm. Music distribution over the Internet is not the only application for which lossless audio compression technology is of interest. This technology can also be used for the archiving and mixing of high-fidelity recordings in pro-fessional environments. In this case, lossless compression avoids the degradations incurred in multiple encoding and decoding when using lossy compression coders. This problem arises particularly in the post production industry and occurs when the signals are modified or when different coding formats are used. Lossless audio coding is also used in the new DVD standard for storage of audio signals at higher resolution and sampling rates . The first section of this article is a survey and a classification of the current state-of-the-art lossless audio compression algorithms (lossless coders). This study finds that lossless audio coders have reached a limit in what can be achieved for lossless compression of audio. The second section is a description of a new lossless audio coder called AudioPaK, which has low algorithmic complexity and performs as well or even better than most of the lossless audio coders that have been described in the literature. State-of-the-Art for Lossless Audio Coders Lossless compression is not a new problem in the general context of computing. A variety of utilities such as PkZip are available and widely used to compress text and program files for storage or transmission over the Internet. These programs are designed to be very effective for text data. For example, a PostScript version of this article occupied 4,560,999 bytes of storage while the PkZipped version occupied only 696,041 bytes of storage— a compression ratio of 4,560,999/696,041 = 6.55. On the other hand, track 04, one of the audio files consisting of 16-bit samples used in our tests, occupied 33,715,964 bytes of storage before compression and 31,573,915 bytes after compression with PkZip. In this case, the compression ratio was only 1.07. What is needed are algorithms specifically designed for digital audio files. Although lossless coding of audio has not received a great deal of attention so far, a number of different algorithms have been developed, and code for testing these algorithms has been made available in almost all cases. The following algorithms are used in lossless compression :  AudioPak  DVD  LTAC  MUSICompress  OggSquish  Philips  Shorten  Sonarc  WA This list represents all the lossless coders that we found in the literature or through searching the Web. We believe that these coders represent the state-of-the-art in lossless audio compression. Shorten, MUSICompress, LTAC , DVD and Philips’ algorithms are described in the literature. In-formation concerning the other coders—Sonarc , OggSquish v98.9 , and WA comes from personal communications with the designers of these algorithms. AudioPaK is a new lossless audio coder described later in this paper. Basic Principles of Lossless Compression The block diagram representation of the operations involved in compressing a single audio channel. All of the techniques studied are based on the principle of first re-moving redundancy from the signal and then coding the resulting signal with an efficient digital coding scheme. Each of the steps shown in block diagram will be discussed in general terms and in terms of specific coders. Diagram shows the operations for a single channel of lossless audio compression. Although the two channels of a stereo recording are generally not independent, the dependence is often weak and difficult to take into account. Therefore, the multichannel case of lossless compression has not received much attention in the literature. Generally, stereo channels are compressed separately without attempting to take advantage of the correlation that might exist between the left and right channels or by simply coding the left channel and the difference between the two channels. Framing The framing operation is introduced to provide for editibility, an important and necessary property for most applications dealing with digital audio. It is often important to quickly and simply edit a compressed audio bit stream, and the sheer volume of data prohibits repetitive decompression of the entire signal preceding the region to be edited. Therefore, a practical solution is to divide the digital audio signal into independent frames of equal time duration. This duration should not be too short, since significant overhead may result from the header that is prefixed to each frame. This header is important, since it defines the compression algorithm parameters, which can change on a frame-to-frame basis to follow the varying characteristics of the input signal over time. Furthermore, depending on the application, this header may include additional data, such as multimedia and synchronization information. On the other hand, the frame duration should not be too long, since this would limit the temporal adaptivity and would make editing of the compressed audio bit stream more difficult. The algorithms surveyed compromised with a 13 to 26 ms frame duration. This time interval translates to 576 and 1152 samples for a sampling rate of 44.1 kHz. Intrachannel Decorrelation The purpose of the second stage of the typical lossless audio coder in Fig. 1 is to remove redundancy by decorrelating the samples within a frame. The lossless audio coders in Table 1 use two basic approaches for intrachannel decorrelation. Most algorithms remove redundancy by some type of modified linear predictive modeling of the signal. In this approach, a linear predictor is applied to the signal samples in each block resulting in a sequence of prediction error samples. The predictor parameters, therefore, represent the redundancy that is removed from the signal and the losslessly coded predictor parameters and prediction error together represent the signal in each block. A second, less common, approach is to obtain a low bit-rate quantized or lossy representation of the signal, and then losslessly compress the difference between the lossy version and the original signal. These approaches are dis-cussed in more detail below. Predictive Modeling: The basic principle of predictive modeling is to predict the value of a sample x[n]using the preceding samples x [n -1] , x [n –2] , etc. Fig. 2 is a diagram of intrachannel decorrelation using the predictive modeling scheme. If the predictor performs well, the prediction error e[n]will be uncorrelated and, therefore, have a flat frequency spectrum. Likewise, e[n]will on average be smaller than x[n], and it will, therefore, require fewer bits for its exact digital representation. The prediction models that are used in lossless compression of audio are based on the structure of Fig. 3. This diagram depicts M N e[n]=x[n]- Q{ ∑ a^ k x[n-k]- ∑ b^ k x[n-k]} k=1 k=1 where Q{}denotes quantization to the same word length as the original signal x[n]. In the diagram, A(z)and B(z) are z-transform polynomials, with quantized coefficients, representing the feedforward and feedback. M N ∑ a^ k x[n-k] , ∑ b^ k x[n-k] k=1 k=1 x[n] + e[n] + - 2. Prediction model for intrachannel decorrelation block 3. General structure for prediction terms in (1), respectively. If the polynomial B^(z)=0 , the feedback terms are omitted, and the prediction error filter has finite impulse response (FIR), neglecting the quantizer Q for the moment. Thus, a predictor with only A(z)is an FIR predictor. If the feedback terms are present (B^(z)!=0), the predictor is an infinite impulse response (IIR) predictor. The quantization operation in Fig. 3 makes the predictor a nonlinear predictor, but, since the quantization is to 16-bit precision, it is reasonable to neglect the quantization in understanding the first-order effects of the predictor and in developing methods for estimating the predictor parameters. The quantization is necessary for lossless compression since we want to be able to re-construct x [n] exactly from e[n] remotely and possibly on a different machine architecture. Therefore, e[n]is usually represented with the same fixed-point integer quantization scheme as x[n], so as not to introduce new quantization levels below the least significant bit. Recon-structing x[n] from e[n] can be done by simply solving (1) for x[n]in terms of e[n], which leads to M N e[n]=x[n]+ Q{ ∑ a^ k x[n-k]- ∑ b^ k x[n-k]} k=1 k=1 which is depicted in Fig. 4. Linear predictors are widely used in speech and audio processing . In most applications, an FIR predictor is used, and the coefficients of the prediction filter 4. General structure for reconstruction 5. Details of the intrachannel decorrelation . Both c^[k] & e[n] are entropy coded and transmitted .Q stands for integer function. A^[z] are determined to minimize the mean-squared prediction error. Without the quantizer and with B^[z]=0 in Fig. 3, the prediction coefficients can be found for the FIR case by simply solving a set of linear equations. When FIR linear prediction is used in lossless audio compression, the prediction coefficients can be found by standard methods and then quantized for use in Fig. 3 (with B^[z]=0). As shown in Fig. 4, the same coefficients are used in the reconstruction of x[n]from e[n]. Therefore, the prediction coefficients must be quantized and encoded as part of the lossless representation. Generally, a new set of coefficients is determined for each frame of the signal thereby adapting the predictor to the changing local structure of the signal. The method of determination of the predictor coefficients ranges from estimation of the minimum mean-squared predictor for each block to the simple choice of a predictor from a small library of fixed predictors. In many of the lossless audio compression algorithms in our survey, the predictor for a given block is chosen from a finite library of coefficient sets to avoid the computation required to estimate the optimum predictor coefficients. In these cases, the identity of the predictor that was used for a given frame is all that need be encoded with the compressed prediction error. This is the approach that is followed, for example, in the case where both feedforward and feedback terms are involved in the prediction. This is because the general solution for the minimum mean-squared predictor is much more complicated in the IIR case. For this reason, IIR prediction has not been widely used in other applications, such as speech processing. In lossless compression, have shown that, while IIR prediction has the potential to perform better because it can match a wider range of spectral shapes than FIR prediction with the same number of coefficients, the improvement in compression performance demonstrated so far is modest at best. In general, a minimum mean-squared predictor for audio signals will involve fractional coefficients. These must be quantized and coded as part of the lossless representation, and the predictor must be implemented with fixed-point arithmetic. However, when the prediction coefficients are integers, it is particularly simple to compute the prediction error with integer arithmetic that can easily be done exactly the same way on different computation platforms. The following list shows the algorithms that are based on predictive modeling according to the type of prediction that is used: FIR model : Shorten, Sonarc, WA Philips IIR model : Oggsquish,DVD. Integer coefficient FIR model (fixed or adaptive): Shorten, HEAR, MUSICompress WA, AudioPak The details of how the predictor model is chosen at each frame can be found in each of the above references. Lossy Coding Model: The second approach to lossless compression is based on obtaining an approximation to x[n] by some sort of lossy coding scheme that maintains the signal wave shape with no time shift. Although this approach has some potentially attractive features, only one algorithm that clearly used this approach was found. proposed a method based on transform coding of the audio signal. The basic idea of this method is presented in Fig. 5. This algorithm uses an orthonormal transform, a discrete cosine transform, to obtain a quantized version of the signal. The cosine transform and its inverse are indicated by the matrix multiplications by A and AT in Fig. 5. The use of the cosine transform is motivated by the well-known energy compaction property of the cosine transform, which permits a good reconstruction from only a few quantized values of the cosine transform. The input signal x[n]is transformed and quantized with an unitary quantization stepsize Δ=1. The resulting coefficients c^[k] are then entropy coded. Because the inverse transformation followed by a dequantization step results in an approximation x[n] of the original signal x[n], the decompression steps are duplicated at the coder side to compute the residual error e[n]=x[n]-x^[n] This error is entropy coded and transmitted along with the coded coefficients c^[ k].At the decoder, the original signal can be obtained from x[n]=x^[n]+e[n], provided that the implementation of the inverse cosine transformation and quantization at the decoder is identical to that at the coder. A potential advantage of this and other lossy-coding based approaches is that the lossless representation contains within it a lossy lower data rate representation of the audio signal, which could serve as a “thumbnail sketch” of the signal. MUSICompress also is structured to provide for a lossy compression mode of operation. This approach could be useful in situations where progressive transmission of the signal is desired or where audio signals are broadcast over the Internet to destinations with varying computational resource and data rate . Entropy Coding Entropy coding removes redundancy from the residual signal e[n] (and the discrete cosine transform (DCT) coef-ficients in the transform-based method). In this process, no information is lost. The following three methods are used in the algorithms in Table 1: Huffman, run length, and Rice coding. The first two methods are well known. On the other hand, Rice coding, which is used in some of the coders, is less well known. The Rice code is characterized by only one parameter, denoted henceforth as m. In fact, this code is the Huffman code for a Laplacian probability density function, which is found to be a good approximation for the distribution of the prediction residual samples e[n]for all the intrachannel decorrelation operations discussed above. To form this code, a number is divided into three parts; a sign bit, the m low-order bits, and the remaining high-order bits. The first part of a code word is a single bit indicating the sign of e[n]. The second part consists of the m least-significant bits of the binary representation of the absolute value of e[n]. The third part of the code word is made of N consecutive zeroes, where N has the same binary representation as the yet unused most significant bits of the absolute value of e[n]. Finally, to end this series of zeroes, a bit set to 1 is inserted. Table 2 gives examples of Rice codes for m =3. All the lossless audio coders presented in the literature that use the Rice code Lossless Transform Audio Compression (LTAC) , Philips , Shorten , WA define the single parameter m to be of constant value over an entire frame. This value is found by means of a full search or by estimation. Such estimation was first given in as M=log2 (log e (2)E(|e[n]|)) , (3) where E()is the mathematical expectation function (average). Summary of Compression Algorithm Characteristics The characteristics of the compression algorithms of this study are summarized in this section. Table 3. Examples of rice codes for m=3 Number Sign bit M Lower bits Number Of 0 s Full Code 0 0 000 0 00001 18 0 010 2 0010001 -12 1 100 1 110001 AudioPaK (Version 1.1): This coder is based on integer FIR predictor. DVDStandard: It uses IIR prediction and Huffman coding for the prediction error signal. LTAC (Version 1.01 for DOS): LTAC is the only coder that uses transform coding for decorrelation. The Rice coding scheme is used to pack the transform coefficients and the residual errors. MUSICompress (Version 1.2 for Windows): MUSICompress uses a predictive type computational structure where adaptively varied approximations are subtracted from the original signal to produce an error signal that is losslessly coded. The nature of the approximations is not specified in so it is possible that MUSICompress should be placed in the lossy-coding-based class. MUSICompress uses a block floating-point representation and Huffman coding to code both the approximation and the residual signal. OggSquish (Version 98.9 for Unix): The lossless audio coder that comes with OggSquish version 98.9 uses IIR linear prediction and Huffman coding. OggSquish is not exactly an audio coder, it is a DSP engine that runs any coding program the user wants . For example, OggSquish is capable of running scripts that generate ADPCM, MPEG, TwinVQ, etc. The 98.9 version of OggSquish includes a lossless coder designed by the authors of OggSquish. Here we refer to this coder as OggSquish. Philips: This algorithm designed by Philips uses a tenth-order FIR linear predictor along with the Rice coding scheme. Shorten (Version 2.1 for DOS): Two commands were used: -p 0: shorten -b 1152 -v 2 -t s16 -p 0 FILE.pcm FILEP0.shn, and -p 10: shorten -b 1152 -v 2 -t s16 -p 10 FILE.pcm FILEP10.shn. These two commands define different coders, which differ by their intrachannel decorrelation block. The program option -p0 uses an FIR predictor with integer coefficients while -p 10 uses a tenth-order minimum mean-squared linear FIR predictor. Both coders use the Rice coding scheme to encode the prediction error signal. Sonarc (Version 2.1h for DOS): Sonarc uses FIR linear prediction and Huffman coding. If the default arguments are used, a fixed-point integer version of Schur’s algo-rithm is used to compute the FIR prediction coefficients. If the -x argument is included in the arguments then the prediction coefficients are determined by the “winner” of a contest between autocorrelation, covariance, Schur’s, and Burg’s algorithms. Waveform Archiver (Version 1.1 for DOS): WA uses FIR linear predictors for the intrachannel decorrelation block. The entropy coding block uses Rice coding. WA offers five levels of coding: -c1, -c2, -c3, -c4, and -c5. The compression ratio was found to increase modestly with the level number. The setting -c5 gives the best compression ratios with significant increase in computation time over the setting -c1. Performance Evaluation With the exception of the DVD standard, all the compression algorithms discussed so far either had executable files available for downloading or they had been previously tested on material that was also accessible to us. To compare their performance, a representative set of six audio files was used to test each of the algorithms. All these files have CD format, i.e., 44.1 kHz, stereo, 16 bits, and are briefly described in the following list and in Table 3.  Track 4 (3 min 11 s): Madonna’s “Like a Virgin.”  Track 20 (39 s): Saxophone, low frequency musical signal.  Track 27 (20 s): Castanets, high frequency musical signal.  Track 48 (28 s): Voice quartet.  Track 66 (18 s): Wind ensemble.  Track L=R (18 s): This track was constructed using the left channel of track 66 . The right channel is an exact copy of the left channel. Therefore, the difference between the left and right channel is zero. This track serves as a probe to allow us to determine if a given compression scheme takes advantage of interchannel correlation. Table3. File size and first-order entropy For left and right channels (x[n]) TRACK File Size (bytes) H0 Left channel (bits per sample) H0 Right channel (bits per sample) Track 04 33,715,920 14.33 14.28 Track 20 6,879,600 10.86 11.03 Track 27 3,528,000 9.27 9.26 Track 48 4,939,200 11.59 11.48 Track 66 3,175,200 10.81 10.92 Track L=R 3,175,200 10.83 10.83 Table 3 summarizes the original file size in bytes and the estimated first-order entropy H0 for both left and right channels. The estimate was obtained by computing a histogram with 2 16 bins and normalizing by the number of samples to obtain probability estimates p i . The entropy estimate is then the sum over all bins of pi log 2 pi . Recall that H0 is the entropy of a source where each symbol (sample) is independent and identically distributed. For signals such as digital audio, where there is considerable sample-to-sample correlation, the entropy will be significantly lower than H 0 However, these entropy estimates, which are all less than 16 bits, suggest that some degree of lossless compression should be possible in all cases. Indeed, our results show that compressed files average about 5 bits/sample. Tables 4 group the lossless coders depending on the arithmetic used. Table 4 groups the integer and fixed-point arithmetic coders, while Table 5 groups the floating-point arithmetic coders. AudioPaK is the only coder to use only integer arithmetic. These tables give the compression ratio original file size r = compressed file sizes for the six experimental audio files. To convert these numbers into bits/sample to compare to the entropy estimates, simply divide 16 by r. For example, r =3is equiva-lent to 5.33 bits/sample. At the outset, it is important to note that most of the recordings contain a significant amount of silence. In fact, we found that a compression ratio of about 1.3 may be easily obtained by compressing only these silent segments for tracks 20, 27, 48, and 66. The data in Tables 4 and 5 show several things. First note that the compression ratios obtained by the various state-of-the-art lossless audio coders for the same audio file are very similar. For example, track 20 compression ratios are in the range 2.72 to 3.28, which is equivalent to at most a 1 bit per sample difference. For the other experimental audio files, these differences are at most 0.85 bits per sample for track 27, 1.12 bits for track 48, 0.80 bits for track 66, and 0.72 bits for track 4. Compression ratios associated with track L=R show that only for AudioPaK, MUSICompress, OggSquish, and WA is the compression ratio twice that of track 66, indicating that these coders attempt to take advantage of the dependence between channels in a way that works perfectly if the two channels are perfectly correlated. Finally, these tables show that coder WAwith the -c5 argument consistently gives the best compression ratio for the six experimental audio files, although in most cases it performed only slightly better than other algorithms. This is one of the few coders, which could not provide real-time compression on our 166 MHz MMX entium machine. In general, floating point computations are not desirable for data compression. They generally are slower than integer computations, and floating point formats can vary between computational platforms. The Philips algorithm and the DVDstandard algorithm are the only algorithms in the list for which no executable was available. We have not attempted a comparison with the DVD algorithm since it is designed Table 4. Compression ratios for integer and Fixed-point arithmetic Lossless Audio coders (Frame size set to 1152 samples for integer coders ,and Default arguments used for the Fixed coders). TRACK Integer Arithmetic Fixed-point Arithmetic AudioPak MUSICompress Sonarc Track 04 1.39 1.37 1.42 Track 20 3.12 2.93 3.13 Track 27 2.47 2.46 2.71 Track 48 2.56 2.41 2.65 Track 66 2.46 2.33 2.55 Track L=R 4.93 4.58 2.56 for higher sampling rates and was evaluated on signals that were not available to us. However, the work on which the DVD algorithm is based shows little if any advantage of the IIR predictor over an FIR predictor. In the case of the Philips algorithm, however, the compression ratio was published in (frame size was set to 1024 samples) for compression of all 70 tracks. To allow a relative comparison with the other coders we also com-pressed all 70 tracks using AudioPaK. Table 6 summarizes results of this comparison. The number represents the performance of AudioPaK on the entire Sound Quality Assessment Material SQAM) CD. The results in Table6 must be interpreted with care. Since we were not able to compare the Philips compression algorithm on the individual files of our test set, we cannot draw the conclusion that this algorithm will perform uniformly better than all the other algorithms. The SQAM CD contains many simple sounds such as pure tones, single instruments, and silence which might be more easily compressed by some algorithms and not others. Other results show the Philips algorithm performing similarly to OggSquish on individual files that were unavailable for our study. These earlier individual comparisons showed significantly lower compression ratios than the 3.31 re-ported for the entire SQAM disk. Conclusion of the Survey Our survey of currently available lossless audio coders suggests that the current technology has reached a limit in what can be achieved for lossless compression of audio.Furthermore, in spite of a wide range of approaches and complexities, the different algorithms perform similarly on the same audio signals with generally less than one bit per sample difference in compressed bit rate.Therefore, a coder compressing at this limit with the least number of arithmetic operations would define the best technology for lossless audio compression. That is, the design of a lossless audio coder should now focus on reducing algorithm complexity. With this in mind, we designed a simple, lossless audio coder, which we called AudioPaK. This coder uses only a few integer arithmetic operations on both the coder and the decoder side. The main operations are prediction with integer coefficients and Golomb coding, which are done on a frame basis. As summarized in Tables 4-6, our algorithm performs as well, or even better, than most state-of-the-art lossless coders. Table 6.compression Ratio for Comlete CD Philips Shorten OggSquish AudioPak 3.31 2.33 3.16 2.88 Table 7 Compression Ratios for the Left channel of Experimental Audio Files Using AudioPak with Different Frame Sizes (in Number of Samples). Track 192 576 1152 2304 4608 Track04 1.38 1.39 1.39 1.38 1.36 Track 20 3.02 3.14 3.17 3.17 3.17 Track 27 2.50 2.49 2.46 2.42 2.37 Track 48 2.48 2.55 2.56 2.56 2.56 Track 66 2.42 2.47 2.47 2.46 2.42 AudioPaK: A Low Complexity Lossless Coder AudioPaK is well suited for Internet transmission because of its low instruction complexity and good compression performance. The algorithm is conveniently described in terms of the same structure (Fig. 1) that was used to discuss all the other lossless compression algorithms. AudioPaK differs from the other algorithms in how the redundancy removal and coding are implemented.The details of the algorithm are discussed below. Framing As in the case of all the other lossless audio coders in our study, AudioPaK divides the input audio signal into independent frames. The length of a frame is a coder parameter and may be set at coding time; however, there are good practical reasons for using frame sizes that are multiple of 192, the number of samples carried by a standard interconnection digital interface established by the Audio Engineering Society and the European Broadcasting Union (AES/EBU). It is a serial transmission format for linearly represented digital audio data, which permits transmission of two-channel digital audio information, including both audio and nonaudio data, from one professional audio device to another. Table 7 shows the results for frame lengths ranging from 192 to 4608 in multiples of 192. This data shows that the compression ratios are not very sensitive to the frame length and suggests that a good length for 44.1 kHz, 16-bit audio sampled streams is 1152 samples. Intrachannel Decorrelation Much of the arithmetic in the algorithms de-scribed earlier is in the computation of the prediction error signal, so this area is a prime target for simplification. Therefore, the intrachannel decorrelation operation uses a very simple FIR adaptive prediction method using only integer coefficients. This approximation was first proposed in Shorten and will be discussed in detail here. The prediction method uses the following four simple FIR predictors, each of which has only integer coefficients: X0^[n]=0 X1^[n]=x[n-1] X2^[n]=2x[n-1]-x[n-2] X3^[n]=3x[n-1]-3x[n-2]+x[n-3] These predictors are FIR predictors since they involve linear combinations of a finite number of past samples. However, they also have an interesting interpretation in terms of the pth-order polynomials that they define. Recall that a (p-1) -order polynomial can be found that passes through the previous p data points x[n-1], x[n-2],..., x[n-p] This polynomial can be evaluated at the nth sample time to obtain the predicted value x[n] These polynomials and the predicted values that they produce are illustrated in Fig. 6 for a typical set of previous samples. An interesting property of these polynomial approximations is that the resulting residual signals, ep[n]= x[n]-xp^[n], can be efficiently computed in the following recursive manner: e0[n]=x[n] e1[n]=e0[n]-e0[n-1] e2[n]=e1[n]-e1[n-1] e3[n]=e2[n]-e2[n-1] This permits the entire set of prediction error signals to be computed without any multplications.This algorithm can be parallelized to take advantage of Intel’s MMXSIMD architecture.For each frame, the four prediction residuals e0[n],e1[n],e2[n],and e3[n]are computed at every sample in the frame, and the absolute values of these residuals are averaged (summed) over the complete frame. The residual with the smallest sum of magnitudes over all samples in the frame is then defined as the best approximation for that frame, and that predictor is used for that frame. The information on which predictor was used can be coded with two bits of information, and that becomes part of the verhead information for the frame. Table 8 summarizes how many times each predictor was used in coding the experimental audio tracks. As seen in Table 8, there is no strong preference for any of the predictors; i.e., there is an advantage in varying the predictor from frame-to-frame. More light is shed on these predictors by considering their frequency responses. It is easily shown that the z-transform system function of the pth-order predictor is Ap(z)=(1-z^-1)p and the magnitude of the frequency response of the pth-order predictor is |Ap (exp(jwT))|=(|2sin(wt/2)|)p .... (4) where T is the sampling rate. Fig. 7 shows plots of (4) for the four predictors used in AudioPaK. (The frequency axis in this plot is normalized so that half the sampling rate is at normalized frequency one.)Note that, because of the pth-order zero at z=.-1, the predictors for p =1,2,3 all have a high 8. Variable bit rate nature of coder frequency boost that increases with increasing p. Specifically, the gain at frequency 22.05 kHz is 0, 20, 40, and 60 dB. Thus, the choice of predictor can also be thought of in terms of selecting the right amount of high frequency boost to flatten the overall spectrum of the prediction error. The pth-order pole at z =-1 would be problematic in many applications of linear predictive modeling because the corresponding reconstruction system of Fig. 4 would be unstable. In the lossless case, however, if we start the reconstruction system with p exact samples from the previous frame, there are no quantization errors to accumulate, and the reconstruction is exact. The question naturally arises as to how much performance is sacrificed by using simple predictors. The evaluations of the Shorten algorithm suggest that low-order predictors with integer coefficients does not degrade performance appreciably. Indeed, as Table 5 shows, there is little difference in performance between the simple integer predictors and the optimum tenth-or-der linear predictor, and, in fact, in some cases the high-order predictor performs slightly worse. Table 9.First-Order Entropy H0 in Bits per Sample For Left and Right Error Channels (e[n]). Track Left Error Channel Right Error Channel Dual Mode Stereo Mode Track 04 12.10 12.27 12.09 Track 20 6.03 6.17 6.17 Track 27 7.58 7.52 7.52 Track 48 7.18 7.16 7.16 Track 66 7.82 7.90 7.87 Interchannel Decorrelation AudioPaK can take advantage of interchannel dependencies present in stereo channel audio streams. This is done when the coder is set in stereo mode. In this mode, in addition to approximating the right stereo channel samples, we simply compute the polynomial approximation of the difference between the left and right channel along with the associated residuals, and the smallest value among the eight sums of absolute residuals (four from the right channel and four from the difference) defines the best approximation. Table 9 presents the estimated first-order entropy H 0 for the left error channel, the right error channel in dual mode, and the righterror channel in stereo mode. The dual channel mode compresses the left and right channels separately. These measures show very little improvement in compression when the interchannel decorrelation is switched on. This is not surprising since the interchannel decorrelation operation does not take into account time delays and other differences that are common between channels of a stereo signal. Because we seek a coder with minimum algorithm complexity, this suggests using AudioPaK in dual mode. Entropy Coding Silent frames can easily be detected with residuals e0[n],e1[n]and efficiently entropy coded with an escape code. If the silent frame is made of all 0 value samples then the sum of |e[n]| is zero and if the silent frame is made of values other than 0 (-1 or 1 as sometimes found) then the sum of |e[n]|is zero (recall that e1[n]=x[n]-x[n-1]). When the frame is not of constant value, we use Golomb coding along with a mapping to reorder the integer numbers. This code is used in the new JPEG-LS lossless image compression scheme . Golomb Coding: Golomb codes are defined to be optimal for exponentially decaying probability distribu-tions of positive integers . Given a unique parameter m, the Golomb code divides a positive integer n into two parts: a binary representation of (n mod m) and a unary representation of ([n/m]). If m=2k , then the code word for n consists of the k least significant bits of n, followed by the number formed by the remaining most significant bits of n in unary repre-sentation and a stop bit. The length of this code word is k+1+[n/2k]. Because the residuals ei[n]are not all positive, a mapping M( )is done to reorder the values as positive integers M(ei[n])={2ei[n] if ei[n]>=0 {2|ei[n]-1 otherwise .....(5) An estimate for the parameter k is given in and is used in AudioPaK. It is based on the expectation E(|ei[n]|) already computed in the intrachannel decorrelation block and is K=[log2(E(|ei[n]|))] ..... (6) The parameter k is defined to be constant over an entire frame and takes values between 0 and(b-1) in the case of b bit audio samples. This parameter can be estimated efficiently without any floating-point operations as follows: for (k = 0, N = framesize; N


Post a Comment

Note: only a member of this blog may post a comment.


© 2011 Study Unique - Designed by Mukund | ToS | Privacy Policy | Sitemap

About Us | Contact Us | Write For Us