Abstract
This paper provides an algorithm which has an unbelievable compression ratio for digital audio(Lossless).This is more advantages than Lossy compression such as MPEG or MP3. This paper contains a powerful lossless compression algorithm named “AudioPak” ,which is for highfidelity mixing in professional environment. This algorithm will be very useful in Internet applications for quick downloading. It’s compression ratio is higher than the PkZip which is also an lossless compression algorithm mostly used. AudioPak is designed using the basic principles of framing ,prediction, and Entropy coding and it is implemented using only a small number of integer arithmetic operations in both Coder and Decoder side. Simulation for this algorithm is also presented in this paper. Our algorithm performs as well , or better, than most stateofart lossless coders.
Keywords:
AudioPak, Entropy coding (Ricecoding
Method),Golomb,coding.
INTRODUCTION
While much work has been done in the area
of lossy compression of speech and audio
signals, relatively little has been published on lossless compression of these signals. Although lossless audio compression is not
likely to become a dominating technology, it may become a useful complement to lossy compression algorithms in some applications. This is because, as we will see, lossless compression algorithms rarely obtain a compression ratio larger than 3:1, while lossy compression algorithms allow compression ratios to range upto 12:1 and higher. For lossy algorithms, the higher the
compression ratio becomes, the lower the resulting final audio quality, but when the
lowest possible data rate is required, lossy techniques are the only alternative.
However, lossless audio coding of stereo CD quality digital audio signals sampled at 44.1 kHz and quantized to 16 bits could become an essential technology for digital music distribution over the Internet because some consumers will want to acquire the
best possible quality of an audio recording for their highfidelity stereo system. Lossy audio compression technologies such as MPEG or MP3 may not be acceptable for this application .Internet resources are limited and will likely remain constrained because the demand for service continues to increase at least as fast as the rate of growth of transmission capacity. Therefore, compression will continue to be important for Internet transmission of audio signals. It is reasonable to suggest, for example, that music distribution applications will offer highly compressed audio clips to the consumer for browsing and selection. After selection, the consumer who is accustomed to audio CD quality may require access to a losslessly compressed copy of the original a copy without any distortion due to the compression algorithm. Music distribution over the Internet is not the only application
for which lossless audio compression technology is of interest. This technology can also be used for the archiving and mixing of highfidelity recordings in professional environments. In this case, lossless compression avoids the degradations incurred in multiple encoding and decoding when using lossy compression coders. This problem arises particularly in the post production industry and occurs when the signals are modified or when different coding formats are used. Lossless audio coding is also used in the new DVD standard for storage of audio signals at higher resolution and sampling rates . The first section of this article is a survey and a classification of the current stateoftheart lossless audio compression algorithms (lossless coders). This study finds that lossless audio coders have reached a limit in what can be achieved for lossless compression of audio. The second section is a description of a new lossless audio coder called AudioPaK, which has low algorithmic complexity and performs as well or even better than most of the lossless audio coders that have been described in the literature.
StateoftheArt for Lossless Audio Coders
Lossless compression is not a new problem in the general context of computing. A variety of utilities such as PkZip are available and widely used to compress text and program files for storage or transmission over the Internet. These programs are designed to be very effective for text data. For example, a PostScript version of this article occupied 4,560,999 bytes of storage while the PkZipped version occupied only 696,041 bytes of storage— a compression ratio of 4,560,999/696,041 = 6.55. On the other hand, track 04, one of the audio files consisting of 16bit samples used in our tests, occupied 33,715,964 bytes of storage before compression and 31,573,915 bytes after compression with PkZip. In this case, the compression ratio was only 1.07. What is needed are algorithms specifically designed for digital audio files. Although lossless coding of audio has not received a great deal of attention so far, a number of different algorithms have been developed, and code for testing these algorithms has been made available in almost all cases.
The following algorithms are used in lossless compression :
AudioPak
DVD
LTAC
MUSICompress
OggSquish
Philips
Shorten
Sonarc
WA
This list represents all the lossless coders that we found in the literature or through searching the Web. We believe that these coders represent the stateoftheart in
lossless audio compression. Shorten, MUSICompress, LTAC , DVD and Philips’ algorithms are described in the literature. Information concerning the other coders—Sonarc , OggSquish v98.9 , and WA comes from personal communications with the designers of these algorithms. AudioPaK is a new lossless audio coder described later in this paper.
Basic Principles of Lossless Compression
The block diagram representation of the operations involved in compressing a single audio channel. All of the techniques studied are based on the principle of first removing redundancy from the signal and then coding the resulting signal with an efficient digital coding scheme.
Each of the steps shown in block diagram will be discussed in general terms and in terms of specific coders. Diagram shows the operations for a single channel of lossless audio compression. Although the two channels of a stereo recording are generally not independent, the dependence is often weak and difficult to take into account. Therefore, the multichannel case of lossless compression has not received much attention in the literature. Generally, stereo channels are compressed separately without attempting to take advantage of the correlation that might exist between the left and right channels or by simply coding the left channel and the difference between the two channels.
Framing
The framing operation is introduced to provide for editibility, an important and necessary property for most applications dealing with digital audio. It is often
important to quickly and simply edit a compressed audio bit stream, and the sheer volume of data prohibits repetitive
decompression of the entire signal preceding the region to be edited. Therefore, a practical solution is to divide the digital audio signal into independent frames of equal time duration. This duration should not be too short, since significant overhead may result from the header that is prefixed to each frame. This header is important, since it defines the compression algorithm parameters, which can change on a frametoframe basis to follow the varying characteristics of the input signal over time. Furthermore, depending on the application, this header may include additional data, such as multimedia and synchronization information. On the other hand, the frame duration should not be too long, since this would limit the temporal adaptivity and would make editing of the compressed audio bit stream more difficult. The algorithms surveyed compromised with a 13 to 26 ms frame duration. This time interval translates to 576 and 1152 samples for a sampling rate of 44.1 kHz.
Intrachannel Decorrelation
The purpose of the second stage of the typical lossless audio coder in Fig. 1 is to remove redundancy by decorrelating the samples within a frame. The lossless audio
coders in Table 1 use two basic approaches for intrachannel decorrelation. Most algorithms remove redundancy by some type of modified linear predictive modeling of the signal. In this approach, a linear predictor is applied to the signal samples in each block resulting in a sequence of prediction error samples. The predictor
parameters, therefore, represent the redundancy that is removed from the signal and the losslessly coded predictor parameters and prediction error together represent the signal in each block. A second, less common, approach is to obtain a low bitrate quantized or lossy representation of the signal, and then losslessly compress the difference between the lossy version and the original signal. These approaches are discussed in more detail below.
Predictive Modeling: The basic principle of predictive modeling is to predict the value of a sample x[n]using the preceding samples x [n 1] , x [n –2] , etc. Fig. 2 is a diagram of intrachannel decorrelation using the predictive modeling scheme. If the predictor performs well, the prediction error e[n]will be uncorrelated and, therefore, have a flat frequency spectrum. Likewise, e[n]will on average be smaller than x[n], and it will, therefore, require fewer bits for its exact digital representation. The prediction models that are used in lossless compression of audio are based on the structure of Fig. 3. This diagram depicts
M N
e[n]=x[n] Q{ ∑ a^ k x[nk] ∑ b^ k x[nk]}
k=1 k=1
where Q{}denotes quantization to the same word length as the original signal x[n]. In the diagram, A(z)and B(z) are ztransform polynomials, with quantized coefficients, representing the feedforward and feedback.
M N
∑ a^ k x[nk] , ∑ b^ k x[nk]
k=1 k=1
x[n] + e[n]
+

2. Prediction model for intrachannel decorrelation block
3. General structure for prediction
terms in (1), respectively. If the polynomial
B^(z)=0 , the feedback terms are omitted, and the prediction error filter has finite impulse response (FIR), neglecting the quantizer Q for the moment. Thus, a predictor with only A(z)is an FIR predictor. If the feedback terms are present (B^(z)!=0), the predictor is an infinite impulse response (IIR) predictor.
The quantization operation in Fig. 3 makes the predictor a nonlinear predictor, but, since the quantization is to 16bit precision, it is reasonable to neglect the quantization in understanding the firstorder effects of the predictor and in developing methods for estimating the predictor parameters. The quantization is necessary for lossless compression since we want to be able to reconstruct x [n] exactly from e[n] remotely and possibly on a different machine architecture. Therefore, e[n]is usually represented with the same fixedpoint integer quantization scheme as
x[n], so as not to introduce new quantization levels below the least significant bit. Reconstructing x[n] from e[n] can be done by simply solving (1) for x[n]in terms of e[n], which leads to
M N
e[n]=x[n]+ Q{ ∑ a^ k x[nk] ∑ b^ k x[nk]}
k=1 k=1
which is depicted in Fig. 4.
Linear predictors are widely used in speech and audio processing . In most applications, an FIR predictor is used, and the coefficients of the prediction filter
4. General structure for reconstruction
5. Details of the intrachannel decorrelation .
Both c^[k] & e[n] are entropy coded and transmitted .Q stands for integer function.
A^[z] are determined to minimize the meansquared prediction error. Without the quantizer and with B^[z]=0 in Fig. 3, the prediction coefficients can be found for the
FIR case by simply solving a set of linear equations. When FIR linear prediction is used in lossless audio compression, the prediction coefficients can be found by standard methods and then quantized for use in Fig. 3 (with B^[z]=0). As shown in Fig. 4, the same coefficients are used in the reconstruction of x[n]from e[n]. Therefore,
the prediction coefficients must be quantized and encoded as part of the lossless representation. Generally, a new set of coefficients is determined for each frame of the signal thereby adapting the predictor to the changing local structure of the signal. The method of determination of the predictor coefficients ranges from estimation of the minimum meansquared predictor for each block to the simple choice of a predictor from a small library of fixed predictors.
In many of the lossless audio compression algorithms in our survey, the predictor for a given block is chosen
from a finite library of coefficient sets to avoid the computation required to estimate the optimum predictor coefficients. In these cases, the identity of the predictor that was
used for a given frame is all that need be encoded with the compressed prediction error. This is the approach that is followed, for example, in the case where both
feedforward and feedback terms are involved in the prediction. This is because the general solution for the minimum meansquared predictor is much more complicated in the IIR case. For this reason, IIR prediction has not been widely used in other applications, such as speech processing. In lossless compression, have shown that, while IIR prediction has the potential to perform better because it can match a wider range of spectral shapes than FIR prediction with the same number of coefficients, the improvement in compression performance demonstrated so far is modest at best. In general, a minimum meansquared predictor for audio signals will involve fractional coefficients. These must be quantized and coded as part of the lossless representation, and the predictor must be implemented with fixedpoint arithmetic. However, when the prediction coefficients are integers, it is particularly simple to compute the prediction error with integer arithmetic that can easily be done exactly the same way on different computation platforms.
The following list shows the algorithms
that are based on predictive modeling
according to the type of prediction that is used:
FIR model :
Shorten, Sonarc, WA Philips
IIR model :
Oggsquish,DVD.
Integer coefficient FIR model (fixed or adaptive):
Shorten, HEAR, MUSICompress
WA, AudioPak
The details of how the predictor model is chosen at each frame can be found in each of the above references.
Lossy Coding Model: The second approach to lossless compression is based on obtaining an approximation to x[n] by some sort of lossy coding scheme that maintains
the signal wave shape with no time shift. Although this approach has some potentially attractive features, only one algorithm that clearly used this approach was found. proposed a method based on transform coding of the audio signal. The basic idea of this method is presented in Fig. 5. This algorithm uses an orthonormal transform, a discrete cosine transform, to obtain a quantized version of the signal. The cosine transform and its inverse are indicated by the matrix multiplications by A and AT in Fig. 5. The use of the cosine transform is motivated by the wellknown energy compaction property of the cosine transform, which permits a good reconstruction from only a few quantized values of the cosine transform. The input signal x[n]is transformed and quantized with an unitary quantization stepsize Δ=1. The resulting coefficients c^[k] are then entropy coded. Because the inverse transformation followed by a dequantization step results in an approximation x[n] of the original signal x[n], the decompression steps are duplicated at the coder side to compute the residual error e[n]=x[n]x^[n] This error is entropy coded and transmitted along with the coded coefficients c^[ k].At the decoder, the original signal can be obtained from x[n]=x^[n]+e[n], provided that the implementation of the inverse cosine transformation and quantization at the
decoder is identical to that at the coder.
A potential advantage of this and other lossycoding based approaches is that the lossless representation contains within it a lossy lower data rate representation of
the audio signal, which could serve as a “thumbnail sketch” of the signal. MUSICompress also is structured to provide for a lossy compression mode of operation. This approach could be useful in situations where progressive transmission of the signal is desired or where audio signals are broadcast over the Internet to destinations with varying computational resource and data rate .
Entropy Coding
Entropy coding removes redundancy from the residual signal e[n] (and the discrete cosine transform (DCT) coefficients in the transformbased method). In this process, no information is lost. The following three methods are used in the algorithms in Table 1: Huffman, run length,
and Rice coding. The first two methods are well known. On the other hand, Rice coding, which is used in some of the coders, is less well known.
The Rice code is characterized by only one parameter, denoted henceforth as m. In fact, this code is the Huffman code for a Laplacian probability density function, which is found to be a good approximation for the distribution of the prediction residual samples e[n]for all the intrachannel decorrelation operations discussed above.
To form this code, a number is divided into three parts; a sign bit, the m loworder bits, and the remaining highorder bits. The first part of a code word is a single bit indicating the sign of e[n]. The second part consists of the m leastsignificant bits of the binary representation of the absolute value of e[n]. The third part of the code word is made of N consecutive zeroes, where N has the same binary representation as the yet unused most significant bits of the absolute value of e[n]. Finally, to end this series of zeroes, a bit set to 1 is inserted. Table 2 gives examples of Rice codes for m =3.
All the lossless audio coders presented in the literature that use the Rice code Lossless Transform Audio Compression (LTAC) , Philips , Shorten , WA define the single parameter m to be of constant value over an entire frame. This value is found by means of a full search or by estimation. Such estimation was first given in as
M=log2 (log e (2)E(e[n])) , (3)
where E()is the mathematical expectation function (average).
Summary of Compression Algorithm Characteristics
The characteristics of the compression algorithms of this study are summarized in this section.
Table 3. Examples of rice codes for m=3
Number Sign
bit M
Lower
bits Number
Of 0 s Full
Code
0 0 000 0 00001
18 0 010 2 0010001
12 1 100 1 110001
AudioPaK (Version 1.1): This coder is based on integer FIR predictor.
DVDStandard: It uses IIR prediction and Huffman coding for the prediction error signal.
LTAC (Version 1.01 for DOS): LTAC is the only coder that uses transform coding for decorrelation. The Rice coding scheme is used to pack the transform coefficients
and the residual errors.
MUSICompress (Version 1.2 for Windows):
MUSICompress uses a predictive type computational structure where adaptively varied approximations are subtracted from the original signal to produce an error
signal that is losslessly coded. The nature of the approximations is not specified in so it is possible that MUSICompress should be placed in the lossycodingbased class. MUSICompress uses a block floatingpoint representation and Huffman coding to code
both the approximation and the residual signal.
OggSquish (Version 98.9 for Unix): The lossless audio coder that comes with OggSquish version 98.9 uses IIR linear prediction and Huffman coding. OggSquish is not exactly an audio coder, it is a DSP engine that runs any coding program the user wants . For example, OggSquish is capable of running scripts that generate ADPCM, MPEG, TwinVQ, etc. The 98.9 version of OggSquish includes a lossless coder designed by the authors of OggSquish. Here we refer to this coder as
OggSquish.
Philips: This algorithm designed by Philips uses a tenthorder FIR linear predictor along with the Rice coding
scheme.
Shorten (Version 2.1 for DOS):
Two commands were used:
p 0: shorten b 1152 v 2 t s16 p 0
FILE.pcm FILEP0.shn, and
p 10: shorten b 1152 v 2 t s16 p 10
FILE.pcm FILEP10.shn.
These two commands define different coders, which differ by their intrachannel decorrelation block. The program option p0 uses an FIR predictor with integer coefficients while p 10 uses a tenthorder minimum meansquared linear FIR predictor. Both coders use the Rice coding scheme to encode the prediction error signal.
Sonarc (Version 2.1h for DOS): Sonarc uses FIR linear prediction and Huffman coding. If the default arguments are used, a fixedpoint integer version of Schur’s algorithm is used to compute the FIR prediction coefficients. If the x argument is included
in the arguments then the prediction coefficients are determined by the “winner” of a contest between autocorrelation, covariance, Schur’s, and Burg’s algorithms.
Waveform Archiver (Version 1.1 for DOS): WA uses FIR linear predictors for the intrachannel decorrelation block. The entropy coding block uses Rice coding. WA offers five levels of coding: c1, c2, c3, c4, and c5. The compression ratio was found to increase modestly with the level number. The setting c5 gives the best compression ratios with significant increase in computation time over the setting c1.
Performance Evaluation
With the exception of the DVD standard, all the compression algorithms discussed so far either had executable files available for downloading or they had been previously
tested on material that was also accessible to us. To compare their performance, a representative set of six audio files was used to test each of the algorithms. All these files have CD format, i.e., 44.1 kHz, stereo, 16 bits, and are briefly described in the following list and in Table 3.
Track 4 (3 min 11 s): Madonna’s “Like a Virgin.”
Track 20 (39 s): Saxophone, low frequency musical signal.
Track 27 (20 s): Castanets, high frequency musical signal.
Track 48 (28 s): Voice quartet.
Track 66 (18 s): Wind ensemble.
Track L=R (18 s): This track was constructed using the left channel of track 66 . The right channel is an exact copy of the left channel.
Therefore, the difference between the left
and right channel is zero. This track serves
as a probe to allow us to determine if a
given compression scheme takes advantage
of interchannel correlation.
Table3. File size and firstorder entropy
For left and right channels (x[n])
TRACK File Size
(bytes) H0 Left channel
(bits per sample) H0 Right channel
(bits per sample)
Track 04 33,715,920 14.33 14.28
Track 20 6,879,600 10.86 11.03
Track 27 3,528,000 9.27 9.26
Track 48 4,939,200 11.59 11.48
Track 66 3,175,200 10.81 10.92
Track L=R 3,175,200 10.83 10.83
Table 3 summarizes the original file size in bytes and the estimated firstorder entropy H0 for both left and right channels. The estimate was obtained by computing a histogram with 2 16 bins and normalizing by the number of samples to obtain probability estimates p i . The entropy estimate is then the sum over all bins of pi log 2 pi . Recall that H0 is the entropy of a source where each symbol (sample) is independent and identically distributed. For signals such as digital audio, where there is considerable sampletosample correlation, the entropy
will be significantly lower than H 0 However, these entropy estimates, which are all less than 16 bits, suggest that some degree of lossless compression should be possible in all cases. Indeed, our results show that compressed files average about 5 bits/sample.
Tables 4 group the lossless coders depending on the arithmetic used. Table 4 groups the integer and fixedpoint arithmetic coders, while Table 5 groups the floatingpoint arithmetic coders. AudioPaK is the only coder to use only integer arithmetic. These tables give the compression ratio
original file size
r =
compressed file sizes
for the six experimental audio files. To convert these numbers into bits/sample to compare to the entropy estimates, simply divide 16 by r. For example, r =3is equivalent to 5.33 bits/sample.
At the outset, it is important to note that most of the recordings contain a significant amount of silence. In fact, we found that a compression ratio of about
1.3 may be easily obtained by compressing only these silent segments for tracks 20, 27, 48, and 66.
The data in Tables 4 and 5 show several things. First note that the compression ratios obtained by the various
stateoftheart lossless audio coders for the same audio file are very similar. For example, track 20 compression ratios are in the range 2.72 to 3.28, which is equivalent to at most a 1 bit per sample difference. For the other experimental audio files, these differences are at most 0.85 bits per sample for track 27, 1.12 bits for track 48, 0.80 bits
for track 66, and 0.72 bits for track 4.
Compression ratios associated with track L=R show that only for AudioPaK, MUSICompress, OggSquish, and WA is the compression ratio twice that of track 66, indicating that these coders attempt to take advantage of the dependence between channels in a way that works perfectly if the two channels are perfectly correlated.
Finally, these tables show that coder WAwith the c5 argument consistently gives the best compression ratio for the six experimental audio files, although in most cases it performed only slightly better than other algorithms. This is one of the few coders, which could not provide realtime compression on our 166 MHz MMX entium
machine. In general, floating point computations are not desirable for data compression. They generally are slower
than integer computations, and floating point formats can vary between computational platforms. The Philips algorithm and the DVDstandard algorithm are the only algorithms in the list for which no executable was available. We have not attempted a comparison with the DVD algorithm since it is designed
Table 4. Compression ratios for integer and
Fixedpoint arithmetic Lossless Audio coders
(Frame size set to 1152 samples for integer coders ,and Default arguments used for the Fixed coders).
TRACK Integer
Arithmetic Fixedpoint Arithmetic
AudioPak MUSICompress Sonarc
Track 04 1.39 1.37 1.42
Track 20 3.12 2.93 3.13
Track 27 2.47 2.46 2.71
Track 48 2.56 2.41 2.65
Track 66 2.46 2.33 2.55
Track L=R 4.93 4.58 2.56
for higher sampling rates and was evaluated on signals that were not available to us. However, the work on which the DVD algorithm is based shows little if any
advantage of the IIR predictor over an FIR
predictor. In the case of the Philips algorithm, however, the compression
ratio was published in (frame size was set to 1024 samples) for compression of all
70 tracks. To allow a relative comparison with the other coders we also compressed
all 70 tracks using AudioPaK. Table 6 summarizes results of this comparison. The number represents the performance of AudioPaK on the entire Sound Quality Assessment Material SQAM) CD. The results in Table6 must be interpreted with care. Since we were not able to compare the Philips compression algorithm on the individual files of our test set, we cannot draw the conclusion that this algorithm will perform uniformly better than all the other algorithms. The SQAM CD contains many simple sounds such as pure tones, single instruments, and silence which might be more easily compressed by some algorithms and not others. Other results show the Philips algorithm performing similarly to OggSquish on individual files that were unavailable for our study. These earlier individual comparisons showed significantly lower compression ratios than the 3.31 reported for the entire SQAM disk.
Conclusion of the Survey
Our survey of currently available lossless audio coders suggests that the current technology has reached a limit in what can be achieved for lossless compression of audio.Furthermore, in spite of a wide range of approaches and complexities, the different algorithms perform similarly
on the same audio signals with generally less than one bit per sample difference in compressed bit rate.Therefore, a coder compressing at this limit with the least number of arithmetic operations would define the best technology for lossless audio compression. That is, the design of a lossless audio coder should now focus on reducing algorithm complexity. With this in mind, we designed a simple, lossless audio coder, which we called AudioPaK. This coder uses only a few integer arithmetic operations on both the coder and the decoder side. The main operations are prediction with integer coefficients and Golomb coding, which are done on a frame basis. As summarized in Tables 46, our algorithm performs as well, or even better, than most stateoftheart lossless coders.
Table 6.compression Ratio for Comlete CD
Philips Shorten OggSquish AudioPak
3.31 2.33 3.16 2.88
Table 7 Compression Ratios for the Left channel of Experimental Audio Files Using AudioPak with Different Frame Sizes (in Number of Samples).
Track 192 576 1152 2304 4608
Track04 1.38 1.39 1.39 1.38 1.36
Track 20 3.02 3.14 3.17 3.17 3.17
Track 27 2.50 2.49 2.46 2.42 2.37
Track 48 2.48 2.55 2.56 2.56 2.56
Track 66 2.42 2.47 2.47 2.46 2.42
AudioPaK:
A Low Complexity Lossless Coder
AudioPaK is well suited for Internet transmission because of its low instruction complexity and good compression performance. The algorithm is conveniently
described in terms of the same structure
(Fig. 1) that was used to discuss all the other lossless compression algorithms. AudioPaK differs from the other algorithms in how the redundancy removal and coding are implemented.The details of the algorithm are discussed below.
Framing
As in the case of all the other lossless audio coders in our study, AudioPaK divides the input audio signal into independent frames. The length of a frame is a coder parameter
and may be set at coding time; however, there are good practical reasons for using frame sizes that are multiple of 192, the number of samples carried by a standard interconnection digital interface established by the Audio Engineering Society and the European Broadcasting Union (AES/EBU). It is a serial transmission format for linearly represented digital audio data, which permits transmission of twochannel digital audio information, including both audio and nonaudio data, from one professional
audio device to another. Table 7 shows the results for frame lengths ranging from 192 to 4608 in multiples of 192. This data shows that the compression ratios are not very sensitive to the frame length and suggests that a good length for 44.1 kHz, 16bit audio sampled streams is 1152 samples.
Intrachannel Decorrelation
Much of the arithmetic in the algorithms described earlier is in the computation of the prediction error signal, so this area is a prime target for simplification. Therefore, the intrachannel decorrelation operation uses a very simple FIR adaptive prediction method using only integer coefficients. This approximation was first proposed in Shorten and will be discussed in detail here.
The prediction method uses the following four simple FIR predictors, each of which has only integer coefficients:
X0^[n]=0
X1^[n]=x[n1]
X2^[n]=2x[n1]x[n2]
X3^[n]=3x[n1]3x[n2]+x[n3]
These predictors are FIR predictors since they involve linear combinations of a finite number of past samples. However, they also have an interesting interpretation in terms of the pthorder polynomials that they define. Recall that a (p1) order polynomial can be found that passes through the previous p data points x[n1], x[n2],..., x[np] This polynomial can be evaluated at the nth sample time to obtain the predicted value x[n] These polynomials and the predicted values that they produce are illustrated in Fig. 6 for a typical set of previous samples. An interesting property of these polynomial approximations is that the resulting residual signals,
ep[n]= x[n]xp^[n], can be efficiently computed in the following recursive manner:
e0[n]=x[n]
e1[n]=e0[n]e0[n1]
e2[n]=e1[n]e1[n1]
e3[n]=e2[n]e2[n1]
This permits the entire set of prediction error signals to be computed without any multplications.This algorithm can be parallelized to take advantage of Intel’s MMXSIMD architecture.For each frame, the four prediction residuals e0[n],e1[n],e2[n],and e3[n]are computed at every sample in the frame, and the absolute values of these residuals are averaged (summed) over the complete frame. The residual with the smallest sum of magnitudes over all samples in the frame is then defined as the best approximation for that frame, and that predictor is used for that frame. The information on which predictor was used can be coded with two bits of information, and that becomes part of the verhead information for the frame. Table 8 summarizes how many times each predictor was used in coding the experimental audio tracks. As seen in Table 8, there is no strong preference for any of the predictors; i.e., there is an advantage in varying the predictor from frametoframe. More light is shed on these predictors by considering their frequency responses. It is easily shown that the ztransform system function of the pthorder predictor is
Ap(z)=(1z^1)p
and the magnitude of the frequency response of the pthorder predictor is
Ap (exp(jwT))=(2sin(wt/2))p .... (4)
where T is the sampling rate. Fig. 7 shows plots of (4) for the four predictors used in AudioPaK. (The frequency axis in this plot is normalized so that half the sampling
rate is at normalized frequency one.)Note that, because of the pthorder zero at z=.1, the predictors for p =1,2,3 all have a high
8. Variable bit rate nature of coder
frequency boost that increases with increasing p. Specifically, the gain at frequency 22.05 kHz is 0, 20, 40, and 60 dB. Thus, the choice of predictor can also be thought of in terms of selecting the right amount of high frequency boost to flatten the overall spectrum of the prediction error. The pthorder pole at z =1 would be problematic in many applications of linear predictive modeling because the corresponding reconstruction system of Fig. 4 would be unstable. In the lossless case, however, if we start the reconstruction system with p exact samples from the previous frame, there are no quantization errors to accumulate, and the reconstruction is exact. The question naturally arises as to how much performance is sacrificed by using simple predictors. The evaluations of the Shorten algorithm suggest that loworder predictors with integer coefficients does not degrade performance appreciably. Indeed, as Table 5 shows, there is little difference in performance between the simple integer predictors and the optimum tenthorder linear predictor, and, in fact, in some cases the highorder predictor performs slightly worse.
Table 9.FirstOrder Entropy H0 in Bits per Sample
For Left and Right Error Channels (e[n]).
Track Left Error Channel Right Error Channel
Dual Mode Stereo Mode
Track 04 12.10 12.27 12.09
Track 20 6.03 6.17 6.17
Track 27 7.58 7.52 7.52
Track 48 7.18 7.16 7.16
Track 66 7.82 7.90 7.87
Interchannel Decorrelation
AudioPaK can take advantage of interchannel dependencies present in stereo channel audio streams. This is done when the coder is set in stereo mode. In this mode, in addition to approximating the right stereo channel samples, we simply compute the polynomial approximation of the difference between the left and right channel along with the associated residuals, and the smallest value among the eight sums of absolute residuals (four from the right channel and four from the difference) defines the best approximation. Table 9 presents the estimated firstorder entropy H 0 for the left error channel, the right error channel in dual mode, and the righterror channel in stereo mode. The dual channel mode compresses the left and right channels separately. These measures show very little improvement in compression when the interchannel decorrelation is switched on. This is not surprising since the interchannel decorrelation operation does not take into account time delays and other differences that are common between channels of a stereo signal. Because we seek a coder with minimum algorithm complexity, this suggests using AudioPaK in dual mode.
Entropy Coding
Silent frames can easily be detected with residuals e0[n],e1[n]and efficiently entropy coded with an escape code. If the silent frame is made of all 0 value samples then the sum of e[n] is zero and if the silent frame is made of values other than 0
(1 or 1 as sometimes found) then the sum of e[n]is zero (recall that
e1[n]=x[n]x[n1]). When the frame is not of constant value, we use Golomb coding along with a mapping to reorder the integer numbers. This code is used in the new JPEGLS lossless image compression scheme .
Golomb Coding:
Golomb codes are defined to be optimal for exponentially decaying probability distributions of positive integers . Given a unique parameter m, the Golomb code divides a positive integer n into two parts: a binary representation of (n mod m) and a unary
representation of ([n/m]).
If m=2k , then the code word for n consists of the k least significant bits of n, followed by the number formed by the remaining most significant bits of n in unary representation and a stop bit. The length of this code word is
k+1+[n/2k].
Because the residuals ei[n]are not all positive, a mapping M( )is done to reorder the values as positive integers
M(ei[n])={2ei[n] if ei[n]>=0
{2ei[n]1 otherwise .....(5)
An estimate for the parameter k is given in and is used in AudioPaK. It is based on the expectation E(ei[n]) already computed in the intrachannel decorrelation block and is
K=[log2(E(ei[n]))] ..... (6)
The parameter k is defined to be constant over an entire frame and takes values between 0 and(b1) in the case of b bit audio samples. This parameter can be estimated efficiently without any floatingpoint operations as follows:
for (k = 0, N = framesize; N
Tagged as : ECE
0 comments:
Post a Comment
Note: only a member of this blog may post a comment.