
Chapter 13 Basic Audio Compression Techniques

  13.1 ADPCM in Speech Coding

  13.2 G.726 ADPCM

  13.3 Vocoders

  13.4 Further Exploration

13.1 ADPCM in Speech Coding

  ADPCM forms the heart of the ITU's speech compression standards G.721, G.723, G.726, and G.727.

  The difference between these standards involves the bit-rate (from 3 to 5 bits per sample) and some algorithm details.

  The default input is µ-law coded PCM 16-bit samples.

13.2 G.726 ADPCM

  ITU G.726 supersedes ITU standards G.721 and G.723.

  Rationale: works by adapting a fixed quantizer in a simple way. The different sizes of codewords used amount to bit-rates of 16 kbps, 24 kbps, 32 kbps, or 40 kbps, at 8 kHz sampling rate.

  The standard defines a multiplier constant α that will change for every difference value e_n, depending on the current scale of signals. Define a scaled difference signal g_n as follows:

    g_n = e_n / α

  where e_n is the difference between the actual signal value f_n and its predicted value. The input to the quantizer is thus the ratio of a difference with the scale factor α.

  By changing α, the quantizer can adapt to change in the range of the difference signal — a backward adaptive quantizer.

Backward Adaptive Quantizer

  Backward adaptive works in principle by noticing either of the cases:

  ◦ too many values are quantized to values far from zero – would happen if the quantizer step size in f were too small.

  ◦ too many values fall close to zero too much of the time – would happen if the quantizer step size were too large.

  The Jayant quantizer allows one to adapt a backward quantizer step size after receiving just one single output.

  The Jayant quantizer simply expands the step size if the quantized input is in the outer levels of the quantizer, and reduces the step size if the input is near zero.

The Step Size of Jayant Quantizer

  The Jayant quantizer assigns multiplier values M_k to each level, with values smaller than unity for levels near zero, and values larger than 1 for the outer levels.

  For signal f_n, the quantizer step size ∆ is changed according to the quantized value k, for the previous signal value f_(n−1), by the simple formula

    ∆_n = M_k ∆_(n−1)

  Since it is the quantized version of the signal that is driving the change, this is indeed a backward adaptive quantizer.
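  For illustration, here is a minimal Python sketch of a Jayant-style backward adaptive quantizer. It is not the G.726 reference implementation; the 8-level layout and the multiplier values are made-up examples.

    import numpy as np

    def jayant_quantize(signal, delta0=0.1, multipliers=(0.8, 0.9, 1.2, 1.6)):
        """Sketch of a Jayant backward-adaptive mid-rise quantizer (8 levels).

        The step size follows Delta_n = M_k * Delta_(n-1), where k is the
        magnitude level chosen for the previous sample: multipliers below 1
        shrink the step when samples stay near zero, multipliers above 1
        expand it when samples land in the outer levels.
        """
        half = len(multipliers)                      # magnitude levels per sign
        delta = delta0
        codes, reconstructed = [], []
        for x in signal:
            k = min(int(abs(x) / delta), half - 1)   # magnitude level index
            xq = np.sign(x) * (k + 0.5) * delta      # mid-rise reconstruction
            codes.append((int(np.sign(x)), k))
            reconstructed.append(xq)
            delta *= multipliers[k]                  # backward adaptation: the decoder can mirror this
        return codes, np.array(reconstructed)

    # Example: the step size grows as the input amplitude grows.
    t = np.linspace(0.0, 1.0, 200)
    x = np.sin(2 * np.pi * 5 * t) * np.linspace(0.1, 1.0, 200)
    codes, xq = jayant_quantize(x)

  Because only the transmitted level index drives the step-size change, the decoder can reproduce the same adaptation without side information.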

G.726 — Backward Adaptive Jayant Quantizer

  G.726 uses fixed quantizer steps based on the logarithm of the input difference signal e_n divided by α. The divisor α is expressed through its logarithm:

    β ≡ log2 α    (13.3)

  When difference values are large, α is divided into:

  ◦ locked part α_L – a scale factor for small difference values.

  ◦ unlocked part α_U — adapts quickly to larger differences.

  ◦ These correspond to log quantities β_L and β_U, so that:

    β = A β_U + (1 − A) β_L

  A changes so that it is about unity for speech, and about zero for voice-band data.

  The "unlocked" part adapts via the equation

    α_U ← M_k α_U,   i.e.,   β_U ← log2 M_k + β_U

  The locked part is slightly modified from the unlocked part:

    β_L ← (1 − B) β_L + B β_U,   with B a small positive constant.
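  As an illustration of this split adaptation, here is a small Python sketch; the multiplier M_k and the leakage constant B below are example values chosen for the sketch, not the tables specified by G.726.

    import math

    def adapt_scale(beta_u, beta_l, A, M_k, B=2**-6):
        """One adaptation step of the split (locked/unlocked) scale factor.

        beta_u, beta_l : log2 of the unlocked and locked scale factors
        A              : about 1 for speech, about 0 for voice-band data
        M_k            : Jayant multiplier for the level chosen this sample
        B              : small leakage constant coupling locked to unlocked
        """
        beta_u = beta_u + math.log2(M_k)            # unlocked part adapts quickly
        beta_l = (1.0 - B) * beta_l + B * beta_u    # locked part follows slowly
        beta = A * beta_u + (1.0 - A) * beta_l      # combined log scale factor
        return beta_u, beta_l, 2.0 ** beta          # alpha = 2**beta divides e_n

    # Example: repeated outer-level hits (M_k > 1) pull the scale factor up.
    bu = bl = 0.0
    for _ in range(5):
        bu, bl, alpha = adapt_scale(bu, bl, A=1.0, M_k=1.5)
        print(round(alpha, 3))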

  The G.726 predictor is quite complicated: it uses a linear combination of 6 quantized differences and two reconstructed signal values, from the previous 6 signal values f_n.

13.3 Vocoders

  Vocoders – voice coders, which cannot be usefully applied when other analog signals, such as modem signals, are in use.

  ◦ concerned with modeling speech so that the salient features are captured in as few bits as possible.

  ◦ use either a model of the speech waveform in time (LPC (Linear Predictive Coding) vocoding), or ...

  ◦ break down the signal into frequency components and model these (channel vocoders and formant vocoders).

  Vocoder simulation of the voice is not very good yet.

Phase Insensitivity

  A complete reconstituting of the speech waveform is really unnecessary, perceptually: all that is needed is for the amount of energy at any time to be about right, and the signal will sound about right.

  Phase is a shift in the time argument inside a function of time.

  ◦ Suppose we strike a piano key, and generate a roughly sinusoidal sound cos(ωt), with ω = 2πf.

  ◦ Now if we wait sufficient time to generate a phase shift π/2 and then strike another key, with sound cos(2ωt + π/2), we generate a waveform like the solid line in Fig. 13.3.

  ◦ This waveform is the sum cos(ωt) + cos(2ωt + π/2).

  ◦ If we did not wait before striking the second note, then our waveform would be cos(ωt) + cos(2ωt). But perceptually, the two notes would sound the same, even though in actuality they would be shifted in phase.
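  A quick numpy check of this point (an illustrative sketch, not part of the text): the two superpositions differ sample by sample, but their magnitude spectra, and hence the energy at each frequency, are identical.

    import numpy as np

    fs, f = 8000, 110.0                      # sample rate and fundamental (example values)
    t = np.arange(fs) / fs                   # one second of samples
    w = 2 * np.pi * f
    shifted   = np.cos(w * t) + np.cos(2 * w * t + np.pi / 2)   # waited before the 2nd note
    unshifted = np.cos(w * t) + np.cos(2 * w * t)               # struck both notes at once

    print(np.max(np.abs(shifted - unshifted)))                  # waveforms clearly differ
    mag1 = np.abs(np.fft.rfft(shifted))
    mag2 = np.abs(np.fft.rfft(unshifted))
    print(np.max(np.abs(mag1 - mag2)))                          # magnitude spectra agree (≈ 0)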

Channel Vocoder

  Vocoders can operate at low bit-rates, 1–2 kbps. To do so, a channel vocoder first applies a filter bank to separate out the different frequency components:

  Due to Phase Insensitivity (i.e. only the energy is important):

  ◦ The waveform is "rectified" to its absolute value.

  ◦ The filter bank derives relative power levels for each frequency range.

  ◦ A subband coder would not rectify the signal, and would use wider frequency bands.

  A channel vocoder also analyzes the signal to determine the general pitch of the speech (low — bass, or high — tenor), and also the excitation of the speech.

  A channel vocoder applies a vocal tract transfer model to generate a vector of excitation parameters that describe a model of the sound, and also guesses whether the sound is voiced or unvoiced.
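  As a rough sketch of the analysis side described above, the following Python code (using scipy) bandpass-filters the signal, rectifies each band, and low-pass filters the result to obtain per-band power envelopes. The band edges, filter orders, and envelope cutoff are arbitrary choices for this sketch, not values from any standard channel vocoder.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def channel_vocoder_analysis(x, fs=8000,
                                 band_edges=(100, 300, 600, 1000, 1600, 2400, 3400)):
        """Analysis side of a channel vocoder (sketch).

        Per band: bandpass filter, rectify to the absolute value (phase
        insensitivity: only the energy matters), then low-pass to obtain a
        slowly varying envelope that can be coarsely quantized and sent.
        """
        envelope_lp = butter(2, 50, btype="low", fs=fs, output="sos")
        envelopes = []
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):
            sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            band = sosfilt(sos, x)
            envelopes.append(sosfilt(envelope_lp, np.abs(band)))
        return np.array(envelopes)              # shape: (num_bands, num_samples)

    # Example: a vowel-like signal with energy near 200 Hz and 800 Hz.
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)
    print(np.round(channel_vocoder_analysis(x, fs).mean(axis=1), 3))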

Formant Vocoder

  Formants: the salient frequency components that are present in a sample of speech, as shown in Fig. 13.5.

  Rationale: encoding only the most important frequencies.

Linear Predictive Coding (LPC)

  LPC vocoders extract salient features of speech directly from the waveform, rather than transforming the signal to the frequency domain.

  LPC Features:

  ◦ uses a time-varying model of vocal tract sound generated from a given excitation

  ◦ transmits only a set of parameters modeling the shape and excitation of the vocal tract, not actual signals or differences → small bit-rate

  About "Linear": The speech signal generated by the output vocal tract model is calculated as a function of the current speech output plus a second term linear in previous model coefficients.

LPC Coding Process

  LPC starts by deciding whether the current segment is voiced or unvoiced:

  ◦ For unvoiced: a wide-band noise generator is used to create sample values f(n) that act as input to the vocal tract simulator.

  ◦ For voiced: a pulse train generator creates values f(n).

  ◦ Model parameters a_i: calculated by using a least-squares set of equations that minimize the difference between the actual speech and the speech generated by the vocal tract model, excited by the noise or pulse train generators that capture speech parameters.

LPC Coding Process (cont'd)

  If the output values generate s(n), for input values f(n), the output depends on p previous output sample values:

    s(n) = Σ_{j=1..p} a_j s(n − j) + G f(n)

  where G is the gain of the excitation.
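  To make the excitation choice and the all-pole synthesis concrete, here is a small Python sketch; the coefficient values, gain, and pitch period are made-up examples, not values from the chapter.

    import numpy as np

    def lpc_synthesize(a, gain, excitation):
        """All-pole LPC synthesis: s(n) = sum_j a[j] * s(n-1-j) + G * f(n)."""
        p = len(a)
        s = np.zeros(len(excitation))
        for n in range(len(excitation)):
            past = sum(a[j] * s[n - 1 - j] for j in range(p) if n - 1 - j >= 0)
            s[n] = past + gain * excitation[n]
        return s

    def make_excitation(voiced, n_samples, pitch_period=80, rng=None):
        """Pulse train for voiced frames, wide-band noise for unvoiced frames."""
        if voiced:
            f = np.zeros(n_samples)
            f[::pitch_period] = 1.0              # impulses every pitch period
            return f
        if rng is None:
            rng = np.random.default_rng(0)
        return rng.standard_normal(n_samples)

    # Example frame with a stable 2nd-order vocal tract model (made-up values).
    a = np.array([1.3, -0.6])
    voiced   = lpc_synthesize(a, gain=1.0, excitation=make_excitation(True, 240))
    unvoiced = lpc_synthesize(a, gain=0.3, excitation=make_excitation(False, 240))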

  The LP coefficients can be calculated by solving the following minimization problem:

    min E{ [ s(n) − Σ_{j=1..p} a_j s(n − j) ]² }

  By taking the derivative with respect to each a_i and setting it to zero, we get a set of p equations:

    E{ [ s(n) − Σ_{j=1..p} a_j s(n − j) ] s(n − i) } = 0,   i = 1, ..., p

  Letting φ(i, j) = E{s(n − i) s(n − j)}, then:

    Σ_{j=1..p} a_j φ(i, j) = φ(i, 0),   i = 1, ..., p

  An often-used method to calculate the LP coefficients is the autocorrelation method, in which φ(i, j) is estimated from the autocorrelation of a windowed speech segment s_w(n).

  Since φ(i, j) can be defined as φ(i, j) = R(|i − j|), and when R(0) ≥ 0, the matrix {φ(i, j)} is positive symmetric, there exists a fast scheme to calculate the LP coefficients (the Levinson-Durbin recursion).
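  As a hedged illustration, here is a standard Levinson-Durbin recursion in Python — a generic textbook formulation of the fast scheme, not code reproduced from the chapter.

    import numpy as np

    def levinson_durbin(r, p):
        """Solve the Toeplitz normal equations sum_j a_j R(|i-j|) = R(i), i = 1..p.

        r : autocorrelation sequence of the windowed speech, r[0..p]
        p : prediction order
        Returns the LP coefficients a[1..p] and the final prediction-error energy.
        """
        a = np.zeros(p + 1)
        err = r[0]
        for i in range(1, p + 1):
            acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])   # numerator of the reflection coefficient
            k = acc / err
            a_prev = a.copy()
            a[i] = k
            for j in range(1, i):                        # order update of the coefficients
                a[j] = a_prev[j] - k * a_prev[i - j]
            err *= (1.0 - k * k)                         # shrink the residual energy
        return a[1:], err

    # Example: autocorrelation of a short windowed segment, order-4 predictor.
    rng = np.random.default_rng(1)
    s = np.hamming(160) * rng.standard_normal(160)
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(5)])
    lpc, e = levinson_durbin(r, 4)
    print(np.round(lpc, 3), round(e, 3))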

  The gain G can be calculated from the minimum prediction-error energy; for a unit-power excitation,

    G² = E{ [ s(n) − Σ_{j=1..p} a_j s(n − j) ] s(n) } = φ(0, 0) − Σ_{j=1..p} a_j φ(0, j)

  The pitch P of the current speech frame can be extracted by the correlation method, by finding the index of the peak of the normalized correlation

    v(i) = Σ_{n=0..N−1} s(n) s(n + i) / [ Σ_{n=0..N−1} s²(n) · Σ_{n=0..N−1} s²(n + i) ]^(1/2)

  over the range of allowable pitch lags i.
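  A minimal Python sketch of this peak-picking pitch estimator; the frame length and the lag search range are example values for 8 kHz speech.

    import numpy as np

    def estimate_pitch(s, fs=8000, pmin=20, pmax=160):
        """Return the lag maximizing the normalized correlation between the
        frame and its shifted version; pmin/pmax bound the search to
        plausible pitch periods (example values)."""
        best_lag, best_v = pmin, -1.0
        n = len(s) - pmax
        for lag in range(pmin, pmax + 1):
            a, b = s[:n], s[lag:lag + n]
            v = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b) + 1e-12)
            if v > best_v:
                best_lag, best_v = lag, v
        return best_lag, fs / best_lag

    # Example: a pulse train with period 80 samples (100 Hz at 8 kHz).
    fs = 8000
    frame = np.zeros(400)
    frame[::80] = 1.0
    lag, f0 = estimate_pitch(frame, fs)
    print(lag, round(f0, 1))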

Code Excited Linear Prediction (CELP)

  CELP is a more complex family of coders that attempts to mitigate the lack of quality of the simple LPC model.

  CELP uses a more complex description of the excitation:

  ◦ An entire set (a codebook) of excitation vectors is matched to the actual speech, and the index of the best match is sent to the receiver.

  ◦ The complexity increases the bit-rate to 4,800–9,600 bps.

  ◦ The resulting speech is perceived as being more similar and continuous.

  ◦ Quality achieved this way is sufficient for audio conferencing.

The Predictors for CELP

  CELP coders contain two kinds of prediction:

  ◦ LTP (Long Time Prediction): tries to reduce redundancy in speech signals by finding the basic periodicity or pitch that causes a waveform that more or less repeats.

  ◦ STP (Short Time Prediction): tries to eliminate the redundancy in speech signals by attempting to predict the next sample from several previous ones.

Relationship between STP and LTP

  

  STP captures the formant structure of the short-term speech spectrum based on only a few samples.

  LTP, following STP, recovers the long-term correlation in the speech signal that represents the periodicity in speech, using whole frames or subframes (1/4 of a frame).

  ◦ LTP is often implemented as "adaptive codebook searching".

  Fig. 13.6 shows the relationship between STP and LTP.

Adaptive Codebook Searching

  Rationale:

  ◦ Look in a codebook of waveforms to find one that matches the current subframe.

  ◦ Codeword: a shifted speech residue segment indexed by the lag τ corresponding to the current speech frame or subframe in the adaptive codebook.

  ◦ The gain corresponding to the codeword is denoted as g.

Open-Loop Codeword Searching

  Tries to minimize the long-term prediction error, but not the perceptually weighted reconstructed speech error.

Closed-Loop Codeword Searching

  

  Closed-loop search is more often used in CELP coders — also called Analysis-By-Synthesis (A-B-S).

  Speech is reconstructed and the perceptual error for that is minimized via an adaptive codebook search, rather than simply considering sum-of-squares.

  The best candidate in the adaptive codebook is selected to minimize the distortion of locally reconstructed speech.

  Parameters are found by minimizing a measure of the difference between the original and the reconstructed speech.
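  A toy Python sketch of the closed-loop adaptive codebook search: for each candidate lag the optimal gain is computed in closed form, and the lag/gain pair minimizing the error to the target subframe is kept. The LPC synthesis and perceptual weighting filters are deliberately omitted, so this only illustrates the search structure, not a full A-B-S coder.

    import numpy as np

    def adaptive_codebook_search(target, past_excitation, lag_min=40, lag_max=120):
        """Closed-loop adaptive codebook search sketch.

        For each candidate lag tau, the codeword is the past-excitation segment
        tau samples back; the gain g minimizing the squared error to the target
        subframe is computed, and the best (tau, g) pair is kept.
        """
        n = len(target)
        best_lag, best_gain, best_err = None, 0.0, np.inf
        for tau in range(max(lag_min, n), lag_max + 1):   # keep tau >= n for a simple slice
            start = len(past_excitation) - tau
            codeword = past_excitation[start:start + n]
            energy = np.dot(codeword, codeword)
            if energy == 0.0:
                continue
            g = np.dot(target, codeword) / energy          # optimal gain for this lag
            err = np.sum((target - g * codeword) ** 2)
            if err < best_err:
                best_lag, best_gain, best_err = tau, g, err
        return best_lag, best_gain, best_err

    # Example: the target subframe repeats the excitation from 80 samples ago.
    rng = np.random.default_rng(2)
    past = rng.standard_normal(400)
    target = 0.9 * past[320:360]                           # 40-sample subframe, lag 80
    tau, g, err = adaptive_codebook_search(target, past)
    print(tau, round(g, 2), round(err, 8))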

Hybrid Excitation Vocoder

  Hybrid Excitation Vocoders are different from CELP in that they use model-based methods to introduce multi-model excitation.

  They include two major types:

  ◦ MBE (Multi-Band Excitation): a blockwise codec, in which a speech analysis is carried out in a speech frame unit of about 20 msec to 30 msec.

  ◦ MELP (Multiband Excitation Linear Predictive) speech codec: a new US Federal standard to replace the old LPC-10 (FS1015) standard, with the application focus on very low bit rate safety communications.

  ◦ It is a DoD speech coding standard used mainly in military applications and satellite communications.

MBE Vocoder

  MBE utilizes the A-B-S scheme in parameter estimation:

  ◦ The parameters such as basic frequency, spectrum envelope, and sub-band U/V decisions are all done via closed-loop searching.

  ◦ The criterion of the closed-loop optimization is based on minimizing the perceptually weighted reconstructed speech error, which can be represented in the frequency domain.

MELP Vocoder

  MELP: also based on LPC analysis, uses a multiband soft-decision model for the excitation signal.

  The LP residue is bandpassed and a voicing strength parameter is estimated for each band.

  Speech can then be reconstructed by passing the excitation through the LPC synthesis filter.

  Differently from MBE, MELP divides the excitation into five fixed bands of 0–500, 500–1000, 1000–2000, 2000–3000, and 3000–4000 Hz.

MELP Vocoder (Cont'd)

  A voice degree parameter is estimated in each band based on the normalized correlation function of the speech signal and the smoothed rectified signal in the non-DC band.

  Let s_k(n) denote the speech signal in band k, and u_k(n) denote the DC-removed smoothed rectified signal of s_k(n). The correlation function evaluated at the pitch lag P, for a signal x(n), is

    R_x(P) = Σ_{n=0..N−1} x(n) x(n + P) / [ Σ_{n=0..N−1} x²(n) · Σ_{n=0..N−1} x²(n + P) ]^(1/2)

  and the voicing strength of band k is derived from R_{s_k}(P) and R_{u_k}(P).
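  A rough Python sketch along these lines; the band edges approximate the five MELP bands listed above (pulled in slightly at the extremes to keep the filters stable at 8 kHz), and the smoothing filter and the use of the larger of the two correlations are assumptions of this sketch, not the standard's exact procedure.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def normalized_corr(x, lag):
        """Normalized correlation of x with itself shifted by `lag` samples."""
        n = len(x) - lag
        a, b = x[:n], x[lag:lag + n]
        return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b) + 1e-12)

    def band_voicing_strengths(x, pitch_lag, fs=8000,
                               bands=((100, 500), (500, 1000), (1000, 2000),
                                      (2000, 3000), (3000, 3900))):
        """Per band: bandpass filter, then take the larger of the normalized
        correlation of the band signal and of its DC-removed smoothed
        rectified envelope at the pitch lag."""
        smooth = butter(2, 400, btype="low", fs=fs, output="sos")
        strengths = []
        for lo, hi in bands:
            sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            s_k = sosfilt(sos, x)
            u_k = sosfilt(smooth, np.abs(s_k))     # smoothed rectified signal
            u_k = u_k - np.mean(u_k)               # remove DC
            strengths.append(max(normalized_corr(s_k, pitch_lag),
                                 normalized_corr(u_k, pitch_lag)))
        return strengths

    # Example: a 100 Hz pulse train is strongly voiced in every band.
    fs = 8000
    x = np.zeros(fs // 2)
    x[::80] = 1.0
    print(np.round(band_voicing_strengths(x, pitch_lag=80, fs=fs), 2))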

  MELP adopts a jittery voiced state to simulate the marginal voiced speech segments – indicated by an aperiodic flag.

  The jittery state is determined by the peakiness of the full-wave rectified LP residue e(n):

  ◦ If peakiness is greater than some threshold, the speech frame is then flagged as jittered.
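  As an illustration, peakiness is commonly computed as the ratio of the RMS value to the mean absolute value of the residue; the threshold below is only a placeholder for this sketch, not necessarily the value used by the standard.

    import numpy as np

    def peakiness(residue):
        """Ratio of RMS to mean absolute value of the full-wave rectified residue.
        Spiky (aperiodic, marginally voiced) residues give larger values than
        smooth periodic ones."""
        e = np.abs(residue)
        return np.sqrt(np.mean(e ** 2)) / (np.mean(e) + 1e-12)

    def is_jittered(residue, threshold=1.34):      # placeholder threshold
        return peakiness(residue) > threshold

    # Example: an impulsive residue is much "peakier" than a sinusoidal one.
    n = np.arange(200)
    impulsive = np.zeros(200)
    impulsive[::50] = 1.0
    smooth = np.sin(2 * np.pi * n / 50)
    print(round(peakiness(impulsive), 2), round(peakiness(smooth), 2))
    print(is_jittered(impulsive), is_jittered(smooth))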

13.4 Further Exploration

  → Link to Further Exploration for Chapter 13.

  A comprehensive introduction to speech coding may be found in Spanias's excellent article in the textbook references.


  Sun Microsystems, Inc. has made available the code for its implementation of standards G.711, G.721, and G.723, in C. The code can be found from the link in the Chapter 13 file in the Further Exploration section of the text website.


  The ITU sells standards; it is linked to from the text website for this chapter.


  More information on speech coding may be found in the speech FAQ files, given as links on the website; also given are links to where LPC and CELP code may be found.