

SLIDE 1

Speech Enhancement Based on Adaptive Line Enhancer

Research Thesis

Aviva Atkins

April 7th, 2020
Supervised by Prof. Israel Cohen

SLIDE 2

Outline

▪ Introduction
▪ The problem researched
▪ The challenges
▪ Research contributions
▪ Adaptive Line Enhancer background
▪ Conventional fixed step size
▪ Mutual Information approach
▪ Proposed method
▪ Conclusions and future research

Sound test

SLIDE 3

Noise is Everywhere!

[Diagram: source signal degraded by interference, reverberation, echo, and additive noise.]

SLIDE 4

Speech Enhancement

SLIDE 5

Applications

SLIDE 6

Harmonic noise

▪ Contains deterministic sinusoidal components

SLIDE 7

The problem researched

Reducing nonstationary harmonic noise from a speech signal recorded with a single microphone

[Diagram: source signal plus nonstationary harmonic additive noise, recorded with a single microphone.]

SLIDE 8

The challenges

▪ Single channel: only the noisy signal is available, with no access to additional reference signals and no spatial information → only intrinsic properties of the speech or the noise can be used
▪ The vast majority of methods require an estimate of the noise spectrum:
  ▪ When the noise is stationary, it can be estimated during segments where speech is absent
  ▪ When the noise is nonstationary, it needs to be tracked continuously → nonstationary noise is more difficult to estimate
▪ Trade-off between noise reduction and speech distortion
▪ The developed method needs to be suitable for real-time applications

SLIDE 9

Research Contributions

▪ Introduced a filtering method based on the frequency-domain Adaptive Line Enhancer that enables better reduction of nonstationary harmonic noise.
▪ Proposed the combined filter: the commonly used forward (causal) adaptive linear filter and a non-causal backward adaptive linear filter used together, increasing the reduction span over the noise transient.
▪ Applied the filter based on a comparison to the noisy spectrum, reducing noise overestimation.
▪ Applied the filter based on a noise presence indicator, for better speech preservation.
▪ Employed a set of filter lengths, to ensure the combined filter spans the entire noise transient.

SLIDE 10

Additional contributions

▪ Investigated a statistical model as an alternative to the decision-directed a-priori SNR estimator, and showed that it can eliminate the musical noise while compromising between signal distortion and noise reduction.
▪ Introduced a beamformer that enables fine tuning of the compromise between Directivity Factor and White Noise Gain, through a simple, computationally efficient algorithm.

SLIDE 11

Why use Adaptive Line Enhancer?

▪ Exploits the structure of the harmonic noise
▪ Simple, with low computational cost
▪ Modifies both magnitude and phase, so it has the potential to improve signal intelligibility and not just quality

SLIDE 12

Adaptive Noise Canceller (ANC)

[Block diagram: primary input y(n) + w(n) (signal plus noise); reference input w₀(n), correlated with the noise; the adaptive filter produces the noise estimate ŵ(n); the output is ŷ(n) = y(n) + w(n) − ŵ(n).]

min E[ŷ²] = E[y²] + min E[(w − ŵ)²]
min E[(y − ŷ)²] = min E[(w − ŵ)²]

Minimizing the output power therefore minimizes the noise-estimation error without distorting the signal.

Ideal case: ŵ = w, ŷ = y (no distortion).

SLIDE 13

Adaptive Line Enhancer (ALE)

[Block diagram: single input z(n) = y(n) + w(n); a copy delayed by Δ samples (z^{−Δ}) feeds the adaptive filter, whose output ŷ(n) is subtracted from z(n) to form the error output.]

The delay Δ is chosen so that one component decorrelates while the other stays correlated:
▪ Signal decorrelated, noise correlated → the filter output tracks the noise
▪ Noise decorrelated, signal correlated → the filter output tracks the signal

SLIDE 14

Adaptive Line Enhancer (ALE)

[The same ALE structure, in the time domain (TD) and in the STFT domain (FD). In the STFT domain the input is Z(l,n) = Y(l,n) + W(l,n), with a per-frequency adaptive filter, a frame delay, filter output Ŷ(l,n), and error output E(l,n).]

SLIDE 15

Adaptive Line Enhancer (ALE)

[ALE in the STFT domain (FD), with frame delay τ and filter length L; NLMS adaptation per frequency bin l:]

h(l,n) = [H₀(l,n), …, H_{L−1}(l,n)]^T
z_τ(l,n) = [Z(l,n−τ), …, Z(l,n−τ−L+1)]^T
Ŷ(l,n) = h^H(l,n) z_τ(l,n)
E(l,n) = Z(l,n) − Ŷ(l,n)

NLMS: h(l,n+1) = h(l,n) + μ E*(l,n) z_τ(l,n) / (‖z_τ(l,n)‖² + δ)
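The per-bin recursion above can be sketched in Python. This is a minimal illustration under the reconstructed notation, not the thesis code; the default delay, length, and step size are arbitrary choices:

```python
import numpy as np

def fd_ale_nlms(Z, tau=1, L=3, mu=0.5, delta=1e-8):
    """FD-ALE with NLMS for a single frequency bin.

    Z   : complex STFT coefficients of one bin over frames (noisy input)
    tau : decorrelation delay in frames
    L   : adaptive filter length
    mu  : NLMS step size
    Returns (Y_hat, E): filter output and error (enhanced) signal.
    """
    M = len(Z)
    h = np.zeros(L, dtype=complex)            # filter coefficients
    Y_hat = np.zeros(M, dtype=complex)
    E = np.zeros(M, dtype=complex)
    for m in range(M):
        idx = m - tau - np.arange(L)          # delayed input indices
        z = np.where(idx >= 0, Z[np.clip(idx, 0, None)], 0)
        Y_hat[m] = np.vdot(h, z)              # h^H z_tau
        E[m] = Z[m] - Y_hat[m]
        h = h + mu * np.conj(E[m]) * z / (np.vdot(z, z).real + delta)
    return Y_hat, E
```

On a perfectly correlated bin (a complex exponential across frames), the error output decays toward zero as the filter adapts, which is the ALE behavior the slides describe for harmonic components.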

SLIDE 16

Conventional Fixed Step Size Example

For the conventional fixed step size, it is difficult to both reduce the noise and maintain high quality of the enhanced signal

[Spectrograms (frequency index vs. frame index): (a) clean signal, (b) noisy signal, (c) enhanced signal; L = 3, μ = 1.]

SLIDE 17

Mutual Information Approach

Taghia, J. and Martin, R., "A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction," IEEE/ACM Trans. Audio, Speech, Lang. Process., 2016.

▪ Frequency-dependent step size, detecting harmonic-noise presence per frequency
▪ Based on Mutual Information (MI)
▪ Step size: μ(k) = μ̄ · Q(k), with μ̄ a constant and

Q(k) = 1, if I(k) ≥ (P_thr / K) · Σ_{k′=1..K} I(k′); a small constant, else

where I(k) is the estimated mutual information at frequency bin k and P_thr sets the detection threshold relative to the total MI over the K bins.
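Given per-bin MI estimates, the step-size gating can be sketched as follows. The MI estimation itself, which is the involved part of the method, is omitted; `mu_bar`, `P_thr`, and `eps` are illustrative parameter names, not the paper's:

```python
import numpy as np

def mi_step_size(I, mu_bar=1.0, P_thr=2.0, eps=0.0):
    """Per-frequency step size mu(k) = mu_bar * Q(k).

    I     : mutual-information estimate per frequency bin
    P_thr : threshold factor relative to the average MI over all bins
    eps   : step-size factor where no harmonic noise is detected
    """
    K = len(I)
    thresh = P_thr * np.sum(I) / K        # (P_thr / K) * total MI
    Q = np.where(I >= thresh, 1.0, eps)   # detect harmonic-noise bins
    return mu_bar * Q
```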

SLIDE 18

MI Approach Example

[Spectrograms: (b) noisy signal, (c) MI step size per frequency and frame; Q = 1 in regions where harmonic noise is detected.]

SLIDE 19

MI Approach Example

[Spectrograms: (a) clean signal, (b) enhanced signal – fixed step size, (c) enhanced signal – MI.]

SLIDE 20

MI Approach

▪ Implemented in a block-wise manner
▪ Assumption: the noise is stationary over at least the block length
▪ Taghia and Martin use a block length of 3 seconds

The assumption does not hold for highly nonstationary signals, such as heart monitor beeping: the decision Q is then often zero.

[Spectrogram of a 3.4 s long heart monitor beeping.]

SLIDE 21

MI Approach Example Non-stationary

[Spectrograms: (a) noisy signal, (b) MI step size; for this nonstationary noise the decision Q is mostly zero.]

SLIDE 22

MI Approach Example Non-stationary

[Spectrograms: (a) clean signal, (b) noisy signal, (c) enhanced signal – MI; the nonstationary noise is largely ignored (Q = 0).]

SLIDE 23

Nonstationary noise – what does the filter output estimate?

[ALE block diagram, input z(n) = y(n) + w(n): when the noise is nonstationary, is the filter output an estimate of the speech, ŷ(n), or of the noise, ŵ(n)?]

SLIDE 24

Experimental Setup

▪ Clean speech: 20 different speech signals from different speakers from the TIMIT database (half male, half female)
▪ Sampled at 16 kHz
▪ SNR range [0, 20] dB
▪ STFT analysis with overlap-add synthesis
▪ Noise: 26 different nonstationary harmonic noise signals, e.g., heart monitor beeping, train door beeping, house alarm, railroad crossing bells

SLIDE 25

Correlation

Normalized correlation at a delay of τ frames (1 frame = 32 ms):

ρ_Y(l,n,τ) = E[Y(l,n) Y*(l,n−τ)] / E[|Y(l,n)|²]
ρ_W(l,n,τ) = E[W(l,n) W*(l,n−τ)] / E[|W(l,n)|²]

[Plot of the speech and noise correlations vs. delay τ (frames).]
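A sample estimate of this normalized correlation for one STFT bin can be sketched as follows, with the expectations replaced by time averages (an illustrative estimator, not the thesis code):

```python
import numpy as np

def norm_corr(S, tau):
    """Sample estimate of E[S(n) S*(n - tau)] / E[|S(n)|^2] for one bin."""
    num = np.mean(S[tau:] * np.conj(S[:-tau]))   # lagged cross-term
    den = np.mean(np.abs(S) ** 2)                # power normalization
    return num / den
```

A narrowband (harmonic) component keeps |ρ| near 1 across the delay, while a wideband component decorrelates quickly, which is the property the ALE delay exploits.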

SLIDE 26

Proposed Approach

▪ Combined filter (CMLNLMS): the combined output E_c(l,n) is selected per time-frequency bin from the forward (causal) filter output E_f(l,n), the backward (non-causal) filter output E_b(l,n+L), and the noisy input Z(l,n):

E_c(l,n) = E_b(l,n+L), if |E_b(l,n+L)|² ≤ |E_f(l,n)|² and |E_b(l,n+L)|² ≤ |Z(l,n)|²
E_c(l,n) = E_f(l,n), if |E_b(l,n+L)|² > |E_f(l,n)|² and |E_f(l,n)|² ≤ |Z(l,n)|²
E_c(l,n) = Z(l,n), else
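The per-bin selection rule can be sketched as follows (a hypothetical helper under the reconstructed notation, with the backward output already time-aligned to the current bin):

```python
def combine(E_f, E_b, Z):
    """Per time-frequency-bin selection for the combined filter.

    E_f : forward (causal) filter output at the current bin
    E_b : backward (non-causal) filter output, already time-aligned
    Z   : noisy spectral coefficient at the same bin
    Picks the smaller-magnitude filter output, but never one exceeding
    the noisy magnitude, which limits noise overestimation.
    """
    pb, pf, pz = abs(E_b) ** 2, abs(E_f) ** 2, abs(Z) ** 2
    if pb <= pf and pb <= pz:
        return E_b
    if pf < pb and pf <= pz:
        return E_f
    return Z
```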

SLIDE 27

Proposed Approach

▪ Harmonic-noise presence indicator for better speech preservation:

J(l,n) = 1 if W(l,n) ∈ ℋ₁ (harmonic noise present), 0 if W(l,n) ∈ ℋ₀ (noise absent)

▪ A set of filters with increasing lengths, up to the maximal filter length L, based on the available number of noise samples

SLIDE 28

Performance Measures

▪ Distortion index v_sd
▪ Noise reduction factor ξ_nr
▪ Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862.2
▪ Short-Time Objective Intelligibility (STOI)

SLIDE 29

Transient Reduction

[NRR (dB) vs. frame index; L = 3, τ = 3, μ = 0.5, indicator threshold −25 dB.]

Better noise reduction, which leads to improved ξ_nr, PESQ, and STOI levels for the combined filter.

SLIDE 30

Step Size

▪ An appropriate selection of the step size is required
▪ Fixed step size μ = 0.5

[Plots of v_sd and NRR (dB) vs. delay τ (frames), comparing the fixed step size with max(μ_MI, μ_const).]

SLIDE 31

[PESQ, STOI, v_sd (dB), and ξ_nr (dB) vs. delay τ (frames); μ = 0.5, indicator threshold −25 dB.]

Combined & MI-Combined show better results than MI.

SLIDE 32

[PESQ, STOI, v_sd (dB), and ξ_nr (dB) vs. delay τ (frames); short L, μ = 1, indicator threshold −25 dB.]

Recommendation: Combined & MI-Combined show better results than MI.

SLIDE 33

Noise Presence Indicator

[Results with L = 3, μ = 1.]

SLIDE 34

Experimental Results Summary

[Results with L = 3, μ = 1, indicator threshold −25 dB.]

SLIDE 35

[Spectrograms: (a) clean signal, (b) noisy signal, (c) enhanced signal – MI, (d) enhanced signal – Combined.]

SLIDE 36

Conclusions

▪ Introduced the combined filter
▪ Parameter selection
▪ Noise presence indicator impact
▪ Improved results compared to other methods

SLIDE 37

Future Research

▪ Noise presence indicator implementation
▪ Residual noise at transient edges
▪ Deep Learning approach for noise reduction

SLIDE 38
SLIDE 39

Speech Enhancement Using ARCH model

▪ We investigate the use of the autoregressive conditional heteroscedasticity (ARCH) model as a replacement for the well-known decision-directed estimator of Ephraim and Malah.
▪ We employ three sound-quality measures (speech distortion, noise reduction, and musical noise) and explain the effect the ARCH model parameters have on these measures.
▪ We demonstrate that the ARCH model achieves better results than the decision-directed estimator for some of these measures, while compromising between speech distortion and noise reduction.

SLIDE 40

Problem Formulation

▪ Let Z_ℓ(k) = Y_ℓ(k) + E_ℓ(k) denote an observed noisy speech signal in the STFT domain (frame ℓ, frequency bin k), where Y_ℓ(k) is the clean speech and E_ℓ(k) the noise.
▪ Given an error function between the clean signal and its estimate, the spectral enhancement problem can be formulated as

Ŷ_ℓ(k) = argmin_{Ŷ} E[ f(Y_ℓ(k), Ŷ) | Z₀(k), …, Z_ℓ′(k) ]

▪ We consider the causal case ℓ′ ≤ ℓ and the LSA error function

f_LSA(Y_ℓ(k), Ŷ_ℓ(k)) = ( log|Y_ℓ(k)| − log|Ŷ_ℓ(k)| )²

SLIDE 41

Problem Formulation

▪ The estimate is obtained by applying a spectral gain to each noisy spectral component:

Ŷ_ℓ(k) = H_LSA(ξ_{ℓ|ℓ′}, γ_ℓ) · Z_ℓ(k)

where the a-priori and a-posteriori SNRs are defined, respectively, by

ξ_{ℓ|ℓ′} ≜ λ_{ℓ|ℓ′} / σ_ℓ², γ_ℓ ≜ |Z_ℓ|² / σ_ℓ²

σ_ℓ² = E[|E_ℓ|²] denotes the short-term spectrum of the noise, and

λ_{ℓ|ℓ′} = E[ |Y_ℓ|² | Z₀(k), …, Z_ℓ′(k) ]

denotes the short-term spectrum of the speech signal.

SLIDE 42

Decision-Directed

Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 1109–1121, December 1984.

▪ Over the past decades, the decision-directed (DD) approach has become the accepted estimation method for the a-priori SNR:

ξ̂_{ℓ|ℓ} = max{ α |Ŷ_{ℓ−1}|² / σ_ℓ² + (1 − α) Q(γ_ℓ − 1), ξ_min }

where Q(y) = y if y ≥ 0 and Q(y) = 0 otherwise.
▪ The decision-directed approach is not supported by a statistical model.
▪ α and ξ_min have to be determined by simulations.
▪ α and ξ_min are fixed constants and are not adapted to the speech components.
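The decision-directed recursion for one frequency bin can be sketched as follows. The Wiener gain used here to form |Ŷ_{ℓ−1}|² is an illustrative choice (the context above uses the LSA gain), and the parameter values are common defaults, not values from the slides:

```python
import numpy as np

def decision_directed(Z2, sigma2, alpha=0.98, xi_min=10 ** (-15 / 10)):
    """Decision-directed a-priori SNR track for one frequency bin.

    Z2     : |Z_l|^2, noisy periodogram over frames
    sigma2 : noise spectrum estimate (scalar or per-frame array)
    """
    sigma2 = np.broadcast_to(np.asarray(sigma2, dtype=float), Z2.shape)
    gamma = Z2 / sigma2                      # a-posteriori SNR
    xi = np.empty_like(gamma)
    Y2_prev = 0.0                            # |Y_hat_{l-1}|^2
    for l in range(len(Z2)):
        xi[l] = max(alpha * Y2_prev / sigma2[l]
                    + (1.0 - alpha) * max(gamma[l] - 1.0, 0.0), xi_min)
        G = xi[l] / (1.0 + xi[l])            # illustrative Wiener gain
        Y2_prev = (G ** 2) * Z2[l]
    return xi
```

In noise-only frames the estimate sits at the floor ξ_min; in high-SNR frames it climbs toward the true SNR with the well-known one-frame lag of the DD approach.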

SLIDE 43

ARCH Model

▪ The GARCH (generalized autoregressive conditional heteroscedasticity) model is extensively used in financial applications, where it is necessary to model time-varying volatility while taking into account heavy-tailed behavior and volatility clustering.
▪ Recently [1], it was proposed to use GARCH for statistically modeling speech signals in the STFT domain, as they exhibit these two characteristics.
▪ In this work, we investigate the use of a simplified case of GARCH, the ARCH model. We explain the effect the ARCH model parameters have on commonly used performance measures and compare it to the decision-directed estimator.

[1] I. Cohen, “Modeling speech signals in the time frequency domain using GARCH,” Signal Processing, vol. 84 (12), pp. 2453–2459, 2004.

SLIDE 44

ARCH Model

We use a two-step estimator to recursively update the estimate of the conditional a-priori SNR as new data arrives. Given an estimate ξ̂_{ℓ|ℓ−1} and a new noisy spectral component Z_ℓ:

Update step: ξ̂_{ℓ|ℓ} = E[ |Y_ℓ|² / σ_ℓ² | ξ̂_{ℓ|ℓ−1}, Z_ℓ ]

Using ARCH(1), propagate the a-priori SNR to obtain the one-frame-ahead a-priori SNR:

Propagation step: ξ̂_{ℓ|ℓ−1} = λ + μ ξ̂_{ℓ−1|ℓ−1}, λ > 0, 0 ≤ μ < 1

SLIDE 45

ARCH Model

▪ Solving for the update step, we get:

ξ̂_{ℓ|ℓ} = G²_SP(ξ̂_{ℓ|ℓ−1}, γ_ℓ) · γ_ℓ, where G_SP(ξ_{ℓ|ℓ′}, γ_ℓ) = sqrt( [ξ_{ℓ|ℓ′} / (ξ_{ℓ|ℓ′} + 1)] · [ 1/γ_ℓ + ξ_{ℓ|ℓ′} / (ξ_{ℓ|ℓ′} + 1) ] )

▪ Employing some algebra, we can write:

ξ̂_{ℓ|ℓ} = α_ℓ ξ̂_{ℓ|ℓ−1} + (1 − α_ℓ)(γ_ℓ − 1), where α_ℓ = 1 − [ ξ̂_{ℓ|ℓ−1} / (ξ̂_{ℓ|ℓ−1} + 1) ]², α_ℓ ∈ [0, 1]

▪ Note the similarity in form to the decision-directed estimator, but with a time-varying, frequency-dependent weighting factor α_ℓ.

SLIDE 46

ARCH Model

▪ Since the a-priori SNR needs to equal ξ_min when speech is absent, we obtain a condition on λ: λ = (1 − μ) ξ_min
▪ Using ARCH(1) we have two parameters, ξ_min and μ:

Propagation step: ξ̂_{ℓ|ℓ−1} = (1 − μ) ξ_min + μ ξ̂_{ℓ−1|ℓ−1}
Update step: ξ̂_{ℓ|ℓ} = α_ℓ ξ̂_{ℓ|ℓ−1} + (1 − α_ℓ)(γ_ℓ − 1), where α_ℓ = 1 − [ ξ̂_{ℓ|ℓ−1} / (ξ̂_{ℓ|ℓ−1} + 1) ]², α_ℓ ∈ [0, 1]
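The two-step ARCH(1) recursion can be sketched for one frequency bin as follows (the parameter values are illustrative defaults, not values from the slides):

```python
def arch_snr(gamma, mu=0.9, xi_min=10 ** (-15 / 10)):
    """ARCH(1)-based recursive a-priori SNR estimate for one bin.

    gamma  : a-posteriori SNR sequence over frames
    mu     : ARCH(1) coefficient, 0 <= mu < 1
    xi_min : noise floor (so lambda = (1 - mu) * xi_min)
    """
    xi_prop = xi_min                                  # xi_hat_{l | l-1}
    out = []
    for g in gamma:
        a = 1.0 - (xi_prop / (xi_prop + 1.0)) ** 2    # weight alpha_l
        xi_upd = a * xi_prop + (1.0 - a) * (g - 1.0)  # update step
        out.append(xi_upd)
        xi_prop = (1.0 - mu) * xi_min + mu * xi_upd   # propagation step
    return out
```

Unlike the decision-directed recursion, the weight a is recomputed every frame from the propagated SNR, which is the adaptivity the slides highlight.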

SLIDE 47

Distortion and NRR

We employ three performance measures commonly used for the quality assessment of a speech enhancement algorithm. The first two are easily understood when we express the estimated signal as

Ŷ_ℓ = H(ξ_{ℓ|ℓ′}, γ_ℓ) Y_ℓ + H(ξ_{ℓ|ℓ′}, γ_ℓ) E_ℓ = Y_fd + E_rn

(filtered desired signal plus residual noise).

Speech distortion: K_Y ≜ E[ ( log|Y_ℓ(k)| − log|H(ξ_{ℓ|ℓ′}, γ_ℓ) Y_ℓ(k)| )² ]

Noise Reduction Ratio (NRR): NRR ≜ E[|E_ℓ|²] / E[ |H(ξ_{ℓ|ℓ′}, γ_ℓ) E_ℓ|² ]

SLIDE 48

Musical noise via higher order statistics

The attenuated noise will be composed of isolated spectral components, also known as tonal components. The amount of tonal components can be quantified by the kurtosis:

kurtosis = μ₄ / μ₂²

where μ_n is the n-th order moment of the signal.

As we are interested in the amount of tonal components caused by the processing, we use the log ratio of the kurtosis before and after the processing:

LKR ≜ log₁₀( kurtosis_proc / kurtosis_orig )

which is evaluated on noise-only frames. The LKR increases as the musical noise increases; the absence of musical noise corresponds to an LKR of zero or below.

SLIDE 49

Musical noise via higher order statistics

Analytical calculation of the kurtosis ratio requires a specific noise reduction method or assumptions about the statistics of the spectral components. Here, we use the sample kurtosis of the spectral power, averaged over M frames of N frequency bins:

kurtosis = (1/M) Σ_{ℓ=0..M−1} [ (1/N) Σ_{k=0..N−1} ( |E_ℓ(k)|² − avg|E_ℓ|² )⁴ ] / [ (1/N) Σ_{k=0..N−1} ( |E_ℓ(k)|² − avg|E_ℓ|² )² ]²

where avg|E_ℓ|² = (1/N) Σ_{k=0..N−1} |E_ℓ(k)|²
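The frame-averaged sample kurtosis and the LKR can be computed directly; a sketch, with the array layout (frames by bins) an assumed convention:

```python
import numpy as np

def frame_avg_kurtosis(D):
    """Sample kurtosis of the spectral power |D|^2, averaged over frames.
    D : complex STFT noise coefficients, shape (frames, bins)."""
    P = np.abs(D) ** 2
    c = P - P.mean(axis=1, keepdims=True)   # center each frame
    m4 = (c ** 4).mean(axis=1)              # 4th central moment per frame
    m2 = (c ** 2).mean(axis=1)              # 2nd central moment per frame
    return (m4 / m2 ** 2).mean()

def lkr(D_proc, D_orig):
    """Log-kurtosis ratio on noise-only frames; > 0 flags musical noise."""
    return np.log10(frame_avg_kurtosis(D_proc) / frame_avg_kurtosis(D_orig))
```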

SLIDE 50

Experimental Setup

▪ Speech signals: 20 different utterances from 20 different speakers, sampled at 16 kHz and degraded by white Gaussian noise with SNRs in the range [0, 20] dB.
▪ The noisy signals are transformed to the time-frequency domain using the STFT, with 75%-overlapping Hamming analysis windows of 32 ms length.
▪ The evaluation of the musical noise was done separately on a complex white Gaussian noise in the time-frequency domain, to emulate performance in noise-only frames.
SLIDE 51

Experimental Setup

Comparison of the decision-directed (solid lines) and ARCH (dashed lines) estimators for 5 dB SNR: (a) distortion, (b) NRR, and (c) LKR, with varying α (upper axis) and μ (lower axis), respectively, per estimator, and ξ_min of −20 dB (square), −15 dB (circle), and, for the decision-directed method only, ξ_min = 0 (triangle).

SLIDE 52

Experimental Setup

We get the expected decision-directed behavior

SLIDE 53

Experimental Setup

For the ARCH estimator, increasing the value of μ decreases the distortion.

SLIDE 54

Experimental Setup

When μ increases, the NRR decreases as well. The lower we take the noise floor ξ_min, the more noise reduction we get.

SLIDE 55

Experimental Setup

The musical noise mainly depends on the noise floor ξ_min. A lower ξ_min means a higher α_ℓ, resulting in a smoother a-priori SNR around ξ_min, thus reducing the musical noise.

SLIDE 56

Experimental Setup

For the decision-directed estimator, we have to compromise between the amount of distortion and the amount of musical noise, while for the ARCH estimator the musical noise can be eliminated by choosing an appropriate value of ξ_min. However, for the ARCH estimator we need to compromise between the amount of distortion and the amount of residual noise.

SLIDE 57

Conclusions

Results summary:
▪ We presented the ARCH estimator for the a-priori SNR, which is based on a statistical model.
▪ We explained the effect the ARCH model parameters have on three commonly used quality measures.
▪ We demonstrated that the ARCH model can achieve better results than the decision-directed estimator, while compromising between speech distortion and noise reduction.

Future work:
▪ We used the ARCH(1) model for the a-priori SNR estimator, which is a special case of GARCH(0,1). It would be interesting to expand the model to a full GARCH(p,q) model and conduct a similar analysis, to understand whether the full general model could provide additional advantages.

SLIDE 58

Robust Superdirective Beamformer with Optimal Regularization

▪ We introduce an optimal beamformer design that facilitates a compromise between high directivity and low white-noise amplification.
▪ The proposed beamformer involves a regularization factor, whose optimal value is determined using a simple and efficient one-dimensional search algorithm.
▪ Simulation results demonstrate controlled tuning of various gain properties of the desired beamformer, and improved performance compared to a competing method.

SLIDE 59

Signal Model and Array Setup

▪ We consider a plane wave, in the farfield, impinging on the array at angle θ
▪ Uniform linear microphone array of M sensors, with spacing δ between them
▪ The desired signal X(ω) propagates from θ = 0 (endfire)
▪ Neglecting the propagation attenuation, the observed signal is

y(ω) = d(ω, θ) X(ω) + v(ω)

where d(ω, θ) is the steering vector and v(ω) is the additive noise vector:

d(ω, θ) = [ 1, e^{−jωτ₀cosθ}, …, e^{−j(M−1)ωτ₀cosθ} ]^T, τ₀ = δ / c

SLIDE 60

Signal Model and Array Setup

▪ For the endfire direction, d(ω) = d(ω, 0)
▪ Applying a complex linear filter h(ω), the estimated signal is

X̂(ω) = h^H(ω) y(ω) = h^H(ω) d(ω) X(ω) + h^H(ω) v(ω)

▪ The beamformer is distortionless when h^H(ω) d(ω) = 1

SLIDE 61

Performance measures

▪ Taking the first microphone as reference, we define the input and output SNR:

iSNR(ω) = φ_X(ω) / φ_{V₁}(ω)

oSNR(ω) = [φ_X(ω) / φ_{V₁}(ω)] × |h^H(ω) d(ω)|² / [h^H(ω) Γ_v(ω) h(ω)]

where φ_a(ω) = E[|a(ω)|²] is the variance of a ∈ {X, V₁}, and

Γ_v(ω) = E[v(ω) v^H(ω)] / φ_{V₁}(ω)

is the pseudo-coherence matrix of the noise.
▪ We deduce the gain in SNR:

G[h(ω)] = oSNR(ω) / iSNR(ω) = |h^H(ω) d(ω)|² / [h^H(ω) Γ_v(ω) h(ω)]

▪ WNG (white noise, Γ_v(ω) = I_M): W[h(ω)] = |h^H(ω) d(ω)|² / [h^H(ω) h(ω)]
▪ DF (diffuse noise, Γ_v(ω) = Γ_d(ω) = (1/2) ∫₀^π d(ω, θ) d^H(ω, θ) sinθ dθ): D[h(ω)] = |h^H(ω) d(ω)|² / [h^H(ω) Γ_d(ω) h(ω)]
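These quantities can be sketched numerically. The diffuse pseudo-coherence below uses the standard spherically isotropic sinc model, which is an assumption here, and the frequency is an arbitrary evaluation point:

```python
import numpy as np

def steering(omega, theta, M, tau0):
    """Steering vector d(omega, theta) for a uniform linear array."""
    return np.exp(-1j * omega * tau0 * np.cos(theta) * np.arange(M))

def gamma_diffuse(omega, M, tau0):
    """Diffuse-field pseudo-coherence (spherically isotropic sinc model)."""
    n = np.arange(M)
    return np.sinc(omega * tau0 * (n[:, None] - n[None, :]) / np.pi)

def wng(h, d):
    return abs(np.vdot(h, d)) ** 2 / np.vdot(h, h).real

def df(h, d, Gd):
    return abs(np.vdot(h, d)) ** 2 / (np.conj(h) @ Gd @ h).real

M, c, delta = 8, 343.0, 0.01          # 8 mics, 1 cm spacing (slide 68 setup)
tau0 = delta / c
omega = 2.0 * np.pi * 1000.0          # evaluate at 1 kHz
d = steering(omega, 0.0, M, tau0)     # endfire steering vector
Gd = gamma_diffuse(omega, M, tau0)
h_ds = d / M                          # delay-and-sum beamformer
```

For the delay-and-sum beamformer this reproduces the properties on the next slide: WNG equal to M, and a DF of at least 1.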

SLIDE 62

Conventional Beamformers

▪ Delay-and-Sum (DS): maximizes the WNG subject to the distortionless constraint:

h_DS(ω, θ) = d(ω, θ) / M, W[h_DS(ω, θ)] = M = W_max

D[h_DS(ω, θ)] = M² / [d^H(ω, θ) Γ_d(ω) d(ω, θ)] ≥ 1

While the DS maximizes the WNG, it never amplifies diffuse noise.
▪ Superdirective (SD): maximizes the DF subject to the distortionless constraint, for the specific case of θ = 0 and small δ:

h_SD(ω) = Γ_d^{−1}(ω) d(ω) / [d^H(ω) Γ_d^{−1}(ω) d(ω)]

While maximizing the DF, h_SD(ω) can amplify the white noise, especially at low frequencies.

SLIDE 63

Conventional Beamformers

▪ Robust superdirective:

h_{S,ε}(ω) = [Γ_d(ω) + ε I_M]^{−1} d(ω) / ( d^H(ω) [Γ_d(ω) + ε I_M]^{−1} d(ω) )

where ε ≥ 0 is a Lagrange multiplier, which enables a compromise between the DF and the WNG. If we define Γ_ε(ω) = Γ_d(ω) + ε I_M, we can write

h_{S,ε}(ω) = Γ_ε^{−1}(ω) d(ω) / [d^H(ω) Γ_ε^{−1}(ω) d(ω)]

While the robust superdirective beamformer controls the white-noise amplification, it is not easy to find a closed-form expression for ε for a desired value of the WNG.

SLIDE 64

Combined beamformer

R. Berkun, I. Cohen, and J. Benesty, "Combined beamformers for robust broadband regularized superdirective beamforming," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, pp. 877–886, May 2015.

Berkun et al. proposed the combined beamformer:

h_{α,ε}(ω) = [Γ_ε^{−1}(ω) + α(ω) I_M] d(ω) / ( d^H(ω) [Γ_ε^{−1}(ω) + α(ω) I_M] d(ω) ), α ∈ ℝ

It can be reformulated as a weighted combination of the robust superdirective and delay-and-sum beamformers, h_{S,ε}(ω) and h_DS(ω), with weights determined by

α_ε(ω) = α(ω) W_max / D_{max,ε}(ω), where D_{max,ε}(ω) = d^H(ω) Γ_ε^{−1}(ω) d(ω)

For a fixed W[h_{α,ε}(ω)] = W₀ < M, or a fixed D[h_{α,ε}(ω)] = D₀, it is possible to analytically calculate α_ε(ω) and hence α(ω). While this yields a closed-form solution for the parameter α(ω), which enables control of the trade-off in performance between the WNG and the DF, the method does not address finding the regularization parameter ε and assumes it is user-determined.

SLIDE 65

New Noise Field

▪ We assume the signal is corrupted both by diffuse noise and by additive white noise.
▪ The input and output SNR:

iSNR(ω) = tr{φ_X(ω) d(ω) d^H(ω)} / tr{φ_d(ω) Γ_d(ω) + φ_w(ω) I_M} = φ_X(ω) / [φ_d(ω) + φ_w(ω)]

oSNR(ω) = φ_X(ω) |h^H(ω) d(ω)|² / [ φ_d(ω) h^H(ω) Γ_d(ω) h(ω) + φ_w(ω) h^H(ω) h(ω) ]

▪ The SNR gain:

G[h(ω)] = |h^H(ω) d(ω)|² / [ (1 − α(ω)) h^H(ω) Γ_d(ω) h(ω) + α(ω) h^H(ω) h(ω) ]

where α(ω) = φ_w(ω) / [φ_d(ω) + φ_w(ω)], 0 ≤ α(ω) ≤ 1

SLIDE 66

The optimal Beamformer

▪ The proposed beamformer, which maximizes the SNR gain, is:

h_α(ω) = Γ_{d,α}^{−1}(ω) d(ω) / [d^H(ω) Γ_{d,α}^{−1}(ω) d(ω)], where Γ_{d,α}(ω) = (1 − α(ω)) Γ_d(ω) + α(ω) I_M

▪ The SNR gain: G[h_α(ω)] = d^H(ω) Γ_{d,α}^{−1}(ω) d(ω)
▪ The proposed beamformer is equivalent to h_{S,ε}(ω) with ε(ω) = α(ω) / (1 − α(ω))
▪ Problem: φ_d(ω) and φ_w(ω) are not known → α(ω) is not known.
▪ Advantage 1: α(ω) varies only from 0 to 1.
▪ Advantage 2: the gain is continuous and has a single minimum point in this range, and the WNG and DF are monotonic in this range.
▪ Solution: α(ω) is found by employing a binary-like search on each monotonic section.

SLIDE 67

Algorithm 1

Input: desired gain G₀ and tolerance
Output: optimal regularization α
1. Find α_min that minimizes the gain (e.g., using gradient descent)
2. Divide the range [0, 1] into two sections in which the gain is monotonic: [0, α_min] and [α_min, 1]
3. For each section, apply the following continuous binary search:
4.   Divide the section into two sub-sections
5.   Calculate the gain G_c at the middle of each sub-section
6.   Choose the gain G_c and its respective sub-section for which |G_c − G₀| is minimal
7.   If |G_c − G₀| ≤ tolerance, then
8.     α ← (middle of the chosen sub-section), and stop
9.   else
10.    update the range to be the chosen sub-section and go back to step 4
11.  end if
12. Compare the results from [0, α_min] and [α_min, 1], and choose the best one

SLIDE 68

Experimental Results

Setup: M = 8 microphones, δ = 1 cm spacing.

Array gains for fixed SNR gain: α(ω) is found for a desired fixed SNR gain G₀ using the proposed algorithm.

SLIDE 69

Experimental Results

Array gains for fixed WNG: α(ω) is found for maximal SNR gain under a constant desired WNG W₀, using the proposed algorithm from step 4.

→ Our proposed beamformer outperforms the combined beamformer with ε = 10⁻⁴.

SLIDE 70

Experimental Results

Array gains for fixed DF in multi-band: α(ω) is found for maximal SNR gain under a piece-wise constant, gradually increasing DF, using the proposed algorithm from step 4.

→ The WNG-DF trade-off can be considered at each frequency band separately!
→ Our proposed beamformer outperforms the combined beamformer with ε = 10⁻⁴.

SLIDE 71

Conclusions

Results summary:

▪ The proposed approach facilitates the design of beamformers with fixed SNR gain, beamformers with maximal SNR gain for a constant WNG or DF, and multi-band fixed beamformers.
▪ It enables fine tuning of the compromise between the DF and robustness against white noise.

Future work:

▪ Testing various angles of incidence other than the endfire direction.
▪ Incorporating other considerations, such as side-lobe requirements and performance under other types of noise fields.