PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 3: Detection Theory October 2019 Heikki Huttunen heikki.huttunen@tuni.fi Signal Processing Tampere University
Detection theory
• In this section, we will briefly consider detection theory.
• Detection theory shares many topics with machine learning.
• The methods are based on estimation theory and attempt to answer questions such as:
  • Is a signal of a specific model present in our time series? E.g., detection of a noisy sinusoid: beep or no beep?
  • Is the transmitted pulse present in the radar signal at time t?
  • Does the mean level of a signal change at time t?
  • After calculating the mean change in pixel values of subsequent frames in a video, is there something moving in the scene?
  • Is there a person in this video frame?
• The area is closely related to hypothesis testing, which is widely used, e.g., in medicine: Is the response in patients due to the new drug or due to random fluctuations?
Detection theory
• Consider the detection of a sinusoidal waveform.
[Figure: three panels over samples 0 to 800: Noiseless Signal, Noisy Signal, Detection Result.]
Detection theory
• In our case, the hypotheses could be
$$H_1: x[n] = A \cos(2\pi f_0 n + \phi) + w[n]$$
$$H_0: x[n] = w[n]$$
• This example corresponds to the detection of a noisy sinusoid.
• The hypothesis $H_1$ corresponds to the case where the sinusoid is present and is called the alternative hypothesis.
• The hypothesis $H_0$ corresponds to the case where the measurements consist of noise only and is called the null hypothesis.
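The two hypotheses can be simulated directly. The following is a minimal sketch that draws one realization under each hypothesis; the values of `N`, `A`, `f0`, `phi` and `sigma` are illustrative assumptions and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter choices (assumptions, not from the slides)
N = 900        # number of samples
A = 1.0        # amplitude of the sinusoid
f0 = 0.01      # normalized frequency (cycles per sample)
phi = 0.0      # phase in radians
sigma = 0.5    # standard deviation of the noise

n = np.arange(N)
w = sigma * rng.standard_normal(N)     # white Gaussian noise w[n]
sinusoid = A * np.cos(2 * np.pi * f0 * n + phi)

x_h0 = w               # H0: noise only
x_h1 = sinusoid + w    # H1: sinusoid plus noise
```

A detector only sees `x_h0` or `x_h1` and has to decide which hypothesis generated it.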
Introductory Example
• Consider a simplistic detection problem, where we observe one sample $x[0]$ from one of two densities: $\mathcal{N}(0, 1)$ or $\mathcal{N}(1, 1)$.
• The task is to choose the correct density in an optimal manner.
[Figure: the two densities, Gaussian with µ = 0 and Gaussian with µ = 1, with one observed sample annotated "Where did this come from?"]
Introductory Example
• Our hypotheses are now
$$H_1: \mu = 1, \qquad H_0: \mu = 0,$$
and the corresponding likelihoods are plotted below.
[Figure: likelihood of observing different values of $x[0]$ under each hypothesis; curves $p(x[0] \mid \mu = 0)$ and $p(x[0] \mid \mu = 1)$; horizontal axis $x[0]$, vertical axis Likelihood.]
Introductory Example
• An obvious approach for deciding the density would be to choose the one that is higher for a particular $x[0]$.
• More specifically, study the likelihoods and choose the more likely one.
• The likelihoods are
$$H_1: p(x[0] \mid \mu = 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right),$$
$$H_0: p(x[0] \mid \mu = 0) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0])^2}{2}\right).$$
• One should select $H_1$ if "µ = 1" is more likely than "µ = 0".
• In other words, $p(x[0] \mid \mu = 1) > p(x[0] \mid \mu = 0)$.
Introductory Example
• Let's state this in terms of $x[0]$:
$$p(x[0] \mid \mu = 1) > p(x[0] \mid \mu = 0)$$
$$\Leftrightarrow \frac{p(x[0] \mid \mu = 1)}{p(x[0] \mid \mu = 0)} > 1$$
$$\Leftrightarrow \frac{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right)}{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0])^2}{2}\right)} > 1$$
$$\Leftrightarrow \exp\left(-\frac{(x[0] - 1)^2 - x[0]^2}{2}\right) > 1$$
Introductory Example
Taking the logarithm of both sides and multiplying by 2 gives
$$\Leftrightarrow x[0]^2 - (x[0] - 1)^2 > 0 \Leftrightarrow 2x[0] - 1 > 0 \Leftrightarrow x[0] > \frac{1}{2}.$$
• In other words, choose $H_1$ if $x[0] > 0.5$ and $H_0$ if $x[0] < 0.5$.
• Studying the ratio of likelihoods (second row of the previous derivation) is the key:
$$\frac{p(x[0] \mid \mu = 1)}{p(x[0] \mid \mu = 0)} > 1$$
• This ratio is called the likelihood ratio, and its comparison to a threshold (here $\gamma = 1$) is called the likelihood ratio test (LRT).
• Of course, the detection threshold $\gamma$ may be chosen to be something other than $\gamma = 1$.
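The likelihood ratio test for this example can be sketched in a few lines. The function names `likelihood_ratio` and `decide_h1` are hypothetical helpers introduced for illustration; the densities are evaluated with `scipy.stats.norm.pdf`.

```python
import numpy as np
from scipy.stats import norm

def likelihood_ratio(x0):
    """Ratio p(x[0] | mu = 1) / p(x[0] | mu = 0) for unit-variance Gaussians."""
    return norm.pdf(x0, loc=1, scale=1) / norm.pdf(x0, loc=0, scale=1)

def decide_h1(x0, gamma=1.0):
    """Likelihood ratio test: True means 'choose H1'."""
    return likelihood_ratio(x0) > gamma

x0 = np.array([-1.0, 0.4, 0.6, 2.0])
decisions = decide_h1(x0)

# With gamma = 1, the LRT agrees with the simple rule x[0] > 0.5
assert np.array_equal(decisions, x0 > 0.5)
```

Passing a different `gamma` shifts the decision boundary away from 0.5, which is how the test is tuned in the slides that follow.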
Error Types
• It might be that the detection problem is not symmetric and some errors are more costly than others.
• For example, when detecting a disease, a missed detection is more costly than a false alarm.
• The tradeoff between misses and false alarms can be adjusted using the threshold of the LRT.
Error Types
• The figure below illustrates the probabilities of the two kinds of errors.
• The blue area on the left corresponds to the probability of choosing $H_1$ while $H_0$ would hold (false match, i.e., false alarm).
• The red area is the probability of choosing $H_0$ while $H_1$ would hold (missed detection).
[Figure: the two likelihood curves over $x[0]$ with two shaded areas: "Decide H0 when H1 holds" and "Decide H1 when H0 holds"; vertical axis Likelihood.]
Error Types
• It can be seen that either probability can be made arbitrarily small by adjusting the detection threshold, but only at the expense of the other.
[Figure: Detection threshold at 0. Few missed detections (red) but many false matches (blue).]
[Figure: Detection threshold at 1.5. Few false matches (blue) but many missed detections (red).]
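The tradeoff can be tabulated numerically. This is a minimal sketch for the introductory example, assuming the rule "choose $H_1$ if $x[0]$ exceeds the threshold"; `error_probabilities` is a hypothetical helper name.

```python
from scipy.stats import norm

def error_probabilities(threshold):
    """Return (P_FA, P_M) for the rule 'choose H1 if x[0] > threshold'."""
    p_fa = norm.sf(threshold, loc=0, scale=1)   # H0 holds, but x[0] > threshold
    p_m = norm.cdf(threshold, loc=1, scale=1)   # H1 holds, but x[0] < threshold
    return p_fa, p_m

for threshold in (0.0, 0.5, 1.5):
    p_fa, p_m = error_probabilities(threshold)
    print(f"threshold = {threshold:3.1f}: P_FA = {p_fa:.4f}, P_M = {p_m:.4f}")
```

Raising the threshold lowers the false alarm probability and raises the miss probability, and vice versa.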
Error Types
• For example, suppose the threshold is $\gamma = 1.5$. What are $P_{FA}$ and $P_D$?
• The probability of false alarm is found by integrating over the blue area:
$$P_{FA} = P(x[0] > \gamma \mid \mu = 0) = \int_{1.5}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0])^2}{2}\right) dx[0] \approx 0.0668.$$
• The probability of missed detection is the area marked in red:
$$P_M = P(x[0] < \gamma \mid \mu = 1) = \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right) dx[0] \approx 0.6915.$$
• An equivalent but more useful quantity is the complement of $P_M$, the probability of detection:
$$P_D = 1 - P_M = \int_{1.5}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right) dx[0] \approx 0.3085.$$
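These three integrals can be checked numerically. The sketch below evaluates them with `scipy.integrate.quad`; using numerical integration here is an illustrative choice, since the same values also follow from the Gaussian CDF.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

gamma = 1.5

# False alarm: integrate the H0 density N(0, 1) above the threshold
p_fa, _ = quad(lambda x: norm.pdf(x, loc=0, scale=1), gamma, np.inf)

# Missed detection: integrate the H1 density N(1, 1) below the threshold
p_m, _ = quad(lambda x: norm.pdf(x, loc=1, scale=1), -np.inf, gamma)

# Detection probability is the complement of the miss probability
p_d = 1.0 - p_m

print(f"P_FA = {p_fa:.4f}, P_M = {p_m:.4f}, P_D = {p_d:.4f}")
# prints: P_FA = 0.0668, P_M = 0.6915, P_D = 0.3085
```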
Choosing the threshold
• Often we don't want to define the threshold directly, but rather the rate of false alarms we can accept.
• For example, suppose we want to find the best detector for our introductory example, and we can tolerate 10% false alarms ($P_{FA} = 0.1$).
• The likelihood ratio detection rule is: Select $H_1$ if
$$\frac{p(x \mid \mu = 1)}{p(x \mid \mu = 0)} > \gamma.$$
• Since the likelihood ratio increases monotonically with $x$ in this example, the test is equivalent to comparing $x$ itself with a threshold. The only thing to find out now is this threshold on $x$, again denoted by $\gamma$, such that
$$\int_{\gamma}^{\infty} p(x \mid \mu = 0)\, dx = 0.1.$$
Choosing the threshold
• This can be done with the SciPy function isf, the inverse survival function, i.e., the inverse of 1 − CDF.
    >>> import scipy.stats as stats
    >>> # Compute threshold such that P_FA = 0.1
    >>> T = stats.norm.isf(0.1, loc=0, scale=1)
    >>> print(T)
    1.2815515655446004
• The parameters loc and scale are the mean and standard deviation of the Gaussian density, respectively.
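Once the threshold is fixed by the false alarm rate, the resulting detection probability follows from the $H_1$ density. A minimal sketch for the introductory example, where $H_1$ is $\mathcal{N}(1, 1)$; the variable names are illustrative.

```python
from scipy.stats import norm

p_fa_target = 0.1

# Threshold on x[0] that gives the target false alarm rate under H0 = N(0, 1)
threshold = norm.isf(p_fa_target, loc=0, scale=1)

# Detection probability: mass of the H1 density N(1, 1) above the threshold
p_d = norm.sf(threshold, loc=1, scale=1)

# Sanity check: the false alarm rate at this threshold equals the target
p_fa = norm.sf(threshold, loc=0, scale=1)

print(f"threshold = {threshold:.4f}, P_FA = {p_fa:.4f}, P_D = {p_d:.4f}")
```

With the means only one standard deviation apart, a 10% false alarm budget leaves the detection probability well below one half.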
Detector for a known waveform
• An important special case is that of a known waveform $s[n]$ embedded in a WGN sequence $w[n]$:
$$H_1: x[n] = s[n] + w[n]$$
$$H_0: x[n] = w[n].$$
• An example of a case where the waveform is known could be the detection of radar signals, where a pulse $s[n]$ transmitted by us is reflected back after some propagation time.
[Figure: two panels over samples 0 to 800: Transmitted signal $s[n]$ and Received signal $s[n] + w[n]$.]
Detector for a known waveform
• For this case the likelihoods are
$$p(\mathbf{x} \mid H_1) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x[n] - s[n])^2}{2\sigma^2}\right),$$
$$p(\mathbf{x} \mid H_0) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x[n])^2}{2\sigma^2}\right).$$
• The likelihood ratio test is easily obtained as
$$\frac{p(\mathbf{x} \mid H_1)}{p(\mathbf{x} \mid H_0)} = \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{n=0}^{N-1} (x[n] - s[n])^2 - \sum_{n=0}^{N-1} (x[n])^2\right)\right] > \gamma.$$
Detector for a known waveform
• This simplifies by taking the logarithm of both sides:
$$-\frac{1}{2\sigma^2}\left(\sum_{n=0}^{N-1} (x[n] - s[n])^2 - \sum_{n=0}^{N-1} (x[n])^2\right) > \ln \gamma.$$
• This further simplifies into
$$\frac{1}{\sigma^2} \sum_{n=0}^{N-1} x[n] s[n] - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (s[n])^2 > \ln \gamma.$$
Detector for a known waveform
• Since $s[n]$ is a known waveform (= constant), we can simplify the procedure by moving its energy term to the right-hand side and combining it with the threshold:
$$\sum_{n=0}^{N-1} x[n] s[n] > \sigma^2 \ln \gamma + \frac{1}{2} \sum_{n=0}^{N-1} (s[n])^2.$$
• We can equivalently treat the right-hand side as our threshold (say $\gamma'$) to get the final decision rule
$$\sum_{n=0}^{N-1} x[n] s[n] > \gamma'.$$
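The simplification can be verified numerically: the log-likelihood ratio computed directly from the two Gaussian models should equal the correlation form. This is a sketch under assumed values; the waveform `s`, the length `N`, `sigma` and the helper `detect` are illustrative and not from the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Illustrative setup (assumption): a short known waveform in WGN
N = 100
sigma = 0.5
s = np.sin(2 * np.pi * 0.05 * np.arange(N))   # known waveform s[n]
x = s + sigma * rng.standard_normal(N)        # received signal under H1

# Log-likelihood ratio computed directly from the two Gaussian models
log_lr = (norm.logpdf(x, loc=s, scale=sigma).sum()
          - norm.logpdf(x, loc=0, scale=sigma).sum())

# Simplified form: (1/sigma^2) sum x[n]s[n] - (1/(2 sigma^2)) sum s[n]^2
statistic = np.dot(x, s)                      # correlator output
log_lr_simplified = statistic / sigma**2 - np.dot(s, s) / (2 * sigma**2)

def detect(x, s, gamma_prime):
    """Decide H1 when the correlation of x with s exceeds gamma_prime."""
    return np.dot(x, s) > gamma_prime
```

The two values `log_lr` and `log_lr_simplified` agree up to floating-point error, so thresholding the correlation `np.dot(x, s)` is equivalent to the full likelihood ratio test.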
Example
• The detector for a sinusoid in WGN is
$$\sum_{n=0}^{N-1} x[n] A \cos(2\pi f_0 n + \phi) > \gamma \;\Rightarrow\; A \sum_{n=0}^{N-1} x[n] \cos(2\pi f_0 n + \phi) > \gamma.$$
• Again we can divide by $A$ to get
$$\sum_{n=0}^{N-1} x[n] \cos(2\pi f_0 n + \phi) > \gamma'.$$
• In other words, we check the correlation with the sinusoid. Note that the amplitude $A$ does not affect our statistic, only the threshold, which is in any case selected according to the fixed $P_{FA}$ rate.
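One way to select the threshold from a fixed $P_{FA}$ is sketched below. It relies on a step that the slides do not spell out: under $H_0$ the statistic is a weighted sum of independent zero-mean Gaussians, so it is itself Gaussian with zero mean and variance $\sigma^2 \sum_n \cos^2(2\pi f_0 n + \phi)$. The sketch also assumes that $\sigma$ is known; the function name and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sinusoid_detector(x, f0, phi, sigma, p_fa):
    """Correlate x with cos(2*pi*f0*n + phi) and compare to a P_FA-based threshold."""
    n = np.arange(len(x))
    template = np.cos(2 * np.pi * f0 * n + phi)
    statistic = np.dot(x, template)
    # Under H0 the statistic is zero-mean Gaussian with this standard deviation
    std_h0 = sigma * np.sqrt(np.dot(template, template))
    threshold = norm.isf(p_fa, loc=0, scale=std_h0)
    return statistic > threshold, statistic, threshold

# Monte Carlo check of the false alarm rate on noise-only data (illustrative values)
rng = np.random.default_rng(2)
sigma, f0, phi, p_fa = 0.5, 0.05, 0.0, 0.1
trials = 5000
alarms = sum(
    int(sinusoid_detector(sigma * rng.standard_normal(100), f0, phi, sigma, p_fa)[0])
    for _ in range(trials)
)
empirical_p_fa = alarms / trials
```

On noise-only input, `empirical_p_fa` should land close to the requested 0.1, regardless of the amplitude the sinusoid would have had.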
Example
• As an example, the picture shows the detection process with $\sigma = 0.5$.
• Note that we apply the detector with a sliding window; i.e., we perform the hypothesis test at every window of length 100.
[Figure: three panels over samples 0 to 800: Noiseless Signal, Noisy Signal, Detection Result.]
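A sliding-window version of the correlator can be sketched with `numpy.correlate`. The test signal below (a sinusoid burst in samples 400 to 499), the frequency `f0` and the template phase are assumptions made for illustration; only $\sigma = 0.5$ and the window length of 100 come from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)

N, L = 900, 100          # signal length (assumed) and window length
sigma, f0 = 0.5, 0.1     # noise level from the slide; f0 is an assumption

# Illustrative noiseless signal: a sinusoid burst in samples 400..499
n = np.arange(N)
clean = np.zeros(N)
clean[400:500] = np.cos(2 * np.pi * f0 * n[400:500])
x = clean + sigma * rng.standard_normal(N)

# Template for one window; its phase is assumed to match the burst at its start
template = np.cos(2 * np.pi * f0 * np.arange(L))

# statistic[k] = sum over m of x[k + m] * template[m], i.e. the test on x[k:k+L]
statistic = np.correlate(x, template, mode="valid")

peak = int(np.argmax(statistic))   # window start with the strongest response
```

The statistic oscillates as the window slides over the burst, because the phase of the signal relative to the template changes with the window position; it is largest when the window is aligned with the burst.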