Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case
forensic_eval_01
Geoffrey Stewart Morrison, Ewald Enzinger
p(E|Hp) / p(E|Hd)

Need for testing
• In forensic voice comparison, calls for validity and reliability to be empirically tested under casework conditions date back to the 1960s, but still go widely unheeded.
• Across all branches of forensic science, there is now increasing pressure to validate performance before analysis systems are used to assess strength of evidence for presentation in court:
  – Daubert v Merrell Dow Pharmaceuticals [1993, 509 US 579]
  – National Research Council Report 2009
  – Forensic Science Regulator Codes of Practice 2014
  – ENFSI 2015 Methodological guidelines for best practice in forensic semiautomatic and automatic speaker recognition

forensic_eval_01
• Open to operational forensic laboratories and research laboratories
• Training and test data based on a real forensic case
  – relevant population
  – speaking styles
  – recording conditions
• Virtual Special Issue in Speech Communication
  – introductory paper includes rules
  – describe system and procedures in sufficient detail for replication
  – performance metrics and graphics
  – discussion and conclusion may include recommendations for practice
  – submissions accepted over a 2-year timeframe

forensic_eval_01
• Casework conditions vary substantially from case to case
• forensic_eval_01 evaluates systems under conditions reflecting those of one real case
• Results should not be assumed to be generalisable to other case conditions
• For each case, the validity and reliability of the system employed should be assessed under conditions reflecting those of that case

Forensic Voice Comparison Case
• Offender recording: telephone call made to a financial institution's call centre
  – landline
  – call centre background noise: babble, typing
  – saved in a compressed format
  – 46 seconds net speech
  – adult male Australian English speaker
• Suspect recording: police interview
  – reverberation
  – ventilation system noise
  – saved in a compressed format

Data
• Male Australian English speakers
• Multiple non-contemporaneous recordings per speaker
• Multiple speaking tasks per recording session
• High-quality audio
• Offender condition
  – information exchange task as input
• Suspect condition
  – interview task as input
[Figure: block diagrams of the processing applied to the high-quality input to produce offender-condition and suspect-condition recordings. Labels recoverable from the diagrams: offender recording noise, suspect recording noise, scaling, 300 Hz to 3400 Hz, 8 kHz, a-Law compression/decompression, G.723.1 compression/decompression, MPEG-1 layer 2 compression/decompression.]

Data
• Training data:
  – 423 recordings from 105 speakers
  – 191 recordings in offender condition
  – 232 in suspect condition
• Test data:
  – 223 recordings from 61 speakers
  – 61 recordings in offender condition
  – 162 in suspect condition

forensic_eval_01
• Preliminary results from systems already tested on the forensic_eval_01 data

Enzinger & Morrison i-vector system
• 1st through 14th MFCCs + deltas
  – feature warping
• UBM
  – 512 Gaussians
• T-matrix
  – 400 or 200 dimensions
• i-vector domain mismatch compensation
  – canonical linear discriminant functions (aka LDA), 50 dimensions
• PLDA
  – full rank covariance for B and for W
• Score to likelihood ratio conversion (aka calibration)
  – logistic regression

Enzinger & Morrison i-vector system
Generic data variant:
• Generic data for training models which calculate scores
• Generic data for training mismatch compensation models in i-vector domain
• Case specific data for training score-to-LR model
Case specific data variant:
• Case specific data for training models which calculate scores
• Case specific + generic data for training mismatch compensation models in i-vector domain
• Case specific data for training score-to-LR model
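The last stage listed for this system, converting scores to likelihood ratios with logistic regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain gradient-descent fit, the step settings and the toy scores are all assumptions chosen for illustration. The two classes are weighted equally during fitting, so the fitted log-odds can be read as a log likelihood ratio.

```python
import math

def fit_calibration(same_scores, diff_scores, steps=5000, rate=0.1):
    """Fit log-odds = a * score + b by gradient descent (hypothetical helper).

    Each class contributes half of the loss regardless of its size, which
    corresponds to equal prior odds, so a * score + b is a natural-log LR.
    """
    a, b = 1.0, 0.0
    n_same, n_diff = len(same_scores), len(diff_scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s in same_scores:   # same-speaker training scores, target 1
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += 0.5 * (p - 1.0) * s / n_same
            grad_b += 0.5 * (p - 1.0) / n_same
        for s in diff_scores:   # different-speaker training scores, target 0
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += 0.5 * p * s / n_diff
            grad_b += 0.5 * p / n_diff
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b

def score_to_log10_lr(score, a, b):
    """Convert an uncalibrated score to a log10 likelihood ratio."""
    return (a * score + b) / math.log(10)

# Toy, made-up scores (not data from forensic_eval_01).
same_scores = [2.1, 1.4, 0.3, 2.8, 1.9, -0.2]
diff_scores = [-1.5, -0.7, 0.4, -2.2, 0.9, -1.1]
a, b = fit_calibration(same_scores, diff_scores)
print(score_to_log10_lr(1.0, a, b))
```

In the two variants above, the data used to train this score-to-LR model are case specific in both; only the data for the earlier stages differ.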

Enzinger & Morrison i-vector system
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Legend: Generic data; Case specific data.]
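For readers unfamiliar with the accuracy metric on the vertical axes, the standard log likelihood ratio cost (Cllr) can be sketched as follows. The function name and the toy likelihood ratios are assumptions for illustration, and the sketch does not reproduce the pooled versus mean distinction or the credible-interval calculation used in the evaluation.

```python
import math

def cllr(same_lrs, diff_lrs):
    """Log likelihood ratio cost for a set of test likelihood ratios.

    same_lrs: LRs from same-speaker comparisons (ideally much greater than 1)
    diff_lrs: LRs from different-speaker comparisons (ideally much less than 1)
    """
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_lrs) / len(same_lrs)
    diff_term = sum(math.log2(1 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (same_term + diff_term)

# Toy, made-up likelihood ratios (not results from forensic_eval_01).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])        # every LR equals 1
informative = cllr([100.0, 1000.0], [0.01, 0.001])  # LRs point the right way
print(uninformative, informative)
```

A system that always outputs a likelihood ratio of 1 scores exactly 1, lower values are better, and values above 1 indicate likelihood ratios that are misleading on balance.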

Enzinger & Morrison i-vector system
[Figure: cumulative proportion (0 to 1) plotted against log10 likelihood ratio (−4 to 4). Panels: Generic data; Case specific data.]
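The curves in plots of this kind are empirical cumulative proportions of log10 likelihood ratios. A minimal sketch of how such curves could be computed is given below. The function name, the grid and the toy values are assumptions, and so is the plotting convention (same-speaker results counted at or below each grid value, different-speaker results counted at or above it), since the slides do not state which convention was used.

```python
def cumulative_proportions(log10_lrs, grid, at_or_above=False):
    """Proportion of log10 LRs at or below (or at or above) each grid value."""
    n = len(log10_lrs)
    if at_or_above:
        return [sum(v >= g for v in log10_lrs) / n for g in grid]
    return [sum(v <= g for v in log10_lrs) / n for g in grid]

# Toy, made-up log10 likelihood ratios (not results from forensic_eval_01).
same_speaker = [-1.0, 0.5, 2.0]
diff_speaker = [-3.0, -1.5, 0.2]
grid = [-4, -2, 0, 2, 4]

same_curve = cumulative_proportions(same_speaker, grid)
diff_curve = cumulative_proportions(diff_speaker, grid, at_or_above=True)
print(same_curve)
print(diff_curve)
```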

Batvox v4.1
• Evaluated by David van der Vloed, Netherlands Forensic Institute
• Reference population data
  – all 105 speakers (1 suspect-condition recording per speaker)
  – 30 selected by Batvox
• Imposter data
  – none
  – all 105 speakers (1 offender-condition recording per speaker)

Batvox v4.1
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data.]

Batvox v4.1
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Annotations: 30 reference speakers; 105 reference speakers. Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data.]

Batvox v4.1
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Annotations: no imposters; 105 imposters. Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data.]

Batvox v4.1
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Annotations: 105 reference speakers; 105 imposters. Legend: all reference data + no imposter data; all reference data + imposter data; selected reference data + no imposter data; selected reference data + imposter data.]

Batvox v4.1
[Figure: cumulative proportion (0 to 1) plotted against log10 likelihood ratio (−4 to 4), four panels. Rows: 105 reference speakers (all reference data); 30 reference speakers (selected reference data). Columns: no imposters; 105 imposters.]

Thank you very much (Eskerrik asko)
http://geoff-morrison.net/
http://forensic-evaluation.net/

Best of
[Figure: Cllr-pooled and Cllr-mean (vertical axes, 0 to 1) plotted against 95% credible interval (± orders of magnitude; horizontal axis, 0 to 1.5). Legend: Batvox v4.1; Enzinger & Morrison.]

Best of
[Figure: cumulative proportion (0 to 1) plotted against log10 likelihood ratio (−4 to 4). Panels: Batvox v4.1; Enzinger & Morrison.]
