OPENING THE BLACK BOX FOR FORENSIC AND AUTOMATIC SPEAKER RECOGNITION Anil Alexander and Finnian Kelly, Oxford Wave Research Ltd
WHO WE ARE Oxford Wave Research Ltd (OWR) is an audio and speech R&D company based in Oxford, UK. We develop systems for: Automatic Speaker Recognition, Speaker Diarization, Audio Fingerprinting. Our products are used by law enforcement in the UK, US, Europe and the Middle East, including the MET police, UK MoD, Netherlands Forensic Institute, German BKA etc.
“AUTOMATIC SPEAKER RECOGNITION IS A …” Lay people. Juries, judges and lawyers. Even forensic experts!
FORENSICS AND ‘THE BLACK BOX’ Recent advances in Speaker Recognition involve a huge number of variables: training and evaluation data, feature modelling and parameter choices. A lot of the focus has been on incremental improvements on large datasets where the variability is designed or controlled. How does this sit in the context of opening the black box in real forensic casework?
THE ENFSI GUIDELINES (2015) Logic, Balance, Robustness, Transparency
A TYPICAL AUTOMATIC PIPELINE
SOURCES OF DATA VARIABILITY Multiple data selection decisions before you even get started: UBM Training, TV Matrix, LDA/PLDA. How does this affect the Likelihood Ratios or the Strength of evidence?
BENEFITTING FROM THE EXPERTISE OF FORENSIC PHONETICIANS Most forensic speaker recognition casework is performed by forensic phoneticians who • Have a lot of experience and knowledge in voice comparison and an understanding of the legal requirements in their area • Want to include automatic methods, but do not have any straightforward means of incorporating their knowledge into an automatic analysis • Would like to make their speaker recognition analysis more objective by using likelihood ratios and evaluating system performance for each case.
SEMI-AUTOMATIC AND AUTOMATIC SPEAKER RECOGNITION *LTF illustration from Catalina Manual
TOWARDS A COMMON METHODOLOGICAL PLATFORM Bayesian Likelihood Ratios
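As a minimal sketch of what a Bayesian likelihood ratio expresses, the example below assumes a score-based setup in which comparison scores under the same-speaker and different-speaker hypotheses are each modelled by a single Gaussian. The function names and all numerical values are illustrative assumptions and are not taken from any OWR system.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(score, same_mean, same_std, diff_mean, diff_std):
    """LR = p(score | same speaker) / p(score | different speakers)."""
    numerator = gaussian_pdf(score, same_mean, same_std)
    denominator = gaussian_pdf(score, diff_mean, diff_std)
    return numerator / denominator


def posterior_odds(prior_odds, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    return prior_odds * lr


# Hypothetical score distributions and a hypothetical case score.
lr = likelihood_ratio(score=2.0, same_mean=3.0, same_std=1.0,
                      diff_mean=-1.0, diff_std=1.0)
print(f"LR = {lr:.2f}, log10 LR = {math.log10(lr):.2f}")
print(f"posterior odds (prior odds 1:100) = {posterior_odds(0.01, lr):.3f}")
```

The expert reports the likelihood ratio; the prior odds, and therefore the posterior odds, remain a matter for the court.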
OPENING UP THE BLACK BOX The ‘black box’ creates a situation in which the forensic expert is unable to look inside, or indeed adapt, the automatic system to their own requirements. The expert should be able to change the system parameters and introduce new data at every step of the speaker recognition process. The expert should not be limited to manufacturer-provided models or configurations, and should have the ability to train the system specifically for their problem domain.
OUR APPROACH VOCALISE: Voice Comparison and Analysis of the Likelihood of Speech Evidence. Flexible Features: ‘Automatic’ spectral features; ‘Traditional’ forensic phonetic parameters; ‘User’-provided features. Flexible Modeling: State-of-the-art i-vector/PLDA; ‘Classical’ GMM/GMM-UBM. The ‘Session’ Concept: • Pre-trained and optimised models provided • Ability to introduce data at all stages of the i-vector pipeline • Ability to adapt the system to the conditions of the case
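To illustrate the ‘classical’ GMM-UBM option, the sketch below scores a sequence of feature values as the average per-frame log-likelihood ratio between a speaker model and a universal background model (UBM). It uses one-dimensional toy mixtures; all names and parameter values are hypothetical, and this is not the VOCALISE implementation.

```python
import math


def gmm_log_likelihood(x, weights, means, stds):
    """Log-likelihood of one scalar frame under a 1-D Gaussian mixture."""
    total = 0.0
    for w, m, s in zip(weights, means, stds):
        z = (x - m) / s
        total += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return math.log(total)


def gmm_ubm_score(frames, speaker, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    llrs = [gmm_log_likelihood(x, *speaker) - gmm_log_likelihood(x, *ubm)
            for x in frames]
    return sum(llrs) / len(llrs)


# Each toy model is a tuple of (weights, means, standard deviations).
ubm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
# In practice the speaker model would be adapted from the UBM;
# here its means are simply shifted by hand for illustration.
speaker = ([0.5, 0.5], [-0.5, 1.5], [1.0, 1.0])

frames = [1.4, 1.6, -0.4, 1.5]  # hypothetical feature values
score = gmm_ubm_score(frames, speaker, ubm)
print(f"GMM-UBM score (mean log-likelihood ratio) = {score:.3f}")
```

A positive score means the frames are better explained by the speaker model than by the background model. Such a raw score would still need calibration before it could be reported as a likelihood ratio.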