Applications of R in Forensic Science: Pushing Out the Frontiers
Nick D. K. Petraco and Many Others
John Jay College of Criminal Justice
Outline
• Admissibility of scientific evidence is a problem!
• The Frye and Daubert standards
• How chemistry, engineering, math and computers can help forensic science
• Current projects at John Jay:
  • Petroleum distillates (fire debris)
  • Dust (trace evidence)
  • Cartridge cases (firearms)
Admissibility of scientific evidence!
• Principal legal standards: Frye and Daubert
• Frye (1923): testimony offered as "scientific" must "...have gained general acceptance in the particular field in which it belongs."
• New York is still a "Frye state".
Frye and Daubert
• Daubert (1993): judges are the "gatekeepers" of scientific evidence.
• Must determine if the science is reliable:
  • Has empirical testing been done? (Falsifiability)
  • Has the science been subject to peer review?
  • Are there known error rates?
  • Is there general acceptance?
• The Federal Government and 26(-ish) states are "Daubert states".
Raising Standards with Data and Statistics
• DNA profiling is the most successful application of statistics in forensic science.
• It is responsible for the current interest in "raising standards" in other branches of forensics.
• There are no protocols for the application of statistics to physical evidence.
• Our goal: application of objective, numerical, computational pattern comparison to physical evidence.
What Statistics Can Be Used?
• Statistical pattern comparison!
• Modern algorithms are called machine learning.
• The idea is to measure features of the physical evidence that characterize it.
• Train the algorithm to recognize "major" differences between groups of features while taking into account natural variation and measurement error.
Why R?
• R is not a proprietary black box!
• Open-source and totally transparent!
• R is maintained by a professional group of statisticians and computational scientists.
• Everything from very simple to state-of-the-art procedures is available.
• Very good graphics for exhibits and papers.
• R is extensible (it is a full scripting language).
• R has "commercial versions" too: Revolution R, S+.
Fire Debris Analysis Casework
• Liquid gasoline samples recovered during investigation:
  • Unknown history
  • Subjected to various real-world conditions
• If an individual sample can be discriminated from the larger group, this can be of forensic interest.
• Gas chromatography is commonly used to ID gasoline.
• Peak comparisons of chromatograms are difficult and time consuming.
• Does "eye-balling" satisfy Daubert, or even Frye?
Study Design
• This study was undertaken to examine the variability of gasoline components in:
  • Twenty liquid gasoline samples
  • Samples from fire investigations in the New York City area
• All samples were analyzed using gas chromatography-mass spectrometry (GC-MS).
• Keto and Wineman target compounds
• Fifteen peaks representing the common components present in gasoline were chosen for this study.
• Normalized GC-MS peak areas were used to test the discrimination potential of multiple multivariate methods.
Chosen Peaks (M. Gil)
• Use prcomp, lda (MASS) and rgl: 10D PCA, 3D CVA (see the sketch below)
• HOO-CV correct classification rate: 100%
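A minimal sketch of this PCA-then-CVA/LDA step with hold-one-out (leave-one-out) cross-validation. The data below are simulated stand-ins for the normalized GC-MS peak areas, not the actual casework data; all object names are illustrative.

```r
library(MASS)

## Synthetic stand-in data: 20 gasoline samples x 10 replicate runs x 15 peaks
set.seed(1)
n_samp <- 20; n_rep <- 10; n_peak <- 15
y  <- factor(rep(seq_len(n_samp), each = n_rep))
mu <- matrix(runif(n_samp * n_peak, 0, 10), n_samp)      # per-sample mean peak areas
X  <- mu[as.integer(y), ] +
      matrix(rnorm(length(y) * n_peak, sd = 0.5), ncol = n_peak)  # run-to-run noise

## 10D PCA followed by CVA/LDA, with hold-one-out cross-validation
pc     <- prcomp(X, scale. = TRUE)
scores <- pc$x[, 1:10]                         # keep the first 10 principal components
fit    <- lda(scores, grouping = y, CV = TRUE) # CV = TRUE gives leave-one-out predictions
mean(fit$class == y)                           # HOO-CV correct classification rate

## The first 3 canonical variates can be viewed in 3D with rgl, e.g.:
# cva <- lda(scores, grouping = y)
# rgl::plot3d(scores %*% cva$scaling[, 1:3], col = as.integer(y))
```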
Dust (N. Petraco)
Dust
A 19th-century German magistrate, Hans Gross, influenced by the writings of Sir Arthur Conan Doyle, suggested that dust and other traces be allowed in legal proceedings. (N. Petraco)
What can it tell you?
• It enables one to identify the people, places and things involved in an event.
• It helps one to associate the people, places and things involved in an event.
• It can often tell a story.
• It can help one reconstruct the event.
• Develop a simple method that enables you to identify the trace materials commonly found in dust samples.
• Develop a simple generic data sheet (tool) that allows you to quickly collect data on the trace materials commonly found in dust samples.
• Convert the data sheet into an Excel spreadsheet and load it into R (see the loading sketch below).
• Write some analysis scripts.
• Analyze the data.
(N. Petraco)
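A minimal sketch of the loading step; the file name and sheet layout below are illustrative assumptions, not the actual data sheet.

```r
library(readxl)   # install.packages("readxl") if needed

## Read the trace-evidence data sheet into a data frame (hypothetical file name)
dust <- as.data.frame(read_excel("dust_data_sheet.xlsx", sheet = 1))
str(dust)         # check that the trace-material columns were read in as expected
```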
Conformal Prediction Theory (Vovk et al.)
• New, but has roots in the 1960s with Kolmogorov's ideas on randomness and algorithmic complexity.
• Can be used with any statistical pattern classification algorithm.
• Independent of the data's underlying probability distribution.
  • This is a very important property for forensic pattern recognition!
  • Well... the sample should be i.i.d.
• For identification of patterns, the method produces a "confidence region" at level of confidence 1 − α.
• Confidence: a measure of how likely the I.D. procedure is to be correct.
• Results are valid: P(identification error) ≤ α (a minimal sketch follows)
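A minimal sketch of a (label-conditional) conformal prediction set using a nearest-neighbour nonconformity score. This is a generic illustration, not the authors' implementation; the function and variable names are made up.

```r
conformal_id <- function(train_x, train_y, new_x, alpha = 0.01) {
  classes <- as.character(unique(train_y))
  pvals <- sapply(classes, function(cl) {
    # provisionally label the unknown as class cl and augment the training set
    aug_x <- rbind(train_x, new_x)
    aug_y <- c(as.character(train_y), cl)
    same  <- aug_x[aug_y == cl, , drop = FALSE]     # the new example is the last row
    # nonconformity score: distance to the nearest other example of class cl
    a <- vapply(seq_len(nrow(same)), function(i) {
      d <- as.matrix(dist(rbind(same[i, ], same[-i, , drop = FALSE])))[1, -1]
      min(d)
    }, numeric(1))
    mean(a >= a[length(a)])                         # conformal p-value for label cl
  })
  # the level (1 - alpha) confidence region: every label whose p-value exceeds alpha
  classes[pvals > alpha]
}

## Toy usage: two well-separated groups; the region should usually be just "B"
set.seed(5)
x <- rbind(matrix(rnorm(40, 0), 20), matrix(rnorm(40, 3), 20))
y <- factor(rep(c("A", "B"), each = 20))
conformal_id(x, y, matrix(rnorm(2, mean = 3), nrow = 1), alpha = 0.05)
```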
3D PCA clustering can show potential for discrimination
• Use e1071, caret, pls and custom scripts (see the PCA-SVM sketch below):
  • PCA-SVM, 27D: refined bootstrap error rate estimate = 0.7%, 95% CI [0.0%, 3.3%]
  • CPT at the 99% level of confidence ("I.D."):
    • Empirical error rate = 0%
    • Unique and correct I.D. intervals = 93.1%
  • PLS-DA, 35D: refined bootstrap error rate estimate = 0.8%, 95% CI [0.0%, 3.3%]
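A minimal sketch of the PCA-SVM step with a plain out-of-bag bootstrap error estimate, a simplified stand-in for the "refined bootstrap" on the slide. The data below are synthetic stand-ins for the real dust feature table.

```r
library(e1071)

## Synthetic stand-in data: 10 dust sources x 12 specimens x 40 features
set.seed(2)
n_grp <- 10; n_rep <- 12; n_feat <- 40
y  <- factor(rep(seq_len(n_grp), each = n_rep))
mu <- matrix(rnorm(n_grp * n_feat, sd = 2), n_grp)
X  <- mu[as.integer(y), ] + matrix(rnorm(length(y) * n_feat), ncol = n_feat)

pc     <- prcomp(X, scale. = TRUE)
scores <- pc$x[, 1:27]                             # keep 27 PCs, as on the slide

B    <- 200
errs <- replicate(B, {
  idx  <- sample(nrow(scores), replace = TRUE)     # bootstrap resample
  oob  <- setdiff(seq_len(nrow(scores)), idx)      # out-of-bag cases
  fit  <- svm(scores[idx, ], droplevels(y[idx]), kernel = "linear")
  pred <- predict(fit, scores[oob, , drop = FALSE])
  mean(as.character(pred) != as.character(y[oob])) # out-of-bag misclassification
})
mean(errs)   # bootstrap error rate estimate
```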
Tool Marks (G. Petillo)
Known match comparisons: 5/8" consecutively manufactured chisels
Approach for Striated Tool Marks
• Obtain striation pattern profiles from 3D confocal microscopy
Primer Shear (P. Diaczuk)
Glock 19 firing pin impression
• 3D confocal image of entire shear pattern
Shear marks on primer of two different Glock 19s
Figure: mean total profile, mean "waviness" profile, and mean "roughness" profile
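A minimal sketch of splitting a measured profile into "waviness" and "roughness" components with a simple smoother; this is a generic stand-in, not necessarily the filtering the authors used, and the profile here is simulated.

```r
## Simulated surface profile: long waves plus fine-scale texture
set.seed(3)
x       <- seq(0, 10, length.out = 2000)
profile <- sin(x) + 0.1 * rnorm(length(x))

waviness  <- lowess(x, profile, f = 0.1)$y     # long-wavelength (waviness) component
roughness <- profile - waviness                # what remains after removing waviness

op <- par(mfrow = c(3, 1))
plot(x, profile,   type = "l", main = "Total profile")
plot(x, waviness,  type = "l", main = "Waviness")
plot(x, roughness, type = "l", main = "Roughness")
par(op)
```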
Primer Shear
• Primer shears (82-91 profiles)
  – PCA-SVM, CPT at the 95% level of confidence:
    • Empirical error rate was 4.7%
    • 90.7% of I.D. intervals were unique and correct
    • 7% of I.D. intervals had more than 1 I.D.
    • No "uninformative" intervals were returned
  – PCA-SVM, HOO-CV:
    • Error rate estimate is 0.0%-4.4%, depending on the number of replicates
  – PLS-DA, bootstrap (>10 replicates only):
    • 95% confidence interval for error rate: [0%, 0%]
    • 95% confidence interval for average false positive rate: [0%, 0%]
    • 95% confidence interval for average false negative rate: [0%, 0%]
  – PLS-DA, HOO-CV:
    • Error rate estimate is 0.0%-4.3%, depending on the number of replicates
• Results so far are on par with expectations
• 3D PCA: 36 Glocks, 1080 simulated and real primer shear profiles
• 18D PCA-SVM: refined bootstrap gun I.D. error rate 0.3%, 95% CI [0%, 0.8%]
Empirical Bayes
• Bayes' Rule: can we realistically estimate posterior error probabilities empirically/falsifiably? (A toy numerical illustration follows.)
  Pr(S- | t+) = Pr(t+ | S-) Pr(S-) / Pr(t+)
  i.e., the probability of no actual association given that a test/algorithm indicates a positive I.D.
• Perhaps. Genomics has spawned similar questions:
  • What is the probability of no disease (S-) given the differences in expression "scores" of thousands of genes?
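A toy numerical illustration of the Bayes' Rule question above, with made-up rates: even an algorithm with a low false-positive rate can leave a non-trivial posterior probability of "no actual association" when true associations are rare.

```r
## All three numbers below are assumptions chosen for illustration only
p_Sneg      <- 0.95   # prior: most comparisons are truly non-matches
p_tpos_Sneg <- 0.01   # false positive rate of the test/algorithm
p_tpos_Spos <- 0.99   # sensitivity of the test/algorithm

p_tpos      <- p_tpos_Sneg * p_Sneg + p_tpos_Spos * (1 - p_Sneg)  # Pr(t+)
p_Sneg_tpos <- p_tpos_Sneg * p_Sneg / p_tpos                      # Bayes' Rule
p_Sneg_tpos   # about 0.16: Pr(no actual association | positive ID)
```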
Empirical Bayes
• Efron's machinery for the "empirical Bayes two-groups model" (Efron 2007)
• Surprisingly simple!
  • S-: truly no association (null hypothesis)
  • S+: truly an association (non-null hypothesis)
  • z: a Gaussian random variate derived from a machine learning task to I.D. an unknown pattern with a group
• The scheme yields an estimate of Pr(S- | z) along with its standard error.
  • Called the local false discovery rate (fdr) or posterior error probability
  • Given a similarity score, fdr is an estimate of the probability that the computer is wrong in calling a "match" (see the sketch below)
• Catch: you need A LOT of z values, they should be fairly independent, and Pr(S-) > 0.9
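A minimal sketch of Efron's two-groups / local false discovery rate machinery via the locfdr CRAN package; the z-values here are simulated stand-ins for scores derived from a machine-learning comparison step.

```r
library(locfdr)   # install.packages("locfdr") if needed

## Simulated z-values: ~95% null (S-), ~5% non-null (S+)
set.seed(4)
z <- c(rnorm(9500), rnorm(500, mean = 3))

res <- locfdr(z)
head(res$fdr)   # estimated Pr(S- | z) at each observed score, i.e. the posterior
                # probability that calling a "match" at that score would be wrong
```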