DLR.de • Chart 1 > Crash Data Patterns > Wagner et al • presentationWarsaw.pptx > 25 Oct 2019

Visualizing Crash Data Patterns
Peter Wagner, with Ragna Hoffmann, Marek Junghans, Andreas Leich, and Hagen Saul
German Aerospace Center (DLR) – Institute of Transport Systems
32nd ICTCT Conference 2019
Warsaw, Poland
25 October 2019

Recently…
• I had to review a paper in which a CNN was used to predict crashes online (i.e. their probability)
• Not all reviewers were happy with the paper, so an interesting discussion started between the reviewers and the editor
• One reviewer deemed the approach unworkable, since CNNs search for patterns: however, crashes are essentially rare events and many are purely random (e.g. due to drunk drivers, drunk pedestrians) with no pattern at all.
• I hope that everybody agrees with me that this reviewer is wrong
CNN = Convolutional Neural Network. Picture taken from: https://www.superdatascience.com/blogs/the-ultimate-guide-to-convolutional-neural-networks-cnn

Ironically…
• The drunk-driver example has a particularly strong pattern
• Berlin’s crash database 2001–2016: a factor of 100 between the best and the worst hour
• I will try to show that this is in fact among the strongest patterns in these data
[Figure: share of BAC-related crashes (%, axis ticks from 0.2 to 20) against time of day (h, 0 to 24), one curve per weekday (Mon to Sun); BAC = blood alcohol concentration]

The toolkit

Two main instruments
• Best introduced by way of an example: the crash database contains a lot of information; we pick only two variables (plus the id):

  Id   Time-of-day (h)          BAC (yes/no)
  1    17 → “2” / afternoon     0
  2    22 → “3” / evening       1
  …    …                        …

• Constructing the contingency table (cross table) from these data:

         Night (0)   Morning (1)   Afternoon (2)   Evening (3)
  No         49343        497179          705124        287286
  Yes         9843          3573            5193         11316
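As a minimal sketch of how such a cross table can be built from (id, hour, BAC) records: the slides only show that hour 17 falls into class 2 (afternoon) and hour 22 into class 3 (evening), so the four 6-hour blocks used here are an assumption, as are the function names and the toy records 3 to 5.

```python
from collections import Counter

# Assumed class boundaries (not given on the slides): four 6-hour blocks,
# 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening.
TOD_LABELS = ["Night", "Morning", "Afternoon", "Evening"]


def tod_class(hour):
    """Map an hour of day (0-23) to a time-of-day class index 0..3."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour // 6


def contingency_table(records):
    """Count crashes per (BAC, time-of-day class) cell.

    records: iterable of (crash_id, hour, bac) with bac in {0, 1}.
    Returns a 2 x 4 list of lists: row 0 = BAC no, row 1 = BAC yes.
    """
    counts = Counter((bac, tod_class(hour)) for _id, hour, bac in records)
    return [[counts[(bac, tod)] for tod in range(4)] for bac in (0, 1)]


# Records 1 and 2 are the ones shown on the slide; 3 to 5 are invented.
records = [(1, 17, 0), (2, 22, 1), (3, 3, 1), (4, 9, 0), (5, 14, 0)]
table = contingency_table(records)
for label, row in zip(("No", "Yes"), table):
    print(label, dict(zip(TOD_LABELS, row)))
```

With the real database, the same counting over all crashes would yield the 2 x 4 table of counts shown on the slide.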
Dependencies

         Night   Morning   Afternoon   Evening       Sum
  No     49343    497179      705124    287286   1538932
  Yes     9843      3573        5193     11316     29925
  Sum    59186    500752      710317    298602

Yielding…

         Night   Morning   Afternoon   Evening
  No     -36.2       8.5        10.0     -10.4
  Yes    259.3     -61.2       -71.8      74.5

A few side remarks
[Figure: mosaic plot of BAC (0/1) against time of day (ToD: Night, Morning, Afternoon, Evening), shaded by Pearson residuals (legend from -72 to 260); p-value < 2.22e-16]

A glimpse into the data set
• Data set in its original form has ~60 variables
• Added another ~60 or so, such as weather, demand (DTV – model-based),…
• Apart from a fairly precise geo-location, data contain the collision diagram
• From these sets, the following variables have been picked:
  • year, hour, weekDay,
  • crash-type (cType), vehicle type (vType), collision diagram (colDia)
  • nAll, nFatal, nHeavy, nLight,
  • BAC, age, sex,
  • adt2009, temp, humidity
• Tried not to aggregate, but e.g. for age, adt, temp, humidity we had to

Data and Results
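The Pearson residuals on the "Yielding…" slide can be reproduced from the counts on the "Dependencies" slide. The sketch below assumes the standard definition, (observed - expected) / sqrt(expected), with expected counts taken from the row and column sums under independence; the slides do not spell the formula out, so this is an assumption, as are the function names. It also computes Cramér's V for the same table.

```python
import math

# Counts from the slides: rows = BAC (No, Yes), columns = Night..Evening.
counts = [
    [49343, 497179, 705124, 287286],
    [9843, 3573, 5193, 11316],
]


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell, where
    expected = row sum * column sum / grand total (independence).
    Assumes every row and column sum is positive."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [
        [(obs - r * c / total) / math.sqrt(r * c / total)
         for obs, c in zip(row, col_sums)]
        for row, r in zip(table, row_sums)
    ]


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2 = sum(res ** 2 for row in pearson_residuals(table) for res in row)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


residuals = pearson_residuals(counts)
for label, row in zip(("No", "Yes"), residuals):
    print(label, [round(res, 1) for res in row])
print("Cramér's V:", round(cramers_v(counts), 3))
```

Under these assumptions the residuals agree with the slide values to within rounding, and V for this four-class table comes out at roughly 0.23. The large positive residual for BAC = yes at night is the cell that dominates the mosaic plot.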
Collision diagrams
• Data contain collision diagrams for each crash
• In the following, only the 12 most frequent collision diagrams are used
• Lines of data:
  • Original: 3.17M
  • Crashes: 1.57M
  • 12 most: 1.07M

Then, brute-force

Looking closer… (4th rank) – V = 0.23
[Figure: mosaic plot of BAC (0/1) against hour (0 to 23), shaded by Pearson residuals (legend from -25 to 95); p-value < 2.22e-16]

The Top 10 (BAC/hour is in it!)

  Var1     Var2       rank   avCV     avRank   sdRank   Comment
  cType    colDia        1   0.7492      1       0      Trivial
  cType    vType         2   0.3996      2       0      Trivial?
  sex      vType         3   0.2424      3.2     0.42   Interesting
  hour     BAC           4   0.2252      4.1     0.57   As promised
  age      vType         5   0.1969      5.3     0.48
  colDia   vType         6   0.1911      6.3     0.48
  temp     humidity      7   0.1696      7.3     0.48   Not surprising
  cType    age           8   0.1572      8.4     0.70
  nHeavy   vType        11   0.1465      9.7     1.25
  cType    adt2009       9   0.1441     10.6     0.70

Looking closer… (1st rank) V = 0.749
[Figure: mosaic plot of crash type (cType, 1 to 7) against collision diagram (colDia: 11, 17, 48, 49, 50, 56, 58, 61, 70, 75, 84, 111), shaded by Pearson residuals (legend from -270 to 890); p-value < 2.22e-16]

A weak one, rank 111, V = 0.01 … many more
[Figure: mosaic plot of nAll against humidity class (20 classes from (13,35] to (96,100]), shaded by Pearson residuals (legend from -4.9 to 5.7); p-value < 2.22e-16]
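The brute-force step, computing Cramér's V for every pair of variables and sorting the pairs by it, could be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names and the toy data are invented, and all variables are treated as already categorical (continuous ones such as age or humidity would have to be binned first).

```python
import math
from collections import Counter
from itertools import combinations


def cramers_v(xs, ys):
    """Cramér's V of two equally long sequences of categorical values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    k = min(len(x_counts), len(y_counts)) - 1
    if n == 0 or k == 0:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def rank_pairs(columns):
    """Score every pair of columns and sort by decreasing V.

    columns: dict mapping variable name -> list of categorical values.
    Returns a list of (var1, var2, V) tuples, strongest pair first.
    """
    scored = [
        (a, b, cramers_v(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Invented toy data, only to show the mechanics.
columns = {
    "hour": ["night", "night", "day", "day", "night", "day", "day", "night"],
    "BAC": [1, 1, 0, 0, 1, 0, 0, 1],
    "sex": ["m", "f", "m", "f", "m", "f", "m", "f"],
}
ranked = rank_pairs(columns)
for var1, var2, v in ranked:
    print(f"{var1:5s} {var2:5s} V = {v:.3f}")
```

In the toy data, hour and BAC are perfectly associated and therefore rank first, while sex is independent of both. With the 16 variables listed in the talk there would be 120 such pairs to score.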
Rank 37 – V = 0.05 … many more
[Figure: mosaic plot of age class ((0,17], (17,24], (24,29], (29,35], (35,40], (40,46], (46,52], (52,61], (61,107]) against year (2001 to 2016), shaded by Pearson residuals (legend from -37 to 33); p-value < 2.22e-16]

Conclusions
• We have investigated a rarely used tool for analyzing a crash database
• It clearly needs a huge number of crashes to work
• Given these, it produces a very general kind of “correlation” between any two recorded variables, which may or may not have a causal connection
• The pairs can be sorted by Cramér’s V (or any similar measure) to find the ones with a large correlation
• The interesting ones among these can then be analyzed with a mosaic plot
• It gives a huge amount of information…
• Question to all: is this interesting? What of this is interesting?

Thank you for listening. Any questions?
Peter Wagner
Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
German Aerospace Center | Institute of Transportation Systems
Rutherfordstrasse 2 | 12489 Berlin | Germany
+49 30 67055-237 | peter.wagner@dlr.de | DLR.de/ts

Collision diagrams

Robustness of the rank
• Make the share
• Plotted against the rank variable violin plots… in the full data set
• Can be done by subdividing the data, or by