
Three Ways to make your Industrial Data Science Projects a Success



  1. Three Ways to make your Industrial Data Science Projects a Success. Prof. Dr.-Ing. Jochen Deuse, IDS 2019, Institute of Production Systems, Leonhard-Euler-Str. 5, D-44227 Dortmund

  2. Defining the Process (outline: Defining the Process, Dealing with Data Immaturity, Combining Domain Knowledge with Data Science, Conclusion)

  3. There is a Variety of Knowledge Discovery Processes: SEMMA (SAS), Knowledge Discovery in Databases (KDD), Knowledge Discovery in Industrial Databases (KDID), Cross-Industry Standard Process for Data Mining (CRISP-DM)

  4. So why do we follow CRISP-DM?
     - From a domain expert's perspective, the process is very intuitive
     - It resembles a PDCA or DMAIC cycle
     - It provides a well-defined project structure
     - It can easily be adapted across different industries

  5. CRISP-DM provides a well-defined Project Structure (SMD value stream example):
     - Business Understanding: shortening quality control loops and reducing the need for X-ray inspection
     - Data Understanding: exploring process and inspection data
     - Data Preparation: aggregating and cleansing of data
     - Modeling: selecting and configuring suitable prediction models
     - Evaluation: optimising slack rate and pseudo faults
     - Deployment: deploying based on an IoT architecture
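Read as a project skeleton, the phase-to-task mapping above translates naturally into code. The following is a minimal sketch only, assuming a Python layout with one placeholder function per CRISP-DM phase; the function names and the returned goal dictionary are illustrative, not the institute's actual implementation.

```python
"""Hypothetical CRISP-DM skeleton for the SMD inspection use case.

One placeholder function per phase; the bodies are intentionally empty so
the structure simply mirrors the slide's phase-to-task mapping.
"""

def business_understanding() -> dict:
    # Goal from the slide: shorten quality control loops, reduce X-ray inspection.
    return {"target": "NOK", "business_kpis": ["slack rate", "pseudo fault rate"]}

def data_understanding(sources):
    """Explore process and inspection data (profiling, plausibility checks)."""

def data_preparation(sources):
    """Aggregate and cleanse the data; join features and quality labels."""

def modeling(dataset):
    """Select and configure suitable prediction models."""

def evaluation(model, holdout):
    """Optimise slack rate and pseudo fault rate against the business goals."""

def deployment(model):
    """Deploy the accepted model on the existing IoT architecture."""

if __name__ == "__main__":
    print("Project goals:", business_understanding())
```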

  6. Dealing with Data Immaturity (outline: Defining the Process, Dealing with Data Immaturity, Combining Domain Knowledge with Data Science, Conclusion)

  7. Data Maturity can be assessed by applying a defined Set of Criteria:
     - Data Acquisition: How is data collected along the value stream?
     - Sample Size: Are there enough representatives of each class, and are they evenly distributed?
     - Reference Level: Is the data available at a high and uniform granularity?
     - Consistency: Does the relevant data set contain logical contradictions?
     - Traceability: Can label and feature value characteristics be joined unambiguously?
     - …

  8. We have specified ten Criteria with four Levels of Maturity each (levels 1 to 4):
     - Data collection: (1) manual entry; (2) electronic, must be triggered manually; (3) data acquisition is carried out automatically in most cases; (4) fully automated data collection
     - Completeness of data collection: (1) unilateral and incomplete recording of relevant characteristics; (2) recording of the essential characteristics; (3) recording of a large part of the relevant characteristics; (4) recording of all relevant, (un)influenceable characteristics
     - Sample size: (1) no historic data; (2) small sample per object group; (3) large sample per object group, but unbalanced data; (4) large sample with large number per object group and class
     - Data sources: (1) paper-based records; (2) decentralised data storage with simple software (e.g. Excel); (3) different data management systems with central data storage; (4) comprehensive data warehouse
     - Data format: (1) formats that are difficult to process (e.g. scans, photos); (2) formats with limited processability (e.g. PDF); (3) different, directly processable formats (e.g. CSV, XML); (4) comprehensive standard format
     - Data structure: (1) unstructured text or images; (2) semi-structured data (e.g. XML, JSON); (3) structured, mixed-scaled data; (4) structured, metrically scaled data and standardized codes
     - Feature type: (1) only set points; (2) highly aggregated actual values; (3) aggregated actual values or raw data with low sampling rate; (4) raw data in real time
     - Reference level: (1) value characteristics at the highest reference level; (2) value characteristics at the upper reference level; (3) value characteristics at the next higher level; (4) value characteristics at individual element level
     - Consistency of data: (1) no consistency/integrity; (2) massive amount of logical differences; (3) few logical differences; (4) full integrity/consistency
     - Traceability: (1) no ID/time stamp; (2) different ID/timestamp; (3) comprehensive ID/time stamp; (4) comprehensive ID/timestamp on the same reference level
     Reference: Eickelmann et al. (2019): Bewertungsmodell zur Analyse der Datenreife. In: ZWF 114 (1-2), pp. 29-33
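For illustration only, the matrix can be encoded so that a data set is assessed with one level per criterion. The criterion names below come from the table; the scoring helper, the example values, and the idea of reporting the weakest criterion are assumptions for this sketch and may not match the aggregation used by Eickelmann et al. (2019).

```python
# Hypothetical encoding of the ten maturity criteria (levels 1-4).
# Reporting the minimum level as the limiting factor is an illustrative
# assumption, not taken from the cited maturity model.

CRITERIA = [
    "Data collection", "Completeness of data collection", "Sample size",
    "Data sources", "Data format", "Data structure", "Feature type",
    "Reference level", "Consistency of data", "Traceability",
]

def maturity_profile(levels: dict[str, int]) -> dict:
    """Validate a per-criterion assessment and report the weakest criterion."""
    missing = [c for c in CRITERIA if c not in levels]
    if missing:
        raise ValueError(f"Unassessed criteria: {missing}")
    if not all(1 <= levels[c] <= 4 for c in CRITERIA):
        raise ValueError("Each level must be between 1 and 4")
    weakest = min(CRITERIA, key=lambda c: levels[c])
    return {"profile": levels,
            "weakest_criterion": weakest,
            "limiting_level": levels[weakest]}

# Example assessment (values purely illustrative):
example = {c: 4 for c in CRITERIA}
example["Sample size"] = 3      # large sample, but unbalanced classes
example["Reference level"] = 2  # labels and features at different levels
print(maturity_profile(example))
```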

  9. Non-uniform Reference Levels prohibit Supervised Learning (maturity matrix from slide 8 repeated)
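Supervised learning requires that quality labels and feature values can be joined at one common reference level; if labels exist per part while features are logged per process step, the features first have to be brought up to the part level. A minimal pandas sketch with assumed column names (part_id, step, value, label), not the presentation's actual data model:

```python
import pandas as pd

# Hypothetical raw data: process features logged per (part, process step),
# quality labels recorded once per part. All names and values are assumed.
features = pd.DataFrame({
    "part_id": [1, 1, 2, 2, 3, 3],
    "step":    ["press", "leak", "press", "leak", "press", "leak"],
    "value":   [0.71, 1.30, 0.69, 1.28, 0.75, 1.45],
})
labels = pd.DataFrame({"part_id": [1, 2, 3], "label": ["OK", "OK", "NOK"]})

# Bring the features up to the parts' reference level: one row per part,
# one column per process step, then join the labels unambiguously.
wide = (features
        .pivot_table(index="part_id", columns="step", values="value")
        .add_prefix("value_")
        .reset_index())
dataset = wide.merge(labels, on="part_id", how="inner")
print(dataset)
```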

  10. Supervised Learning of Quality Labels from End-of-Line Test Data: diesel injector nozzle manufacturing value stream (assembly and injector testing: pressing rings and filters, screw pressure, screw pressure control, screw injection, leak test, ring assembly, quality inspection, clamping station, packaging, hydraulic end-of-line test).
      [Figure: feature value over time [s] of the hydraulic end-of-line test, phases P1-P6, with Features 1-3 extracted from the signal]
      Confusion matrix (68,320 injectors; true result vs. forecast, NOK = positive class):
      - True result: NOK 3,018 (4.42 %), OK 65,302 (95.58 %)
      - Forecast NOK: 1,656 (2.42 %), of which 1,021 true NOK and 635 pseudo faults; precision 61.65 %, pseudo fault rate 38.35 %
      - Forecast OK: 66,664 (97.58 %), of which 64,667 true OK and 1,997 slack (missed NOK); negative predictive value 97.00 %, false omission rate 3.00 %
      - Sensitivity/recall 33.83 %, slack rate 66.17 %, false positive rate 0.97 %, specificity 99.03 %, accuracy 96.15 %
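All rates on the slide follow directly from the four confusion-matrix counts (NOK treated as the positive class). The short script below reproduces them; the counts are taken from the slide, the helper structure is just for illustration.

```python
# Confusion-matrix counts from the slide (NOK = positive class).
TP = 1_021   # true NOK, forecast NOK
FP = 635     # true OK,  forecast NOK (pseudo faults)
FN = 1_997   # true NOK, forecast OK  (slack)
TN = 64_667  # true OK,  forecast OK

total = TP + FP + FN + TN                        # 68,320 injectors
metrics = {
    "precision":             TP / (TP + FP),     # 61.65 %
    "pseudo fault rate":     FP / (TP + FP),     # 38.35 %
    "neg. predictive value": TN / (TN + FN),     # 97.00 %
    "false omission rate":   FN / (TN + FN),     #  3.00 %
    "sensitivity / recall":  TP / (TP + FN),     # 33.83 %
    "slack rate":            FN / (TP + FN),     # 66.17 %
    "specificity":           TN / (TN + FP),     # 99.03 %
    "false positive rate":   FP / (TN + FP),     #  0.97 %
    "accuracy":              (TP + TN) / total,  # 96.15 %
}
for name, value in metrics.items():
    print(f"{name:22s} {value:7.2%}")
```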

  11. Unbalanced Label Proportions result in high Recall Rates (maturity matrix from slide 8 repeated)
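With roughly 96 % OK parts, overall accuracy (96.15 %) and the recall of the majority class (the specificity, 99.03 %) look excellent even though only about a third of the NOK parts are caught. One common counter-measure, shown here only as a sketch on synthetic data (not the approach or data of the presented project), is to weight the classes inversely to their frequency during training:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the injector data: ~4 % NOK, ~96 % OK,
# three numeric features. Purely illustrative.
rng = np.random.default_rng(0)
n = 20_000
y = (rng.random(n) < 0.04).astype(int)          # 1 = NOK (minority class)
X = rng.normal(size=(n, 3)) + y[:, None] * 0.8  # NOK parts shifted slightly

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Compare an unweighted model with class weights inverse to class frequency.
for weights in (None, "balanced"):
    clf = RandomForestClassifier(n_estimators=100, class_weight=weights,
                                 random_state=0).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(f"class_weight={weights!s:9s}"
          f"  NOK recall={recall_score(y_te, pred, pos_label=1):.2f}"
          f"  balanced acc={balanced_accuracy_score(y_te, pred):.2f}")
```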
