
Process Mining Tutorial: Computational Intelligence in HealthCare (PowerPoint presentation)

Process Mining Tutorial, Computational Intelligence in HealthCare, 20 - 24 September 2010, Eindhoven, the Netherlands. prof.dr.ir. Wil van der Aalst, www.processmining.org. The focus of most modeling and analysis techniques is on the right-hand side.


  1. Annotated log fragment. Callouts: extensions loaded; every trace has a name; every event has a name and a transition (i.e. classifier = name + transition); start of trace (i.e. process instance); name of trace; resource; timestamp; name of event (activity name); transition.

  2. Annotated log fragment, continued. Callouts: end of trace (i.e. process instance); start of trace; name of trace; resource; timestamp; name of event (activity name); data associated with the event.

  3. Example log
     • Minimal information in the log: case ids and task ids.
     • Additional information: event type, time, resources, and data.
     • Events in order of occurrence: case 1 : task A, case 2 : task A, case 3 : task A, case 3 : task B, case 1 : task B, case 1 : task C, case 2 : task C, case 4 : task A, case 2 : task B, case 2 : task D, case 5 : task E, case 4 : task C, case 1 : task D, case 3 : task C, case 3 : task D, case 4 : task B, case 5 : task F, case 4 : task D.
     • Sequences per case: 1: ABCD, 2: ACBD, 3: ABCD, 4: ACBD, 5: EF.
     • So in this log there are three possible sequences: ABCD, ACBD, and EF.
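A minimal Python sketch of how the event list above becomes one sequence per case, assuming the log is given as (case id, task) pairs in order of occurrence. The names `EVENTS` and `traces_by_case` are illustrative only, not part of any process mining tool.

```python
# The example log as (case id, task) events, in the order they were recorded.
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]

def traces_by_case(events):
    """Group events per case, keeping the order in which they were recorded."""
    traces = {}
    for case, task in events:
        traces.setdefault(case, []).append(task)
    return {case: "".join(tasks) for case, tasks in traces.items()}

traces = traces_by_case(EVENTS)
print(traces)                        # {1: 'ABCD', 2: 'ACBD', 3: 'ABCD', 4: 'ACBD', 5: 'EF'}
print(sorted(set(traces.values())))  # ['ABCD', 'ACBD', 'EF']
```

Only case ids and task ids are used here, which is exactly the "minimal information" the slide mentions; timestamps, resources, and data would simply be extra fields per event.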

  4. >, →, ||, # relations
     • Direct succession: x>y iff for some case x is directly followed by y.
     • Causality: x → y iff x>y and not y>x.
     • Parallel: x||y iff x>y and y>x.
     • Choice: x#y iff not x>y and not y>x.
     For the example log (ABCD, ACBD, EF): A>B, A>C, B>C, B>D, C>B, C>D, E>F; hence A → B, A → C, B → D, C → D, E → F, and B||C, C||B.
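The four relations can be computed directly from their definitions. This is a small sketch assuming traces are strings of single-letter activity names; the function name `relations` is hypothetical.

```python
from itertools import product

def relations(log):
    """Derive >, ->, || and # from a list of traces (strings of activity names)."""
    succ = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}  # x > y
    acts = sorted({a for t in log for a in t})
    causal, parallel, choice = set(), set(), set()
    for x, y in product(acts, acts):
        forward, backward = (x, y) in succ, (y, x) in succ
        if forward and not backward:
            causal.add((x, y))      # x -> y
        elif forward and backward:
            parallel.add((x, y))    # x || y
        elif not forward and not backward:
            choice.add((x, y))      # x # y
        # the remaining case (only y > x) is y -> x, recorded when the loop reaches (y, x)
    return succ, causal, parallel, choice

succ, causal, parallel, choice = relations(["ABCD", "ACBD", "EF"])
print(sorted(causal))    # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F')]
print(sorted(parallel))  # [('B', 'C'), ('C', 'B')]
```

Note that # also holds between an activity and itself whenever it never directly follows itself (for example A#A), which matters for step 4 of the Alpha algorithm.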

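  5. Basic idea (1): x → y. Sequence: x and y are connected through a place.

  6. Basic idea (2): x → y, x → z, and y||z. AND-split: after x, both y and z are enabled.

  7. Basic idea (3): x → y, x → z, and y#z. XOR-split: after x, either y or z occurs.

  8. Basic idea (4): x → z, y → z, and x||y. AND-join: z waits for both x and y.

  9. Basic idea (5): x → z, y → z, and x#y. XOR-join: z follows either x or y.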
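The split and join patterns can be read off mechanically. The hypothetical helper `classify` below looks at every activity with more than one causal successor (or predecessor) and checks whether those activities are pairwise parallel (AND) or pairwise in choice (XOR). It is an illustration of patterns (2) to (5), not part of the Alpha algorithm itself.

```python
from itertools import combinations

def classify(log):
    """Label activities with several causal successors/predecessors as AND or XOR."""
    succ = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}  # x > y
    acts = sorted({a for t in log for a in t})

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def kind(group):
        pairs = list(combinations(group, 2))
        if all((a, b) in succ and (b, a) in succ for a, b in pairs):
            return "AND"    # pairwise parallel (||): patterns (2) and (4)
        if all((a, b) not in succ and (b, a) not in succ for a, b in pairs):
            return "XOR"    # pairwise in choice (#): patterns (3) and (5)
        return "mixed"      # not one of the basic patterns

    labels = {}
    for x in acts:
        outs = [y for y in acts if causal(x, y)]
        ins = [y for y in acts if causal(y, x)]
        if len(outs) > 1:
            labels[(x, "split")] = kind(outs)
        if len(ins) > 1:
            labels[(x, "join")] = kind(ins)
    return labels

print(classify(["ABCD", "ACBD"]))  # {('A', 'split'): 'AND', ('D', 'join'): 'AND'}
print(classify(["ABD", "ACD"]))    # {('A', 'split'): 'XOR', ('D', 'join'): 'XOR'}
```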

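  10. It is not that simple! Basic Alpha algorithm. Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows:
     1. $T_W = \{ t \in T \mid \exists \sigma \in W : t \in \sigma \}$,
     2. $T_I = \{ t \in T \mid \exists \sigma \in W : t = \mathit{first}(\sigma) \}$,
     3. $T_O = \{ t \in T \mid \exists \sigma \in W : t = \mathit{last}(\sigma) \}$,
     4. $X_W = \{ (A,B) \mid A \subseteq T_W \wedge A \neq \emptyset \wedge B \subseteq T_W \wedge B \neq \emptyset \wedge \forall a \in A\, \forall b \in B : a \rightarrow_W b \wedge \forall a_1, a_2 \in A : a_1 \#_W a_2 \wedge \forall b_1, b_2 \in B : b_1 \#_W b_2 \}$,
     5. $Y_W = \{ (A,B) \in X_W \mid \forall (A',B') \in X_W : A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
     6. $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
     7. $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
     8. $\alpha(W) = (P_W, T_W, F_W)$.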
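A brute-force Python sketch of steps 1 to 8, assuming every trace is a non-empty string of single-letter activity names. It enumerates all subsets of T_W, so it is only practical for tiny logs like the ones in these slides. The names `alpha`, `label`, `i_W`, and `o_W` are illustrative; this is not the implementation of any particular tool.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All non-empty subsets of `items`, as frozensets."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def alpha(log):
    """Basic Alpha algorithm for a log given as a list of non-empty traces (strings)."""
    t_w = {t for trace in log for t in trace}                           # step 1
    t_i = {trace[0] for trace in log}                                   # step 2
    t_o = {trace[-1] for trace in log}                                  # step 3
    succ = {(s[k], s[k + 1]) for s in log for k in range(len(s) - 1)}  # x > y

    def causal(a, b):   # a ->_W b
        return (a, b) in succ and (b, a) not in succ

    def choice(a, b):   # a #_W b
        return (a, b) not in succ and (b, a) not in succ

    # Sets whose members are pairwise in #_W (including a #_W a).
    free = [s for s in nonempty_subsets(t_w)
            if all(choice(a, b) for a in s for b in s)]
    # Step 4: X_W, every a in A causally followed by every b in B.
    x_w = [(A, B) for A in free for B in free
           if all(causal(a, b) for a in A for b in B)]
    # Step 5: Y_W keeps only the maximal pairs.
    y_w = [(A, B) for (A, B) in x_w
           if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2) for (A2, B2) in x_w)]
    # Step 6: one place per maximal pair, plus the source i_W and the sink o_W.
    places = {("p", A, B) for (A, B) in y_w} | {"i_W", "o_W"}
    # Step 7: the flow relation F_W.
    arcs = {("i_W", t) for t in t_i} | {(t, "o_W") for t in t_o}
    for (A, B) in y_w:
        place = ("p", A, B)
        arcs |= {(a, place) for a in A} | {(place, b) for b in B}
    return places, t_w, arcs                                            # step 8

def label(place):
    """Readable name for a place, e.g. p({A},{B,E})."""
    if isinstance(place, str):
        return place
    _, A, B = place
    return "p({%s},{%s})" % (",".join(sorted(A)), ",".join(sorted(B)))

places, transitions, arcs = alpha(["ABCD", "ACBD", "EF"])
print(sorted(label(p) for p in places))
# ['i_W', 'o_W', 'p({A},{B})', 'p({A},{C})', 'p({B},{D})', 'p({C},{D})', 'p({E},{F})']
```

Enumerating all subsets keeps the code close to the set-builder definitions; a practical implementation would build X_W incrementally instead of testing every subset pair.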

  11. Example revisited: W is the example log of slide 3 (cases 1 to 5 with sequences ABCD, ACBD, ABCD, ACBD, EF). Relations: A>B, A>C, B>C, B>D, C>B, C>D, E>F; A → B, A → C, B → D, C → D, E → F; B||C, C||B. [Figure: the resulting Petri net α(W) over tasks A, B, C, D, E, and F.]

  12. Exercise (1): What does the Alpha algorithm produce for a log consisting only of the following traces?
     • ABCD
     • ACBD
     • AED

  13. Another example, taken step by step...

  14. Relations for the log ABCD, ACBD, AED: A>B, A>C, A>E, B>C, B>D, C>B, C>D, E>D; A → B, A → C, A → E, B → D, C → D, E → D; B||C, C||B.

  15. [Figure]

  16. A and B need to be non-empty. [Figure, using the relations of slide 14.]

  17. [Figure]

  18. [Figure]

  19. Exercise (2): What does the Alpha algorithm produce for a log consisting only of the following traces?
     • ACD
     • BCE

  20. Exercise (3): What does the Alpha algorithm produce for a log consisting only of the following traces?
     • ACEG
     • AECG
     • BDFG
     • BFDG

  21. More on Process Discovery

  22. Examples of process discovery techniques:
     • Algorithmic techniques: Alpha miner; Alpha+, Alpha++, Alpha#; FSM miner; Fuzzy miner; Heuristic miner; Multi-phase miner
     • Genetic process mining: single/duplicate tasks; distributed GM
     • Region-based process mining: state-based regions; language-based regions
     • Classical approaches not dealing with concurrency: inductive inference (Mark Gold, Dana Angluin et al.); sequence mining

  23. Genetic Mining (Ana Karla Alves de Medeiros et al.): a cycle of 1. initial population, 2. fitness test, 3. select best parents, 4. crossover, 5. children, 6. mutation, 7. new population. [Figure: the cycle illustrated with process models over activities A to M.]

  24. Design choices: representation, fitness, crossover, and mutation. [Figure: the seven-step genetic cycle with these choices marked.]
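A toy Python sketch of the seven-step cycle, only to make the four design choices concrete. It is deliberately not the genetic miner of Medeiros et al.: here an individual is simply a set of arcs between activities (representation), fitness is a Jaccard score against the directly-follows pairs of the log (a hypothetical stand-in for replay-based fitness), crossover mixes the arcs of two parents, and mutation flips one arc. All names are illustrative.

```python
import random

def directly_follows(log):
    """Pairs (x, y) such that x is directly followed by y in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def fitness(individual, target):
    """Toy fitness: Jaccard similarity between an individual's arcs and the observed pairs."""
    union = individual | target
    return len(individual & target) / len(union) if union else 1.0

def evolve(log, generations=50, size=20, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    acts = sorted({a for t in log for a in t})
    all_arcs = [(x, y) for x in acts for y in acts]
    target = directly_follows(log)
    # 1. initial population: random arc sets (the representation choice)
    population = [frozenset(a for a in all_arcs if rng.random() < 0.3)
                  for _ in range(size)]
    for _ in range(generations):
        # 2. fitness test and 3. select best parents (the top half survives)
        population.sort(key=lambda ind: fitness(ind, target), reverse=True)
        parents = population[: size // 2]
        children = []
        while len(children) < size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            # 4. crossover: keep shared arcs, take each other arc with probability 0.5
            child = {a for a in sorted(p1 | p2)
                     if (a in p1 and a in p2) or rng.random() < 0.5}
            # 6. mutation: flip one random arc in or out
            child ^= {rng.choice(all_arcs)}
            children.append(frozenset(child))  # 5. children
        # 7. new population: surviving parents plus children
        population = parents + children
    best = max(population, key=lambda ind: fitness(ind, target))
    return best, fitness(best, target)

best, score = evolve(["ABCD", "ACBD", "EF"])
print(sorted(best), score)
```

Because the parents survive each round, the best fitness never decreases; the computing cost grows with population size times generations times the cost of one fitness test, which is why replay-based fitness makes real genetic mining expensive.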

  25. Properties of Genetic Mining
     • Requires a lot of computing power.
     • Can deal with noise, infrequent behavior, duplicate tasks, invisible tasks, etc.
     • Allows for incremental improvement and combinations with other approaches (heuristics post-optimization, etc.).

  26. Challenge: Balancing Between Underfitting and Overfitting

  27. The essence. [Figure: an event log (ABCD, ACBD, AED, ABCD, ABCD, AED, ACBD, ...) next to a process model over A, B, C, D, and E.]

  28. But ... [Figure: a model from start to end over A, B, C, D, and E, labelled "Any log containing activities A, B, C, D, and E."]

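  29. Finding a balance. [Figure: panels (a) to (d) over A, B, C, D, and E, contrasting the behavior ACD, BCE, ... with the behavior ACD, ACE, BCE, BCD, ...; arrows mark "more behavior".]

  30. Trace frequencies: ACD 99, ACE 0, BCE 85, BCD 0. [Figure: two candidate models over A, B, C, D, and E.]

  31. Trace frequencies: ACD 99, ACE 88, BCE 85, BCD 78. [Figure: two candidate models over A, B, C, D, and E.]

  32. Trace frequencies: ACD 99, ACE 2, BCE 85, BCD 3. [Figure: two candidate models over A, B, C, D, and E.]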
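A small Python sketch that puts numbers on the trade-off, using the trace frequencies from slides 30 to 32. The two models are hypothetical encodings of the behaviors on slide 29, described only by the traces they allow: a strict one (ACD, BCE) and a loose one (ACD, ACE, BCE, BCD). The two measures are simple illustrative stand-ins, not the fitness and precision metrics of any specific tool.

```python
# Trace frequencies from slides 30-32 (number of cases per trace).
LOGS = {
    "slide 30": {"ACD": 99, "ACE": 0, "BCE": 85, "BCD": 0},
    "slide 31": {"ACD": 99, "ACE": 88, "BCE": 85, "BCD": 78},
    "slide 32": {"ACD": 99, "ACE": 2, "BCE": 85, "BCD": 3},
}
# Hypothetical models, given by the set of traces they allow.
STRICT = {"ACD", "BCE"}                # A is always followed by D, B by E
LOOSE = {"ACD", "ACE", "BCE", "BCD"}   # A or B, then C, then D or E

def fitting_share(log, model):
    """Fraction of cases whose trace the model allows."""
    total = sum(log.values())
    return sum(n for trace, n in log.items() if trace in model) / total

def unseen_share(log, model):
    """Fraction of the model's traces that never occur in the log."""
    return sum(1 for trace in model if log.get(trace, 0) == 0) / len(model)

for name, log in LOGS.items():
    print(name,
          round(fitting_share(log, STRICT), 3), round(unseen_share(log, STRICT), 3),
          round(fitting_share(log, LOOSE), 3), round(unseen_share(log, LOOSE), 3))
# slide 30 1.0 0.0 1.0 0.5
# slide 31 0.526 0.0 1.0 0.0
# slide 32 0.974 0.0 1.0 0.0
```

On slide 30's log the loose model allows two traces nobody ever executed (underfitting), on slide 31's log the strict model misses almost half of the cases (overfitting), and on slide 32's log the strict model misses only 5 of 189 cases, which may be noise or genuine behavior.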

  33. Evaluating process mining results
     • Fitness: Is the event log possible according to the model?
     • Precision: Is the model not underfitting (does it not allow too much)?
     • Generalization: Is the model not overfitting (does it not only allow for the "accidental" examples)?
     • Structure: Is this the simplest model (Occam's Razor)?

  34. [Figure]

  35. Representing process models

  36. [Figure: a large model of a travel expense process, running from "Need for trip has arisen" and "Entry of a travel request" through approval of the travel request, advance payment, entry and approval of trip facts, and reimbursement, to payments to bank/payee and the transfer of amounts to accounting and payroll.]

  37. More significant nodes are emphasized; more important paths are highlighted.

  38. More to learn from maps...
     • Abstraction: removing isolated, less significant structures.
     • Aggregation: clustering coherent, less significant structures.

  39. Fuzzy miner

  40. Showing reality

  41. Back to the future …

  42. [Figure: a software system supports/controls the "world" (business processes, people, machines, components, organizations) and records events, e.g., messages, transactions, etc., in event logs. (Process) models specify, configure, implement, and analyze; discovery, conformance, and extension relate event logs and models.]

  43. Recommend: How to get home ASAP? Take a left turn! Detect: You drive too fast! Predict: When will I be home? At 11.26!

  44. Operational Support: Detect, Predict, and Recommend. [Figure: current data is checked against models to detect (alerts), predict (predictions), and recommend (recommendations, e.g., via simulation); models are learned (discovered and enhanced) from historic data.]

  45. Operational Support and Conformance Checking Based on Replay

  46. Play Out (classical use of models). [Figure: Petri net with places start, p1, p2, p3, p4, and end and transitions A, B, C, D, and E.] Generated traces: AED, AED, ABCD, ACBD, ABCD, ACBD, AED, ACBD.
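Play-out can be sketched as exhaustive firing of the net. The arcs are an assumption read from the figure residue (A: start → p1, p2; B: p1 → p3; C: p2 → p4; E: p1, p2 → p3, p4; D: p3, p4 → end), and `NET`, `fire`, and `play_out` are hypothetical names.

```python
from collections import Counter

# Assumed arcs of the slide's net: transition -> (input places, output places).
NET = {
    "A": (["start"], ["p1", "p2"]),
    "B": (["p1"], ["p3"]),
    "C": (["p2"], ["p4"]),
    "E": (["p1", "p2"], ["p3", "p4"]),
    "D": (["p3", "p4"], ["end"]),
}

def enabled(marking, t):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking[p] > 0 for p in NET[t][0])

def fire(marking, t):
    """Consume one token per input place and produce one per output place."""
    m = marking.copy()
    m.subtract(NET[t][0])
    m.update(NET[t][1])
    return +m  # unary plus drops places that are now empty

def play_out(marking, prefix=""):
    """Yield every firing sequence that ends with a single token in 'end'."""
    if marking == Counter({"end": 1}):
        yield prefix
        return
    for t in sorted(NET):
        if enabled(marking, t):
            yield from play_out(fire(marking, t), prefix + t)

print(sorted(play_out(Counter({"start": 1}))))  # ['ABCD', 'ACBD', 'AED']
```

Every trace on the slide is one of these three firing sequences; a simulator would pick one enabled transition at random instead of exploring all of them.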

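  47. Play In (Process Discovery): an event log (ABCD, ACBD, AED, ACBD, AED, ABCD, …) is given to a process discovery algorithm like the α algorithm, which returns the Petri net with places start, p1, p2, p3, p4, and end and transitions A, B, C, D, and E.

  48. Replay: the trace ABCD is replayed on the same Petri net (places start, p1 to p4, end; transitions A to E).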

  49. Replay can detect problems. Replaying the trace ACD on the same net: Problem! A token is left behind (in p1). Problem! A token is missing (in p3).
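A sketch of token-based replay that counts missing and remaining tokens. It assumes the same arcs as the play-out net (A: start → p1, p2; B: p1 → p3; C: p2 → p4; E: p1, p2 → p3, p4; D: p3, p4 → end) and the usual counting convention that the initial token in start counts as produced and the final token in end as consumed. `replay` is a hypothetical name.

```python
from collections import Counter

# Assumed arcs of the slide's net: transition -> (input places, output places).
NET = {
    "A": (["start"], ["p1", "p2"]),
    "B": (["p1"], ["p3"]),
    "C": (["p2"], ["p4"]),
    "E": (["p1", "p2"], ["p3", "p4"]),
    "D": (["p3", "p4"], ["end"]),
}

def replay(trace):
    """Token replay; returns (m, r, c, p) plus where tokens were missing or left."""
    marking = Counter({"start": 1})
    m, c, p = 0, 0, 1              # the initial token in 'start' counts as produced
    missing_in = []
    for t in trace:
        inputs, outputs = NET[t]
        for place in inputs:
            if marking[place] == 0:
                m += 1             # add the missing token so that t can still fire
                missing_in.append(place)
                marking[place] += 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] += 1
            p += 1
    # At the end of the case, the token in 'end' is consumed.
    if marking["end"] == 0:
        m += 1
        missing_in.append("end")
    else:
        marking["end"] -= 1
    c += 1
    left_in = sorted(+marking)     # places that still hold tokens
    r = sum((+marking).values())
    return m, r, c, p, missing_in, left_in

print(replay("ACD"))   # (1, 1, 5, 5, ['p3'], ['p1'])
print(replay("ABCD"))  # (0, 0, 6, 6, [], [])
```

Forcing the transition to fire despite a missing token lets replay continue after a problem, so a single trace can reveal several deviations.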

  50. Replay can extract timing information. Example trace: A at time 5, B at 8, C at 9, D at 13. [Figure: the net annotated with time values obtained by replay.]
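A sketch of how replay yields timing information: each token remembers when it was produced, and the time until it is consumed is recorded per place. It assumes the same net arcs as the replay slides and a trace that fits the net; `waiting_times` is a hypothetical name, and the timestamps are the ones from the slide.

```python
from collections import defaultdict

# Assumed arcs of the slide's net: transition -> (input places, output places).
NET = {
    "A": (["start"], ["p1", "p2"]),
    "B": (["p1"], ["p3"]),
    "C": (["p2"], ["p4"]),
    "E": (["p1", "p2"], ["p3", "p4"]),
    "D": (["p3", "p4"], ["end"]),
}

def waiting_times(timed_trace):
    """Replay (activity, timestamp) pairs and record how long each token sat in its place.

    Assumes the trace fits the net, so every consumed token exists.
    """
    tokens = defaultdict(list, {"start": [None]})  # place -> production times, oldest first
    waits = defaultdict(list)
    for t, ts in timed_trace:
        inputs, outputs = NET[t]
        for place in inputs:
            produced = tokens[place].pop(0)
            if produced is not None:   # the initial token has no known production time
                waits[place].append(ts - produced)
        for place in outputs:
            tokens[place].append(ts)
    return dict(waits)

print(waiting_times([("A", 5), ("B", 8), ("C", 9), ("D", 13)]))
# {'p1': [3], 'p2': [4], 'p3': [5], 'p4': [4]}
```

Collected over many cases, such per-place times can be averaged to annotate the net with waiting times, which is presumably what the annotated figure shows.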

  51. Example: Conformance Checker

  52. Conformance checker (Anne Rozinat et al.): How to quantify this?

  53. Fitness by replay: m = missing, r = remaining, c = consumed, p = produced (tokens).
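The slide names the four counters but not how they are combined. The token-replay fitness measure commonly used in the process mining literature is shown below, with a worked value for the trace ACD. The arithmetic assumes the net from the replay slides (A: start → p1, p2; B: p1 → p3; C: p2 → p4; E: p1, p2 → p3, p4; D: p3, p4 → end) and counts the initial token in start as produced and the final token in end as consumed.

$$
f = \frac{1}{2}\left(1 - \frac{m}{c}\right) + \frac{1}{2}\left(1 - \frac{r}{p}\right)
$$

For ACD: p = 5 (the initial token, two from A, one from C, one from D), c = 5 (start, p2, p3, p4, end), m = 1 (p3), and r = 1 (p1), so

$$
f(\mathit{ACD}) = \frac{1}{2}\left(1 - \frac{1}{5}\right) + \frac{1}{2}\left(1 - \frac{1}{5}\right) = 0.8
$$

For a fitting trace such as ABCD, m = r = 0 and therefore f = 1.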

  54. No problem (m = 0, r = 0).

  55. Another (impossible) trace

  56. [Figure]
