How Many Is Too Much? Exploring Costs of Coordination During Outages
Dr. Laura M.D. Maguire
Cognitive Systems Engineering Lab, The Ohio State University
@LauraMDMaguire
Zoom
(c) AleksandarNakic; NASA
Software is increasingly managing critical societal functions:
• 911 call routing systems
• Electronic health records
• Financial markets
Overview
• Changing nature of ‘control rooms’ & implications
• Cognitive & coordinative work
• Coordination
• 4 interesting findings
• Implications for your work
*Hint: You are probably going to want to rethink the need for an Incident Commander.
Cognitive work (perceiving, reasoning, attending, forming an action): Anticipating, Observing, Inferring, Recognizing change, Planning, Prioritizing, Troubleshooting, Diagnosing, Correcting, Modifying, Reacting
Coordinative work: Recruiting, Synchronizing, Grounding, Updating, Signaling, Taking Initiative, Delegating, Taking Direction, Reciprocity, Relaxing goals or constraints
Cognitive costs of coordination: the additional mental effort, load, and delay required to participate in joint activity.
Wait, but then why coordinate?
• 24/7 ops
• Geographically distributed
• Dependencies
• Specialized functions
• Characterized by continuous change
• Complex, interactive systems
• Operating at speed & scale
“Woods' Theorem: As the complexity of a system increases, the accuracy of any single agent's own model of that system decreases rapidly.” - Stella report (stella.io)
Which people are important… in what collaborative interplay… in what sequence?
The progression of an incident (figure labels: Cognitive demands, Coordinative demands)
The coordination paradox
• In complex adaptive systems, everyone’s model is going to be partial and incomplete (Woods, 2017).
• Therefore we need multiple, diverse perspectives to handle non-routine or exceptional events (Grayson, 2018; Watts-Perotti & Woods, 2001).
• But there is additional cognitive load in working with others (Klein et al., 2005; Maguire, 2019).
The coordination paradox
How to reap the benefits of joint activity without the costs of coordination becoming too high?
What strategies do software engineers use to control the costs of coordination?
What did I find?
1) Incident response
2) Incident command
3) Adaptation was key
4) Tooling can increase costs of coordination (CoC)
SNAFU Catchers Consortium Cycle 2
Incident Response – a model (source: Blogs.cisco.com)
Incident Response – the hidden stuff
• “Is this an incident?”
• “I don’t know what it is yet but we need to take action NOW.”
• “I’m not sure it’s actually over. We should make sure we don’t burn out.”
• “Anuj would know.”
• You have new mail. From: CEO. To: Responder. Subject: WTF is going on??!?
• “How is the tech debt from last incident going to impact us now?”
• “How is the tech debt from this incident going to impact us later?”
• “I think I need help but I don’t want to wake anyone up until I’m sure.”
• “I need to get Sarah, she can do this better than I can.”
• “Not sure why that worked or how long it will hold… I better tell the other devs.”
“The incident commander holds the high-level state about the incident. They structure the incident response task force, assigning responsibilities according to need and priority. De facto, the commander holds all positions that they have not delegated.” Beyer et al. (2016)
(Figure sequence; labels: t0, Responder)
Adaptive Choreography. Dynamically reconfiguring how coordination happens.
Adaptive Choreography.
Taking Initiative, Taking Direction, Updating, Recruiting others, Sharing Info, Being recruitable, Deciding, Backfilling IC tasks, Anticipating, Model Updating, Adjusting, Investing
Which people/machines are important… in what collaborative interplay… in what sequence?
Costs of coordination with tooling
• Lag/delay
• Reduced functionality
• Glitches
• Updating
• Calibration
• Switching
• Difficulty with access
• Limited observability
• Working around limitations
Investments in:
• Selecting
• Testing
• Piloting
• Launching
• Calibration
• Re-calibrating
What did I find?
1) Incident response has technical and coordinative demands.
2) Incident command should be A role, not THE role.
3) Adaptation was key.
4) Tooling can increase costs of coordination.
Call to Action