Keeping Master Green at Scale
Sundaram Ananthanarayanan, Masoud Saeida Ardekani, Denis Haenikel, Balaji Varadarajan, Simon Soriano, Dhaval Patel, Ali-Reza Adl-Tabatabai (https://eng.uber.com/research/keeping-master-green-at-scale/)
Monorepos are popular! • A single, shared repository hosts a company's software assets [Figure: Monorepo vs. Multirepo] • Advantages of a Monorepo [Ciera et al. @ICSE'18]: ✔ Simplified Dependency Management ✔ Improved Code Visibility
Always-green master considered hard • Monorepos handle a huge volume of commits every day • Existing CI workflows do not guarantee an always-green master, because it is too hard at scale • Submit Queue guarantees an always-green master at scale
Outline 01 Why green master is hard 02 Probabilistic Speculation 03 Conflict Analyzer 04 Evaluation
Lifecycle of a change in a monorepo [Figure: a developer submits a change as a revision for peer review; the CI server runs BUILD and TEST on the revision and reports the RESULT; the change then lands in the monorepo]
Challenge: Concurrent conflicting changes [Figure: Alice and Bob concurrently submit changes C1 and C2 against master; once both C1 and C2 land, master's build steps fail]
Example of a real conflict
How often do conflicts happen? Observation: The chance of a conflict rises from 5% to 40% as the number of concurrent, potentially conflicting changes increases.
Drawbacks of a red master • Delayed rollouts • Hampered productivity • Complex rollbacks
Keeping master green: Queue • Alice, Bob, and Carol enqueue the changes they want to commit (C1, C2, C3) against master, whose head is H.
Keeping master green: Queue • C1 is built and tested against the mainline head (H).
Keeping master green: Queue • Build steps for H ⊕ C1 succeed.
Keeping master green: Queue • C1 is committed and becomes the new head. C2 is tested against it.
Keeping master green: Queue • Build steps for (H ⊕ C1) ⊕ C2 fail, and C2 is rejected.
Keeping master green: Queue ✔ Guarantees an always-green master by serializing changes ✖ Does not scale to 1000s of changes/day
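The single-queue policy can be sketched in a few lines of Python. This is a minimal, hypothetical model, not Submit Queue's implementation: a snapshot of master is represented as the set of change IDs applied on top of H, and `build_passes(head, change)` is an assumed oracle standing in for running the build steps on head ⊕ change.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical model: a snapshot of master = the set of change IDs applied on top of H.
Snapshot = FrozenSet[str]

def serial_submit_queue(
    queue: List[str],
    build_passes: Callable[[Snapshot, str], bool],
) -> Tuple[List[str], List[str]]:
    """Build each queued change against the current head, strictly in order."""
    head: Snapshot = frozenset()  # H itself, before any queued change lands
    committed: List[str] = []
    rejected: List[str] = []
    for change in queue:
        if build_passes(head, change):   # build steps for head ⊕ change
            head = head | {change}       # the change becomes part of the new head
            committed.append(change)
        else:
            rejected.append(change)      # head is untouched, so master stays green
    return committed, rejected

def demo_build(head: Snapshot, change: str) -> bool:
    # Assumed outcomes mirroring the slides: C2 breaks once C1 is on master.
    return not (change == "C2" and "C1" in head)

print(serial_submit_queue(["C1", "C2", "C3"], demo_build))
# (['C1', 'C3'], ['C2'])
```

Because every change waits for the build of the change ahead of it, at most one build is useful at a time, which is why this policy does not scale to 1000s of changes/day.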
Keeping master green: Batching changes • C1 and C2 are batched, and the build steps are run on the batch.
Keeping master green: Batching changes ✔ Improves throughput if batches succeed more often than not ✖ Testing a batch masks intermediate changes that fail ✖ Batches fail more often as the batch size increases. What happens when batches fail?
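To see why batches fail more often as they grow, assume (hypothetically) that each change passes its build steps independently with probability p and that changes never conflict. A batch of n changes then succeeds only if every change passes, which happens with probability pⁿ. The sketch below just evaluates that product; conflicts between changes would only push it lower.

```python
def batch_success_probability(p_change: float, batch_size: int) -> float:
    """P(every change in the batch passes), assuming independent changes and no conflicts."""
    return p_change ** batch_size

# Hypothetical per-change pass rate of 90%.
for size in (1, 2, 5, 10):
    print(size, round(batch_success_probability(0.9, size), 3))
# 1 0.9
# 2 0.81
# 5 0.59
# 10 0.349
```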
Keeping master green: Goals • Guarantee serializability: provide the illusion of a single queue when committing (Git only offers serializability of patches) • Provide reasonable SLAs: overheads should be short enough for developers to trade speed for correctness! Challenge: how can we do this at scale (1000s of commits/day)?
Submit Queue: Overview
Speculation Engine • Speculates on success/failure of changes • Builds speculation graph
Conflict Analyzer • Determines independent changes • Constructs conflict graph
Planner Engine • Selects most valuable builds from speculation engine • Executes builds and commits changes
Speculation Tree • C1, C2, C3: pending changes
Speculation Tree • B1: build steps for H ⊕ C1
Speculation Tree • B1 fails → C1 rejected; B1 succeeds → C1 commits • B2: build steps for H ⊕ C2 (needed if C1 is rejected) • B1.2: build C2 against (H ⊕ C1) (needed if C1 commits) 1. Precompute the outcome of committing C2 under different realities 2. Commit or reject C2 based on the outcome of B1 and one of {B2, B1.2}
Speculation Tree • B2 fails → C2 rejected; B2 succeeds → C2 commits • B1.2 fails → C2 rejected; B1.2 succeeds → C2 commits • C3 is then built in one of four realities: B3, B2.3, B1.3, B1.2.3 Challenge: Which builds to run?
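The speculation tree can be enumerated mechanically: the build for Ci is parameterized by which of C1 through C(i-1) are assumed to have committed. The Python sketch below reproduces the slide's build names (B1.2 = C2 built on H ⊕ C1); the function names are illustrative, not Submit Queue's API.

```python
from itertools import combinations
from typing import List, Tuple

def build_name(prefix: Tuple[int, ...], change: int) -> str:
    """Slide naming: B1.2 = build steps for C2 on top of H ⊕ C1."""
    return "B" + ".".join(str(i) for i in (*prefix, change))

def speculation_builds(n: int) -> List[str]:
    """All builds in the speculation tree for pending changes C1..Cn."""
    builds: List[str] = []
    for change in range(1, n + 1):
        earlier = range(1, change)
        # Any subset of the earlier changes may have committed before this one.
        for k in range(change):
            for prefix in combinations(earlier, k):
                builds.append(build_name(prefix, change))
    return builds

print(speculation_builds(3))
# ['B1', 'B2', 'B1.2', 'B3', 'B1.3', 'B2.3', 'B1.2.3']
```

Change Ci alone contributes 2^(i-1) builds, so n changes yield 2ⁿ - 1 builds in total.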
Approach #1: Speculate Them All ● Speculate on all possible outcomes equally ● Selects builds in breadth-first order: B1, B2, B1.2, B3, B2.3, B1.3, B1.2.3 ● Needs to run 2ⁿ builds in parallel to commit n changes ● Does not scale to 1000s of changes/day ● Leads to substantial waste of resources
Speculate Them All: Resource Wastage [Figure: the full speculation tree for C1, C2, C3 with builds B1, B2, B1.2, B3, B2.3, B1.3, B1.2.3]
Speculate Them All: Observation If we select and execute the builds whose outcomes are most likely to be needed, then we require only n (out of 2ⁿ) builds. Challenge: Which n builds are likely to be needed?
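The observation follows from the tree's shape: once the real outcome of each deciding build is known, exactly one build per change is consulted. The sketch below walks that single path through the tree; `demo_outcomes` is a hypothetical oracle in which only C2 built on top of C1 fails.

```python
from typing import Callable, List, Tuple

Prefix = Tuple[int, ...]

def build_name(prefix: Prefix, change: int) -> str:
    # Slide naming: B1.2 = build steps for C2 on top of H ⊕ C1.
    return "B" + ".".join(str(i) for i in (*prefix, change))

def builds_actually_used(n: int, build_passes: Callable[[Prefix, int], bool]) -> List[str]:
    """Follow the one path through the speculation tree that reality takes."""
    committed: Prefix = ()
    used: List[str] = []
    for change in range(1, n + 1):
        used.append(build_name(committed, change))  # the build that decides this change
        if build_passes(committed, change):
            committed += (change,)                  # later builds include this change
    return used

def demo_outcomes(prefix: Prefix, change: int) -> bool:
    # Hypothetical outcomes: every build passes except C2 on top of C1.
    return not (change == 2 and 1 in prefix)

print(builds_actually_used(3, demo_outcomes))
# ['B1', 'B1.2', 'B1.3']
```

The other four builds of the tree (B2, B3, B2.3, B1.2.3) would have been wasted work in this run.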
Probabilistic Speculation • Each build B_C in the tree (B1, B2, B1.2, B3, B2.3, B1.3, B1.2.3) is annotated with the probability that its result is used to commit or reject its change C.
Probabilistic Speculation • The root B1 is always needed, as it is used to determine whether C1 can be committed.
Probabilistic Speculation • The annotated term represents the probability that change C1 succeeds individually.
Probabilistic Speculation • B1.2: build C2 against (H ⊕ C1) • The annotated term represents the probability that C2 conflicts with C1.
Probabilistic Speculation [Figure: the speculation tree B1, B2, B1.2, B3, B2.3, B1.3, B1.2.3 with its annotated probabilities]
Probabilistic Speculation: Summary Choose the most valuable builds (among B1, B2, B1.2, B3, B2.3, B1.3, B1.2.3) by determining: • the probability of success of a change • the probability of a conflict between changes
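One way to turn these two probabilities into a ranking is sketched below, under a deliberately simplified model that is an assumption rather than the paper's exact formulation: Ci passes on H ⊕ S with probability p_success[i] × ∏ over Cj in S of (1 - p_conflict[(j, i)]), and a build's value is the probability that the speculation path leading to it is the one reality takes. The hypothetical planner then keeps the k highest-valued builds.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Prefix = Tuple[int, ...]

def pass_prob(prefix: Prefix, i: int, p_success: Dict[int, float],
              p_conflict: Dict[Tuple[int, int], float]) -> float:
    """P(build steps for Ci pass on H ⊕ prefix) under the simplified independence model."""
    p = p_success[i]
    for j in prefix:
        p *= 1.0 - p_conflict.get((j, i), 0.0)
    return p

def needed_prob(prefix: Prefix, i: int, p_success, p_conflict) -> float:
    """P(the build for Ci on H ⊕ prefix is the one whose result decides Ci)."""
    prob, committed = 1.0, ()
    for j in range(1, i):
        pj = pass_prob(committed, j, p_success, p_conflict)
        if j in prefix:      # this path assumes Cj committed
            prob *= pj
            committed += (j,)
        else:                # this path assumes Cj was rejected
            prob *= 1.0 - pj
    return prob

def most_valuable_builds(n: int, k: int, p_success, p_conflict) -> List[str]:
    """Score every node of the speculation tree and keep the k most likely to be needed."""
    scored = []
    for i in range(1, n + 1):
        for size in range(i):
            for prefix in combinations(range(1, i), size):
                name = "B" + ".".join(map(str, (*prefix, i)))
                scored.append((needed_prob(prefix, i, p_success, p_conflict), name))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical estimates: each change succeeds alone 90% of the time; C2 conflicts with C1 half the time.
p_success = {1: 0.9, 2: 0.9, 3: 0.9}
p_conflict = {(1, 2): 0.5}
print(most_valuable_builds(3, 3, p_success, p_conflict))
# ['B1', 'B1.2', 'B1.3']
```

With n builds chosen this way, the planner bets on the most likely reality instead of covering all 2ⁿ - 1.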
Evaluating success and conflict probabilities ● Logistic regression to train prediction models ○ Feature set includes 100+ hand-picked features ○ Prediction accuracy of 97%
Features for Training ML Models
Change ● # affected targets ● # git commits ● # files changed ● status of pre-submit checks
Revision (a container for changes) ● # changes submitted ● revert and test plans ● # submit attempts made
Developer ● developer name ● employment proficiencies
Speculation ● dynamic features to re-adjust weights based on initial predictions ● # speculations succeeded ● # speculations failed
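The slides name logistic regression but not its training details. As a hedged illustration only, the sketch below fits a logistic model with plain batch gradient descent on a single hypothetical feature (whether pre-submit checks passed). The toy labels are invented so the result is easy to check: changes that passed pre-submit succeed 4 times out of 5, the others 1 time out of 5, and the fitted model recovers those rates.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 1.0, epochs: int = 5000
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the logit z.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_success(w: List[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that a change with features x succeeds."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data, one feature: 1.0 = pre-submit checks passed, 0.0 = not.
X = [[1.0]] * 5 + [[0.0]] * 5
y = [1, 1, 1, 1, 0] + [1, 0, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print(round(predict_success(w, b, [1.0]), 2))  # 0.8
print(round(predict_success(w, b, [0.0]), 2))  # 0.2
```

A production model would use many features (100+ per the slide) and a library optimizer; this only shows the shape of the computation.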
Conflict Analyzer ● So far, we have assumed that all changes potentially conflict with each other ○ So they cannot be committed in parallel ● What if changes can be proved to be independent? ○ Commit changes in parallel ○ Trim the speculation space ● We use the Conflict Analyzer to find independent changes
Conflict Analyzer: Commit Changes in Parallel • Conflict graph for changes C1, C2, C3, where C1 and C2 are independent of each other and both conflict with C3.
Conflict Analyzer: Commit Changes in Parallel [Figure: B1 decides C1 and B2 decides C2; depending on whether each succeeds or fails, C3 is built as B3, B1.3, B2.3, or B1.2.3] Insight: Changes C1 and C2 can be committed in parallel.
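A sketch of how a conflict graph prunes speculation, assuming for illustration that the graph is given as a set of undirected edges between change indices: each change speculates only on the earlier changes it conflicts with. For the graph on this slide it reproduces the builds shown, and C1 and C2 each need a single build against H, which is why both can be decided and committed in parallel.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def conflicting_predecessors(i: int, conflicts: Set[Tuple[int, int]]) -> List[int]:
    """Earlier pending changes that Ci conflicts with (edges are undirected)."""
    return [j for j in range(1, i) if (j, i) in conflicts or (i, j) in conflicts]

def builds_with_conflict_graph(n: int, conflicts: Set[Tuple[int, int]]) -> Dict[int, List[str]]:
    """Speculative builds per change when Ci only speculates on changes it conflicts with."""
    plan: Dict[int, List[str]] = {}
    for i in range(1, n + 1):
        preds = conflicting_predecessors(i, conflicts)
        plan[i] = [
            "B" + ".".join(map(str, (*prefix, i)))
            for k in range(len(preds) + 1)
            for prefix in combinations(preds, k)
        ]
    return plan

# Slide's graph: C1 and C2 are independent; both conflict with C3.
print(builds_with_conflict_graph(3, {(1, 3), (2, 3)}))
# {1: ['B1'], 2: ['B2'], 3: ['B3', 'B1.3', 'B2.3', 'B1.2.3']}
```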
Conflict Analyzer: Trim Speculation Space • Conflict graph for C1, C2, C3, where C1 conflicts with both C2 and C3, which are independent of each other.
Conflict Analyzer: Trim Speculation Space [Figure: B1 decides C1; depending on whether B1 fails or succeeds, C2 is built as B2 or B1.2, and C3 as B3 or B1.3] Insight: Because C3 does not speculate on C2, the number of possible builds for C3 drops from 4 to 2.
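The reduction is easy to quantify. Assuming, for illustration, that conflict edges are stored as unordered pairs (frozensets): without a conflict graph, Ci speculates on all i - 1 earlier changes and has 2^(i-1) candidate builds; with one, it speculates only on its k conflicting predecessors and has 2^k.

```python
from typing import FrozenSet, Set

def candidate_builds(i: int, conflicts: Set[FrozenSet[int]], use_conflict_graph: bool = True) -> int:
    """Number of speculative builds for change Ci (1-indexed)."""
    if not use_conflict_graph:
        return 2 ** (i - 1)  # every earlier change may or may not have committed
    k = sum(1 for j in range(1, i) if frozenset((j, i)) in conflicts)
    return 2 ** k            # only conflicting predecessors matter

# Slide's graph: C1 conflicts with C2 and with C3; C2 and C3 are independent.
conflicts = {frozenset((1, 2)), frozenset((1, 3))}
print(candidate_builds(3, conflicts, use_conflict_graph=False))  # 4
print(candidate_builds(3, conflicts))                            # 2
```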
Conflict Analyzer: Detecting conflicts at scale ● Use the build system to detect whether changes are independent ● Code is partitioned into smaller entities called targets ● Every change affects a set of targets [Example build graph: targets T1 (main.exe), T2 (util.o), and T3 (main.o), built from source files util.c, util.h, and main.c]
Detecting Conflicts: Intuition Two changes are independent if they affect disjoint sets of targets.
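The intersection idea can be sketched as a reachability computation over the build graph. The edges below are an assumption (the slide names the targets but not every dependency), and `tool.exe`/`tool.c` are a hypothetical extra target added only so that one pair of changes comes out independent. Here a change's affected targets are those whose transitive inputs include a file it touches.

```python
from typing import Dict, Iterable, Set

# Assumed dependencies for the example build graph (target -> direct inputs).
# tool.exe / tool.c are hypothetical, added only to show an independent pair.
BUILD_GRAPH: Dict[str, Set[str]] = {
    "main.exe": {"util.o", "main.o"},
    "util.o": {"util.c", "util.h"},
    "main.o": {"main.c", "util.h"},
    "tool.exe": {"tool.c"},
}

def affected_targets(changed_files: Iterable[str], graph: Dict[str, Set[str]]) -> Set[str]:
    """Targets that transitively depend on at least one changed file."""
    changed = set(changed_files)
    affected: Set[str] = set()
    grew = True
    while grew:  # iterate to a fixed point so indirect dependents are included
        grew = False
        for target, inputs in graph.items():
            if target not in affected and inputs & (changed | affected):
                affected.add(target)
                grew = True
    return affected

def independent_by_intersection(files_a, files_b, graph) -> bool:
    """Intersection approach: independent iff the affected-target sets are disjoint."""
    return not (affected_targets(files_a, graph) & affected_targets(files_b, graph))

print(independent_by_intersection({"util.c"}, {"main.c"}, BUILD_GRAPH))  # False (both reach main.exe)
print(independent_by_intersection({"util.c"}, {"tool.c"}, BUILD_GRAPH))  # True
```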
Detecting Conflicts: Example [Figure: the original build graph for H has targets X, Y, and Z; applying C1 yields the build graph for H ⊕ C1, and applying C2 yields the build graph for H ⊕ C2]
Detecting Conflicts: Puzzle [Figure: the same build graphs for H, H ⊕ C1, and H ⊕ C2] ● C1 and C2 are conflicting ● But the intersection of their affected targets is empty!
Detecting Conflicts: Composition
• Build graph for H: X = 1, Y = 2, Z = 3
• Applying C1 gives H ⊕ C1: X = 4, Y = 5, Z = 3, so the changed targets are {(x, 4), (y, 5)}
• Applying C2 gives H ⊕ C2: X = 1, Y = 2, Z = 6, so the changed targets are {(z, 6)}
• Applying C1 ⊕ C2 gives H ⊕ C1 ⊕ C2: X = 4, Y = 5, Z = 7, so the changed targets are {(x, 4), (y, 5), (z, 7)}
{(x, 4), (y, 5)} ∪ {(z, 6)} ≠ {(x, 4), (y, 5), (z, 7)}. Thus, C1 and C2 are conflicting!
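The composition check can be written directly over per-target values. In this sketch each build graph is reduced to a hypothetical map from target name to the value shown on the slide, and the function names are illustrative. It also shows why the intersection approach misses this conflict: C1 and C2 touch disjoint targets, yet composing them produces a value for Z that neither change produces alone.

```python
from typing import Dict, Set, Tuple

def delta(base: Dict[str, int], after: Dict[str, int]) -> Set[Tuple[str, int]]:
    """(target, new value) pairs for targets whose value differs from the base graph."""
    return {(t, v) for t, v in after.items() if base.get(t) != v}

def conflicts_by_composition(h, h_c1, h_c2, h_c1_c2) -> bool:
    """C1 and C2 conflict if H ⊕ C1 ⊕ C2 is not just the union of their separate deltas."""
    return (delta(h, h_c1) | delta(h, h_c2)) != delta(h, h_c1_c2)

H = {"x": 1, "y": 2, "z": 3}
H_C1 = {"x": 4, "y": 5, "z": 3}
H_C2 = {"x": 1, "y": 2, "z": 6}
H_C1_C2 = {"x": 4, "y": 5, "z": 7}

touched_c1 = {t for t, _ in delta(H, H_C1)}
touched_c2 = {t for t, _ in delta(H, H_C2)}
print(touched_c1 & touched_c2)                           # set(): intersection sees no overlap
print(conflicts_by_composition(H, H_C1, H_C2, H_C1_C2))  # True
```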
Detecting Conflicts: Summary • Intersection Approach ✖ Does not detect all kinds of conflicts • Union Approach ✖ Determining conflicts for n changes requires n² build graphs! • Hybrid Approach ✔ Only 7.9% of changes cause a change to the build graph • Union Graph Approach (details in paper)
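The slide does not spell out the hybrid rule, so the following is only one plausible reading, stated as an assumption: trust the cheap intersection check when neither change alters the build graph's structure (the large majority of changes, given the 7.9% figure), and fall back to the costlier composition (union) check otherwise. `union_check` is a hypothetical callback standing in for building and comparing the composed graph.

```python
from typing import Callable, Set

def hybrid_conflict_check(
    affected_c1: Set[str],
    affected_c2: Set[str],
    c1_alters_graph: bool,
    c2_alters_graph: bool,
    union_check: Callable[[], bool],
) -> bool:
    """Return True if C1 and C2 should be treated as conflicting (hypothetical policy)."""
    if not c1_alters_graph and not c2_alters_graph:
        # Fixed graph structure: overlapping affected targets is the conflict signal.
        return bool(affected_c1 & affected_c2)
    # A structural change (added or removed dependencies) can create conflicts the
    # intersection misses, so defer to the costlier composed-graph check.
    return union_check()

# Made-up inputs: union_check is only invoked when a change edits the graph.
print(hybrid_conflict_check({"util.o"}, {"tool.exe"}, False, False, lambda: True))  # False
print(hybrid_conflict_check({"x", "y"}, {"z"}, True, False, lambda: True))          # True
```

The reasoning behind this reading: with a fixed graph structure, any target downstream of files from both changes appears in both reachability-based affected sets, so disjoint sets mean the combined result is simply the union of the individual ones.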