
112 quotes & text 1920x1080 72 URLs & citations 72 - PowerPoint PPT Presentation



  1. 112 quotes & text 1920x1080 72 URLs & citations 72 code{:;} 36 credits

  2. Growing Pains: Software Repositories at SCALE

  3. Do you put all of your bits in a single gigantic repository or many smaller ones?

  4. Why are we even asking? • Ten years ago, most people were using centralized SCMs. • The nature of software development has changed. • Software projects have become more complicated. • More outsourcing and partnering.

  5. Outline • Some historical context. • Kinds of SCMs. • Advantages and disadvantages of Monorepo & Multirepo. • What serves you?

  6. [Timeline figure: SCMs introduced between 1999 and 2007, including BitKeeper, Subversion, AccuRev, Arch, ArX, Darcs, monotone, SVK, git, Mercurial, Bazaar, TFS, and fossil]

  7. [Timeline figure: the systems still in use in 2015 and beyond: git, Mercurial, BitKeeper, Subversion, Perforce]

  8. Centralized vs Distributed [Diagram: Centralized: a single SCM db shared by many workspaces. Distributed: every workspace is paired with its own SCM db.]

  9. Centralized SCM Advantages: • Partial checkouts. • Binary handling. • Single place to back up; you know where your source is. • Security: you can set up permissions on the server. • File locking. Disadvantages: • Serializes what is really parallel work. • Merge then commit model. Means you can’t test changes in isolation. • No local sandboxes. Mixes ‘committing’ and ‘publishing’ code. • Branches are heavyweight. • Limited workflow.

  10. Distributed SCM Advantages: • Commit then merge. • Separates commit from publishing. Gives you a local sandbox. • Implicit backup. • More flexible workflows. • Branches are lightweight. Disadvantages: • Workspaces take up more space since they include the full history. • Binary files can be a problem. • No partial checkouts. • Hard to control access.

  11. Why did DVCS overtake centralized systems?

  12. What role does the SCM have?

  13. SCM as Backup • Check files in. • Check files out. • Occasionally revert to a previous version.

  14. SCM as Detective • When was this bug introduced? • Bisect • History exploration tools. • Who deleted this? • Why is this code this way?
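The *Bisect* bullet refers to finding the commit that introduced a bug by repeatedly halving the history, which is what `git bisect` automates. The following is a minimal sketch. It assumes a linear list of commits ordered from oldest to newest, a known-good first commit, a known-bad last commit, and a bug that stays present once introduced. The `is_bad` test is a hypothetical callback that you supply. Real `git bisect` also handles merge history and skipped commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest -> newest, commits[0] is good,
    commits[-1] is bad, and the bug never disappears once introduced.
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present: the culprit is at or before mid
        else:
            good = mid  # still fine: the culprit is after mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(100)]
    # Hypothetical bug introduced in commit c37.
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 37))  # c37
```

With 100 commits, this needs at most 7 tests instead of up to 99, which is why bisecting is practical even on long histories.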

  15. SCM as Data • Historically, how long does it take us to develop a feature? • How long to fix a bug? • Which areas of the code are unmaintained? Obsolete? Can be removed?
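Questions such as "Which areas of the code are unmaintained?" can be answered from the history itself. The following is a minimal sketch. It assumes you have already extracted (file path, commit timestamp) pairs, for example from the output of `git log --name-only --format=%ct`. Grouping by directory and the 365-day threshold are illustrative choices, not a standard metric.

```python
import posixpath
from typing import Iterable, Tuple

DAY = 24 * 60 * 60  # seconds


def stale_directories(
    changes: Iterable[Tuple[str, int]], now: int, max_age_days: int = 365
) -> list[str]:
    """Return directories whose most recent change is older than max_age_days.

    changes: (file path, unix timestamp of a commit that touched it) pairs.
    """
    last_touched: dict[str, int] = {}
    for path, ts in changes:
        directory = posixpath.dirname(path) or "."  # top-level files map to "."
        last_touched[directory] = max(ts, last_touched.get(directory, ts))
    cutoff = now - max_age_days * DAY
    return sorted(d for d, ts in last_touched.items() if ts < cutoff)


if __name__ == "__main__":
    now = 1_700_000_000
    log = [
        ("src/core/a.c", now - 10 * DAY),
        ("src/legacy/b.c", now - 900 * DAY),
        ("README", now - 400 * DAY),
    ]
    print(stale_directories(log, now))  # ['.', 'src/legacy']
```

A directory showing up here is only a candidate for "obsolete, can be removed"; the history tells you nobody touched it, not that nobody uses it.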

  16. SCM as Post Mortem • What caused us to ship this bug? • What could we have done to prevent it?

  17. It’s about Workflow

  18. Centralized Workflow with DVCS [Diagram: one SCM db and workspace hold the official bits; every developer's own SCM db and workspace syncs with it directly.]

  19. Workflow with DVCS [Diagram: the official bits repository fed by intermediate repositories, each with its own group of developer repositories (SCM db + workspace)]

  20. Workflow with DVCS [Diagram: a variation of the hierarchy of developer repositories (SCM db + workspace) feeding the official bits repository]

  21. Workflow with DVCS [Diagram: changes from developer repositories are merged and tested before they reach the official bits repository]

  22. Every workspace is a branch

  23. Three Problems with DVCS • Binary Files • Security • Large Source Bases

  24. Three Problems with DVCS • Binary Files • Security • Large Source Bases

  25. Binaries Don’t Diff Well • Rolling checksums help “chunk” files so unchanged pieces can be shared. • However, in some file formats a small edit ripples through the whole file. • Video formats. • Image formats. • Storing every copy bloats the history.
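The "rolling checksums help chunk" point can be made concrete with content-defined chunking. A hash over a sliding window of bytes decides where chunk boundaries fall. As a result, inserting bytes near the start of a file only changes the chunks around the edit, and the remaining chunks deduplicate against the old version. The following is a minimal sketch using a Rabin-Karp style polynomial rolling hash. The window size, mask, and minimum chunk size are illustrative assumptions, and this is not the chunker of any particular SCM.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash
BASE = 257             # polynomial base, larger than any byte value
MOD = 1 << 32          # keep the hash in 32 bits
MASK = (1 << 10) - 1   # cut when the low 10 bits are zero (~1 KiB average chunk)
MIN_CHUNK = 64         # avoid tiny chunks


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by the content of a sliding window."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # slide the oldest byte out
        h = (h * BASE + byte) % MOD                  # slide the new byte in
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(data: bytes, blobs: dict[str, bytes]) -> list[str]:
    """Store chunks keyed by SHA-256 and return the ids that describe the file."""
    ids = []
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        blobs[key] = c  # identical chunks are stored only once
        ids.append(key)
    return ids


if __name__ == "__main__":
    original = random.Random(0).randbytes(200_000)
    edited = b"new header" + original  # small insertion at the front
    blobs: dict[str, bytes] = {}
    a, b = store(original, blobs), store(edited, blobs)
    print(len(a), len(set(a) & set(b)))  # most chunk ids are shared
```

This only helps when an edit stays local. In the formats listed on the slide, where a small change rewrites bytes throughout the file, almost every chunk changes and the history still grows by roughly a full copy.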

  26. Binary Files Solution: Make them act more like centralized systems! • Replace binary files in history with pointers, and store the contents on a server (or many). • If someone wants an old copy, it’s fetched on demand. • BitKeeper BAM, Git LFS, Mercurial LFE.
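To illustrate "replace binary files with pointers", the sketch below builds pointer text in the Git LFS v1 pointer format. That format consists of `version`, `oid sha256:…`, and `size` lines, and it is the small file that gets committed in place of the binary. `BlobServer` is a hypothetical in-memory stand-in for the large-file server. It is not Git LFS's transfer protocol, and BitKeeper BAM and Mercurial use their own formats.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def make_pointer(data: bytes) -> str:
    """Build the pointer text that is committed instead of the binary."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size) from pointer text."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    algo, oid = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported hash: {algo}")
    return oid, int(fields["size"])


class BlobServer:
    """Hypothetical content-addressed store standing in for the server."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        self._blobs[hashlib.sha256(data).hexdigest()] = data
        return make_pointer(data)  # only this small text goes into history

    def fetch(self, pointer: str) -> bytes:
        """Fetch contents on demand, e.g. when checking out an old revision."""
        oid, size = parse_pointer(pointer)
        data = self._blobs[oid]
        if len(data) != size or hashlib.sha256(data).hexdigest() != oid:
            raise ValueError("corrupt blob")
        return data
```

Each version of a large video adds only about 130 bytes of pointer to the repository history. The bytes themselves are downloaded only for the revisions someone actually checks out.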

  27. Three Problems with DVCS • Binary Files • Security • Large Source Bases

  28. Security in DVCS • With a monorepo → All or nothing. • With multirepo (including nested) → Access at a repository level. • Read vs Write Access → Anyone can commit, don’t let them push!
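"Anyone can commit, don't let them push" means the server only checks write access at publish time. The following is a minimal sketch of repository-level access control. `READERS`, `WRITERS`, and the repository and user names are hypothetical. The update lines follow the `<old-value> <new-value> <ref-name>` format that a git pre-receive hook reads on stdin. How the pusher's identity is obtained depends on the hosting setup and is not shown. With a monorepo there would be only one entry in each table, which is the "all or nothing" case.

```python
from typing import Iterable

# Hypothetical repository-level ACL: repo -> users allowed to read / to push.
READERS = {
    "app.git": {"alice", "bob", "partner"},
    "libglue.git": {"alice", "bob"},
}
WRITERS = {
    "app.git": {"alice"},
    "libglue.git": {"alice", "bob"},
}


def can_clone(user: str, repo: str) -> bool:
    """Read access: may this user fetch the repository at all?"""
    return user in READERS.get(repo, set())


def rejected_refs(user: str, repo: str, updates: Iterable[str]) -> list[str]:
    """Return the refs to reject for a push.

    updates: '<old-sha> <new-sha> <ref-name>' lines. Anyone may commit
    locally; only users listed in WRITERS may publish to this repository.
    """
    refs = [line.split()[2] for line in updates if line.strip()]
    return [] if user in WRITERS.get(repo, set()) else refs
```

The outsourcing partner in this example can clone and commit locally in `app.git`, but a push from that user is refused until someone with write access pulls and pushes the changes.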

  29. Three Problems with DVCS • Binary Files • Security • Large Source Bases

  30. LARGE source bases

  31. [Chart: Number of Files vs. Number of Commits for Facebook (git), Android (repo), FreeBSD Ports (SVN), FreeBSD Src (SVN), Linux (git), and BitKeeper (bk) repositories]

  32. [Chart: Number of Files (up to 1,000M) vs. Number of Commits (up to 40M): Google (86T) compared with all of the above combined (0.08T)]

  33. Monorepo vs Multirepo

  34. Some Disadvantages • A little too easy to share. • Access control (e.g. when outsourcing). • Noisy commit messages. • Cloning no longer an option.

  35. Not just LARGE, also COMPLICATED

  36. Library API

  37. What about multirepo?

  38. [Diagram: a Library API shared by many repositories: app.git, macApp.git, webapp.git, restapi.git, libglue.git, server.git, droid.git, WinApp.git]

  39. ONE DOES NOT SIMPLY CHANGE A PUBLIC API

  40. Problems of Multirepo • Loss of atomicity. • Loss of the ability to use SCM tools. • That feeling of “Never change anything”. • Having multiple repositories breaks tools that interact with the SCM.

  41. Mono vs Multi? How about a Hybrid? • Partial checkouts. • Preserves atomic commits. • You can decouple and reuse components. Solution: Stitch together multiple repositories into one.

  42. Case Study: Git Submodules [Diagram: the Repository's .gitmodules file maps /submodule/path/in/repo to http://some_server/submodule; the Repository records the Submodule at commit e46fe3df01435bf523d2ab4f2755556c0e4e6f78.]
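The `.gitmodules` file on the slide is a small file in git-config format. The following is a minimal sketch that reads it with Python's `configparser`. It assumes the simple layout that `git submodule add` writes, where the submodule's path (without a leading slash) is also used as its name. `configparser` is not git's own parser and will not handle every git-config feature. Note that the pinned commit (e46fe3df… on the slide) is not stored in `.gitmodules`. Git records it in the superproject's tree as a gitlink entry, which is why the two can drift apart.

```python
import configparser

GITMODULES = """\
[submodule "submodule/path/in/repo"]
\tpath = submodule/path/in/repo
\turl = http://some_server/submodule
"""


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Map each submodule name to its settings (path, url, ...)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    modules = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")  # e.g. 'submodule "name"'
        if kind == "submodule":
            modules[name.strip('"')] = dict(parser[section])
    return modules


if __name__ == "__main__":
    print(parse_gitmodules(GITMODULES))
```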

  43. Case Study: Git Submodules [Diagram: clone. Cloning the Repository; its Submodule comes from http://some_server/submodule.]

  44. Case Study: Git Submodules [Diagram: clone, clone. The Repository and the Submodule (from http://some_server/submodule) are cloned separately.]

  45. Case Study: Git Submodules [Diagram: push, push. The Repository and the Submodule (to http://some_server/submodule) are pushed separately.]

  46. Case Study: Git Submodules [Diagram: sync. Keeping the Submodule (http://some_server/submodule) in sync with the Repository.]

  47. Case Study: Git Submodules fatal: reference isn’t a tree: 6c…e0 Unable to checkout '6c…e0' in submodule path 'sub' Means: Someone forgot to push the submodule ‘sub’.

  48. Case Study: Git Submodules submodule $ git push Everything up-to-date Means: You made a commit in the submodule while it was in a detached HEAD state (the default). You will cause the problem outlined in the previous slide.
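Both failure modes come down to the superproject pointing at a submodule commit that the submodule's remote does not have. Git has a real safeguard for this: `git push --recurse-submodules=check` refuses the push in that case, and `on-demand` pushes the submodules first. The following is a toy model of the check itself. `Repo` and the commit ids are hypothetical; a real repository is not just a set of ids.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model: the set of commit ids a repository (or remote) contains."""
    commits: set[str] = field(default_factory=set)


def unpublished_submodule_commits(
    recorded: dict[str, str], remotes: dict[str, Repo]
) -> dict[str, str]:
    """Return {submodule path: commit} for pinned commits missing on the remote.

    recorded: submodule path -> commit the superproject points at (its gitlink).
    remotes: submodule path -> that submodule's remote repository.
    """
    return {
        path: sha
        for path, sha in recorded.items()
        if sha not in remotes[path].commits
    }


if __name__ == "__main__":
    remote = Repo({"e46fe3d"})
    # A commit made on a detached HEAD never reached the remote:
    print(unpublished_submodule_commits({"sub": "abc1234"}, {"sub": remote}))
```

A non-empty result is exactly the situation that later produces the "reference isn’t a tree" error for whoever clones the superproject.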

  49. MY BRAIN HURTS

  50. Git Submodules are too loosely coupled with the main repo.

  51. Key Insight • We’ve seen this problem before: CVS, which versions each file independently. • We’ve solved this problem before: changesets bind changes to independent files together. • What if we treat repositories the same way we treat files?

  52. A component is to a product like a file is to a repository
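The analogy can be sketched as a toy data model: a product changeset pins one revision of every component, the way a changeset pins one revision of every file, and it is recorded all-or-nothing. This is a hypothetical illustration under those assumptions. It is not how BitKeeper stores nested repositories.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    revisions: set[str] = field(default_factory=set)  # revisions this component has


@dataclass
class Product:
    """Each history entry pins one revision per component (like files in a changeset)."""
    components: dict[str, Component]
    history: list[dict[str, str]] = field(default_factory=list)

    def commit(self, updates: dict[str, str]) -> dict[str, str]:
        """Atomically record new component revisions as one product changeset."""
        snapshot = dict(self.history[-1]) if self.history else {}
        snapshot.update(updates)
        missing = [
            name
            for name, rev in snapshot.items()
            if name not in self.components or rev not in self.components[name].revisions
        ]
        if missing:
            # All-or-nothing: nothing is recorded if any component revision is unknown.
            raise ValueError(f"unknown revisions in: {', '.join(sorted(missing))}")
        self.history.append(snapshot)
        return snapshot
```

Changing a library API and all of its callers then becomes a single product changeset, which is the atomicity that slide 40 says multirepo loses.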

  53. BitKeeper Nested [Diagram: Clone. A Product (SCM db + workspace) and its Components are cloned together.]

  54. BitKeeper Nested [Diagram: Pull. Changes are pulled into a Product and its Components together.]

  55. BitKeeper Nested [Diagram: Push. Changes are pushed from a Product and its Components together.]

  56. BitKeeper Nested [Diagram: Clone. A Product (SCM db + workspace) and its Components are cloned together.]

  57. BitKeeper Nested [Diagram: Detach. A Component is detached from the Product into a standalone repository.]

  58. BitKeeper Nested [Diagram: Port. Changes move between a standalone repository (SCM db + workspace) and a Component of the Product.]

  59. So? Multirepo: • Goes better with distributed. • Project has clear conceptual boundaries. • You can work with a small number of components. • Outsourcing, working with partners. Hybrid: • Goes better with distributed. • Takes atomic commits from monorepo. • Takes conceptual boundaries from multirepo. • You can clone components but still work within the overall structure. Monorepo: • Goes better with centralized. • Project boundaries are not clear (files move around). • Lots of reuse; origin doesn’t matter. • Huge source base and you need most of it. No natural boundaries.

  60. Don’t let your tools determine your workflow
