STAT 605: Data Science Computing
Introduction to Version Control: git
Some materials adapted from Pro Git by Scott Chacon and Ben Straub
Version control
It is useful to record and track the changes to a project over time:
● Revert to older versions (e.g., if we accidentally introduce a bug)
● Compare different implementations of a function
● Track who implemented what changes
We want to do this locally (i.e., stored on our own machine, not in the cloud)...
...but in a distributed manner (i.e., multiple people working on the project at once).
For a more thorough discussion of why you should use version control, the problems that git seeks to solve, and how it solves them, see https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
git: Distributed Version Control
Created by Linus Torvalds (also the creator of Linux)
Free and open-source, available at https://git-scm.com/
Installation
● Ubuntu: apt install git (you may need to use sudo)
● Windows/macOS: https://git-scm.com/downloads
Please see the lecture video for a demonstration of git installation and configuration.
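git stores file snapshots over time
As we make changes to a project, git keeps track of those changes by storing a snapshot of the project at each commit (files that have not changed are not stored again; git simply links to the previous identical copy).
This allows us to go back to an earlier version, if necessary.
Image credit: S. Chacon and B. Straub. Pro Git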
Getting a git repository
Option 1: create a new git repository (git init)
● Take a directory on your machine
● Start tracking files in that directory
Option 2: clone an existing repository from elsewhere (git clone)
● Take an existing git repo (e.g., an R package that you like)
● Create a copy of it on your local machine
● This also allows you to contribute back to the project, if you wish to do so
Please see the lecture video for a demonstration of creating and cloning repositories in git .
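Recording changes in your repository
Each file in your project directory is either untracked or tracked; every file that git tracks is in one of three states at a given time.
[Figure: the lifecycle of a file, with states Untracked, Modified, Staged, Committed]
Image credit: S. Chacon and B. Straub. Pro Git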
Three basic states: modified, staged, committed
A file in your git repository can be in one of three states:
● Modified: the file has been changed, but the change is not yet committed to the database
● Staged: a modified file that is marked to be included in the next snapshot
● Committed: the file is stored in the database (i.e., a snapshot has been taken)
Image credit: S. Chacon and B. Straub. Pro Git
Note: not every file in a project directory has to be part of the repo. Thus, there may be files in a directory that are in none of these states, because they are not being tracked at all.
Image credit: S. Chacon and B. Straub. Pro Git
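The file states above behave like a small state machine. The sketch below is a toy model under stated assumptions: the names `TRANSITIONS`, `next_state`, and `run` are hypothetical, and it ignores the real-git case where a file is edited after `git add` and is then both staged and modified at once.

```python
# Toy model of a file's states; names are hypothetical, not git's API.
# Each (state, action) pair maps to the state the file ends up in.
TRANSITIONS = {
    ("untracked", "add"): "staged",      # git add starts tracking a new file
    ("committed", "edit"): "modified",   # changing a file that matches the last commit
    ("modified", "edit"): "modified",    # further edits keep it modified
    ("modified", "add"): "staged",       # git add stages the change
    ("staged", "commit"): "committed",   # git commit records the snapshot
}


def next_state(state: str, action: str) -> str:
    """Return the file's state after `action`, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a file that is {state!r}") from None


def run(actions, state="untracked"):
    """Apply a sequence of actions and return every state the file passes through."""
    history = [state]
    for action in actions:
        state = next_state(state, action)
        history.append(state)
    return history


if __name__ == "__main__":
    print(run(["add", "commit", "edit", "add", "commit"]))
```

Trying to commit an untracked file raises an error in this model, mirroring the fact that git only commits what has been staged.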
Please see the lecture video for a demonstration of adding files to the git repo and tracking changes.
The basic workflow
1) Modify one or more files in your repository.
2) Stage the changes that you wish to include in the next snapshot (git add).
3) Commit your changes (git commit). A snapshot of the staged files is stored.
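The three-step workflow can be sketched as a toy Python class. This is an illustration under assumptions, not how git stores data: `ToyRepo`, `write`, `add`, `commit`, and `go_back` are hypothetical names, file contents are plain strings, and a commit is just a copied dictionary (real commits also record an author, a message, and parent pointers).

```python
class ToyRepo:
    """Hypothetical toy model of the modify -> stage -> commit cycle."""

    def __init__(self) -> None:
        self.working = {}   # working directory: filename -> contents
        self.index = {}     # staging area: contents the next snapshot will record
        self.commits = []   # committed snapshots, oldest first

    def write(self, name, contents):
        """Step 1: modify (or create) a file in the working directory."""
        self.working[name] = contents

    def add(self, name):
        """Step 2: stage the file's current contents, like `git add <name>`."""
        self.index[name] = self.working[name]

    def commit(self):
        """Step 3: store a snapshot (a copy) of the staging area, like `git commit`."""
        self.commits.append(dict(self.index))
        return len(self.commits) - 1  # toy commit id: position in the history

    def go_back(self, commit_id):
        """Restore tracked files to an earlier snapshot; other files are left alone."""
        snapshot = self.commits[commit_id]
        self.working.update(snapshot)
        self.index = dict(snapshot)


if __name__ == "__main__":
    repo = ToyRepo()
    repo.write("analysis.R", "v1")
    repo.add("analysis.R")
    first = repo.commit()
    repo.write("analysis.R", "v2")
    repo.write("scratch.txt", "notes")  # never staged, so never committed
    repo.add("analysis.R")
    repo.commit()
    repo.go_back(first)
    print(repo.working)
```

Because `commit` copies the staging area, later edits in the working directory never change a snapshot that has already been taken.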
Reminder: git stores file snapshots over time
As we make changes to a project, git keeps track of those changes.
This allows us to go back to an earlier version, if necessary.
Image credit: S. Chacon and B. Straub. Pro Git
The structure of the git repository
When you commit to the repo:
● git stores a commit object with a pointer to the snapshot of the files you staged
● The commit object also includes additional information, e.g., the author, the commit message, etc.
● The commit object stores a pointer to its parent(s) (the commit(s) that came directly before it)
Example: add three files to the staging area, and commit.
Now git has created a commit object, which includes a pointer to the root tree object of the project. Each committed file is stored as a blob object. Roughly speaking, tree objects correspond to UNIX/Linux directories, while blob objects correspond to files.
[Figure: a commit object, created by git commit, points to the root tree object (also written by git commit from the staged files), which in turn points to three blob objects corresponding to the three files, created by git add.]
Image credit: S. Chacon and B. Straub. Pro Git
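To make "blob object" concrete: in a repository using git's default SHA-1 object format, a blob's id is the SHA-1 hash of a short header (`blob`, the content size in bytes, a NUL byte) followed by the raw file contents. The sketch below computes that id with Python's standard library; `blob_id` and `object_path` are hypothetical helper names, and `object_path` assumes the object is stored loose (not packed).

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object id git assigns to file contents stored as a blob (SHA-1 repositories)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_id: str) -> str:
    """Path of a loose object: the first two hex digits name the subdirectory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = blob_id(b"test content\n")
    print(oid, object_path(oid))
```

You can compare the result with `echo 'test content' | git hash-object --stdin`. Because the id depends only on the contents, two identical files map to the same blob, which is why an unchanged file does not need to be stored again in a new snapshot.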
Please see the lecture video for a demonstration of examining the git commit history
Managing multiple versions: Branching in git
As we make additional changes and commit them, each commit points back to the commit immediately before it. A branch is simply a pointer to one of these commit objects.
Image credit: S. Chacon and B. Straub. Pro Git
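The idea that commits form a chain of parent pointers, and that a branch is just a name pointing at one commit, can be sketched in a few lines. This is a toy model: the `Commit` class and `log` function are hypothetical, ids like `"c0"` stand in for real SHA-1 hashes, and each commit has a single parent (merge commits in real git have two).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: str                    # toy id; real git ids are content hashes
    message: str
    parent: Optional[Commit]   # None for the first commit in the repository


def log(commit: Optional[Commit]) -> list[str]:
    """Follow parent pointers back to the first commit, newest first."""
    messages = []
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


# Three commits, each pointing back to the one before it.
c0 = Commit("c0", "initial commit", None)
c1 = Commit("c1", "add analysis script", c0)
c2 = Commit("c2", "fix plotting bug", c1)

# A branch is just a name that points at a commit.
branches = {"master": c2}

# Committing on master creates a new commit and moves the pointer forward;
# the older commits are not modified.
c3 = Commit("c3", "update README", branches["master"])
branches["master"] = c3

if __name__ == "__main__":
    print(log(branches["master"]))
```

Walking parent pointers from a branch tip is roughly what `git log` shows you: the history reachable from that commit.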
Two different branches, both pointing to the same commit. HEAD points to the current branch, that is, the branch that we are currently working on.
Image credit: S. Chacon and B. Straub. Pro Git
Note: the master branch is not special; it is just the default name for the first branch, created by git init.
Image credit: S. Chacon and B. Straub. Pro Git
Creating a new branch
HEAD points to the current (and only) branch, master, which was created on initialization of the repository.
Image credit: S. Chacon and B. Straub. Pro Git
Running git branch testing creates a new branch called testing, pointing to the current commit. HEAD still points to the current branch, master.
Image credit: S. Chacon and B. Straub. Pro Git
Please see the lecture video for a demonstration of creating a new branch with git branch and viewing the branch pointers using git log
Branches: the basic workflow Here is a project with three commits, and a single branch. Image credit: S. Chacon and B. Straub. Pro Git
Create a new branch called iss53, and switch HEAD to that branch (git checkout -b iss53 does both in one step).
Image credit: S. Chacon and B. Straub. Pro Git
Now we have a new branch, iss53, pointed to by HEAD (not shown). Any commits we make will be added to iss53, rather than master.
Image credit: S. Chacon and B. Straub. Pro Git
Branches: the basic workflow If we make changes and commit them, the current branch moves forward, while master remains unchanged. Image credit: S. Chacon and B. Straub. Pro Git
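The branch slides above (creating a branch, moving HEAD, then committing) can be mimicked with a small toy model. `ToyRefs` and its methods are hypothetical names that only loosely mirror `git branch`, `git checkout`, and `git commit`; commits are toy ids such as `"c3"` with a single parent each.

```python
class ToyRefs:
    """Hypothetical toy model of branches and HEAD; not git's implementation."""

    def __init__(self) -> None:
        self.parents = {"c0": None}        # commit id -> parent commit id
        self.branches = {"master": "c0"}   # branch name -> commit id
        self.head = "master"               # HEAD names the current branch
        self._next = 1                     # counter for toy commit ids

    def branch(self, name: str) -> None:
        """Like `git branch <name>`: new pointer at the current commit; HEAD stays put."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        """Like `git checkout <name>`: make HEAD point to an existing branch."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def commit(self) -> str:
        """New commit whose parent is the current tip; only the current branch moves."""
        new_id = f"c{self._next}"
        self._next += 1
        self.parents[new_id] = self.branches[self.head]
        self.branches[self.head] = new_id
        return new_id


if __name__ == "__main__":
    repo = ToyRefs()
    repo.commit()            # c1 on master
    repo.commit()            # c2 on master (three commits in total)
    repo.branch("iss53")     # iss53 -> c2; HEAD still on master
    repo.checkout("iss53")   # HEAD -> iss53
    repo.commit()            # c3 on iss53; master stays at c2
    print(repo.head, repo.branches)
```

Keeping HEAD as a branch name (rather than a commit id) is what makes `commit` advance whichever branch is checked out while leaving the others untouched.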
Please see the lecture video for a demonstration of switching between branches with git checkout and viewing the commit history of multiple branches.
If we make changes in both of our branches, then they will have divergent histories. The changes in the two branches are isolated from one another. Eventually, we may want to merge them.
Image credit: S. Chacon and B. Straub. Pro Git
Merging branches in git
Merge the changes made in branch iss53 into branch master: check out master, then run git merge iss53.
Image credit: S. Chacon and B. Straub. Pro Git
Merging branches in git
git merge creates a new commit (a merge commit) whose snapshot combines both branches; unlike an ordinary commit, it has two parents.
After a merge like this, we can typically delete the branch that we merged: git branch -d iss53
Image credit: S. Chacon and B. Straub. Pro Git
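The divergence-then-merge picture can be sketched on a toy commit graph. When git merges divergent branches, it performs a three-way merge using the two branch tips and their common ancestor; when one tip is already an ancestor of the other, it can simply move the pointer forward (a fast-forward). The functions below (`example_history`, `ancestors`, `merge_bases`, `merge`) are hypothetical, commit ids are toy labels, and only the commit graph is modeled, not file contents.

```python
def example_history():
    """Commit graph like the slides: master at C4, iss53 at C5, diverging after C2."""
    graph = {          # commit id -> list of parent ids
        "C0": [],
        "C1": ["C0"],
        "C2": ["C1"],
        "C3": ["C2"],  # first commit on iss53
        "C4": ["C2"],  # commit on master after iss53 was created
        "C5": ["C3"],  # second commit on iss53
    }
    branches = {"master": "C4", "iss53": "C5"}
    return graph, branches


def ancestors(graph, commit):
    """`commit` plus every commit reachable by following parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph, a, b):
    """Best common ancestors of a and b: common ancestors not behind another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    )


def merge(graph, branches, current, other):
    """Toy `git merge <other>` run while `current` is checked out."""
    ours, theirs = branches[current], branches[other]
    if theirs in ancestors(graph, ours):
        return "already up to date"
    if ours in ancestors(graph, theirs):
        branches[current] = theirs        # fast-forward: just move the pointer
        return "fast-forward"
    merge_id = f"C{len(graph)}"           # toy id for the new merge commit
    graph[merge_id] = [ours, theirs]      # a merge commit has two parents
    branches[current] = merge_id
    return "merge commit"


if __name__ == "__main__":
    graph, branches = example_history()
    print(merge_bases(graph, branches["master"], branches["iss53"]))
    print(merge(graph, branches, "master", "iss53"), branches["master"], graph["C6"])
    del branches["iss53"]                 # like `git branch -d iss53`
```

Deleting `iss53` afterwards only removes a name: its commits stay reachable through the merge commit's second parent.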
Please see the lecture video for a demonstration of merging branches with git merge
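Merge conflicts
What if we make changes to the same part of the same file in two branches? git may not know how to merge them, and we’ll get an error like the one shown.
[Figure: git merge output reporting a merge conflict in index.html]
Note: you can use git status to get more information about what went wrong.
Files with merge conflicts will have sections that look like this:
[Figure: a conflicted section of index.html. The contents of index.html in the HEAD branch appear between the <<<<<<< HEAD and ======= markers; the contents of index.html in the branch being merged appear between ======= and the >>>>>>> marker.]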
We have to fix these sections before we can complete the merge: edit the file to keep the content we want, delete the conflict markers, then mark the file as resolved with git add and finish the merge with git commit.
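Where conflict sections come from can be sketched with a toy three-way merge of a single file. Real git merges hunk by hunk, so edits to different parts of a file usually combine cleanly; this sketch assumes the whole file is one hunk, which makes every overlapping edit a conflict. `merge_file` and `has_conflict_markers` are hypothetical names; the marker lines follow git's default conflict style.

```python
from __future__ import annotations


def _ensure_newline(text: str) -> str:
    return text if text.endswith("\n") else text + "\n"


def merge_file(base: str, ours: str, theirs: str, other_branch: str) -> tuple[str, bool]:
    """Toy three-way merge of one file, treating the whole file as one hunk.

    Returns (merged_text, has_conflict).
    """
    if ours == theirs or theirs == base:
        return ours, False            # both sides agree, or only HEAD changed it
    if ours == base:
        return theirs, False          # only the other branch changed it
    # Both branches changed the same part differently: emit conflict markers.
    merged = (
        "<<<<<<< HEAD\n"
        + _ensure_newline(ours)       # contents in the HEAD branch
        + "=======\n"
        + _ensure_newline(theirs)     # contents in the branch being merged
        + f">>>>>>> {other_branch}\n"
    )
    return merged, True


def has_conflict_markers(text: str) -> bool:
    """Rough check for unresolved conflict sections left in a file."""
    return any(
        line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "======="
        for line in text.splitlines()
    )


if __name__ == "__main__":
    text, conflict = merge_file(
        "<p>contact: support</p>", "<p>email support</p>", "<p>contact us</p>", "iss53"
    )
    print(text)
    print(conflict, has_conflict_markers(text))
```

Resolving a conflict means replacing the whole marked section with the content you want, so a check like `has_conflict_markers` should return False before you stage the file.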
Please see the lecture video for a demonstration of fixing merge conflicts.