


  2. For the remainder of the class today, I want to introduce a topic we will spend one or two more classes discussing: source code control, also called version control. What is version control? (discuss) Who has used version control? Favorite VCS? Uses of version control (read)

  3. There are several different types of version control. The earliest systems used local version control. Utilities such as diff and patch can be used to implement a form of version control. (tell story about Mom's tar, diff, patch system) RCS is a popular local version control system still in use today. It might be useful on a system with no network.
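
To make the diff-and-patch idea concrete, here is a minimal sketch using Python's standard difflib module rather than the diff and patch utilities themselves. The two file versions are made up for illustration, and note that ndiff output carries both versions in full, so it is not a compact patch format.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old = ["line one\n", "line two\n", "line three\n"]
new = ["line one\n", "line 2\n", "line three\n", "line four\n"]

# Record the differences between the versions (the job `diff` does).
delta = list(difflib.ndiff(old, new))

# Recover either version from the recorded delta (roughly the job `patch` does).
restored_old = list(difflib.restore(delta, 1))
restored_new = list(difflib.restore(delta, 2))

# Print a unified diff, the format most people are used to reading.
print("".join(difflib.unified_diff(old, new, fromfile="v1", tofile="v2")))
```

Keeping one full copy plus a chain of such deltas is the essence of a local, delta-based version control scheme.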

  4. The next iteration of version control was to store the different versions on a centralized server that every developer connected to. This allowed developers working on different systems to collaborate on the same project. Basically, a single server contained all the versions of each file, and the client systems would check files in and out of this central location. There are risks to doing it this way. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they're working on. If the hard disk the central database is on becomes corrupted, and proper backups haven't been kept, you lose absolutely everything: the entire history of the project except whatever single snapshots people happen to have on their local machines. Local VCSs suffer from this same problem. Whenever you have the entire history of the project in a single place, you risk losing everything.

  5. So, the solution that was proposed and developed was a distributed VCS, where every client keeps a full mirror of the entire repository. The obvious downside is the extra data stored on each client. But every clone of the repo is now a full backup of the data. So, if a server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it.

  6. We're going to study Git in this course. The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. (Draw: columns V1 V2 V3 V4 V5; rows FA: D1, D2; FB: D1, D2; FC: D1, D2, D3.) These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time.

  7. Git thinks of its data more like a set of snapshots of a miniature filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if a file has not changed, Git doesn't store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
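
The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's real storage format: it assumes a plain SHA-1 of the file contents as the key, and the names blobs, snapshots, and commit are invented for the example.

```python
import hashlib

blobs = {}       # checksum -> file contents; each distinct content is stored once
snapshots = []   # one {filename: checksum} mapping per commit

def commit(files):
    """Record a snapshot of every file, reusing blobs whose content is unchanged."""
    snapshot = {}
    for name, content in files.items():
        key = hashlib.sha1(content).hexdigest()
        blobs.setdefault(key, content)   # no second copy if already stored
        snapshot[name] = key
    snapshots.append(snapshot)
    return snapshot

# Hypothetical project: only b.txt changes between the two commits.
v1 = commit({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
v2 = commit({"a.txt": b"alpha\n", "b.txt": b"beta, edited\n"})

print(len(snapshots), "snapshots,", len(blobs), "stored blobs")
```

Both snapshots list a.txt, but they point at the same stored blob, which is the "link to the previous identical file" described above.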

  8. Most operations in Git only need local files and resources to operate; generally no information is needed from another computer on your network. For example, to browse the history of the project, Git doesn't need to go out to the server to get the history and display it for you. It simply reads it directly from your local database, so you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally. Working offline is nice. Git also has some nice integrity guarantees. Everything in Git is checksummed before it is stored and is then referred to by that checksum. This means it's impossible to change the contents of any file or directory without Git knowing about it. The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0-9 and a-f) and calculated based on the contents of a file or directory structure in Git.

  9. When you do actions in Git, nearly all of them only add data to the Git database. It is hard to get the system to do anything that is not undoable or to make it erase data in any way. This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.
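
As a sketch of the checksumming described above: for a file's contents, Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the contents. The function name git_blob_hash is made up for this example, and it assumes a repository using SHA-1 object names.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 name Git gives to a file's contents (a "blob" object)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

digest = git_blob_hash(b"hello world\n")
print(digest, len(digest))   # a 40-character hexadecimal string

# Changing even one byte of the content produces a different checksum.
print(git_blob_hash(b"hello world!\n") == digest)
```

The result should match what `git hash-object` reports for a file with the same bytes, which is why a changed file cannot go unnoticed.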

  10. This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

  11. There are three main sections of a Git project:

      - The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
      - The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
      - The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. It's sometimes referred to as the "index", but it's also common to refer to it as the staging area.

      The basic Git workflow goes something like this:

      1. You modify files in your working directory.
      2. You stage the files, adding snapshots of them to your staging area.
      3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

  12. If a particular version of a file is in the Git directory, it's considered committed. If it has been modified and added to the staging area, it is staged. And if it has changed since it was checked out but has not been staged, it is modified.
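
The modify, stage, commit workflow can be modeled with three plain Python containers. This is only a toy model of the three sections; the names working, index, history, stage, and commit, and the file notes.txt, are invented for the example.

```python
working = {"notes.txt": "draft 1"}   # working directory: the files you edit
index = {}                           # staging area: contents of the next commit
history = []                         # Git directory: committed snapshots

def stage(name):
    """Copy the file's current working version into the staging area."""
    index[name] = working[name]

def commit():
    """Store the staged files as a permanent snapshot."""
    history.append(dict(index))

stage("notes.txt")
commit()                             # "draft 1" is now committed

working["notes.txt"] = "draft 2"     # modified, not yet staged
stage("notes.txt")                   # staged, not yet committed
working["notes.txt"] = "draft 3"     # modified again after staging
commit()                             # records "draft 2", the staged version

print(history)
```

The point of the last three steps is that a commit records what was staged, not whatever happens to be in the working directory at that moment.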

  13. You now have a bona fide Git repository and a checkout, or working copy, of the files for that project. You need to make some changes and commit snapshots of those changes into your repository each time the project reaches a state you want to record. Each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven't edited anything.
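
These rules can be written down as a small classifier. It is a simplification under stated assumptions: each area is a dict from filename to contents, deleted files are ignored, and the function name file_status and the sample files are hypothetical.

```python
def file_status(name, working, index, head):
    """Classify one file using simplified versions of the rules above."""
    if name not in head and name not in index:
        return "untracked"               # not in last snapshot, not staged
    if index.get(name) != head.get(name):
        return "staged"                  # staged version differs from last commit
    if working.get(name) != index.get(name):
        return "modified"                # edited on disk but not staged
    return "unmodified"

head = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}      # last snapshot
index = {"a.txt": "1", "b.txt": "1", "c.txt": "2"}     # staging area
working = {"a.txt": "1", "b.txt": "2", "c.txt": "2", "d.txt": "new"}

statuses = {name: file_status(name, working, index, head) for name in sorted(working)}
print(statuses)
```

In real Git a file can be staged and modified at the same time; this sketch reports only the first matching state to keep the logic short.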


  15. The main tool you use to determine which files are in which state is the git status command.


  21. Now, I want to talk about branching in Git. Branching just means you diverge from the main line of development and continue to do work without messing with that main line. It's an important feature of version control systems because it allows you to implement new and experimental features without having your untested code in the mainline source tree. Branching in Git is sometimes referred to as its killer feature. The reason is that, in many VCSs, branching requires you to copy the entire source tree. Git's branching mechanism is very lightweight and encourages workflows that branch and merge often.

  22. To understand branching in Git, we first need to understand how Git actually stores the content you've committed. When you make a commit in Git, the system stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author's name and email, the message that you typed, and pointers to the commit or commits that directly came before this commit (its parent or parents): zero parents for the initial commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.

  23. To visualize this, let's assume that you have a directory containing three files, and you stage them all and commit. Staging the files checksums each one, stores that version of the file in the Git repository (Git refers to them as blobs), and adds that checksum to the staging area. When you create the commit by running git commit, Git checksums each subdirectory (in this case, just the root project directory) and stores those tree objects in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed.
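
Here is a rough sketch of those three kinds of objects, built by hand in Python. It assumes SHA-1 object names and a flat directory of ordinary files; the file names, contents, and author identity are all made up, and the helper git_object_id is not part of Git.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git names it: SHA-1 over a header plus the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Three hypothetical files in the root project directory.
files = {"README": b"hello\n", "LICENSE": b"MIT\n", "test.rb": b"puts 1\n"}

# Staging: one blob per file.
blob_ids = {name: git_object_id("blob", data) for name, data in files.items()}

# Committing: a tree lists each entry as "<mode> <name>", a NUL byte,
# then the 20 raw bytes of the blob's checksum, sorted by name.
tree_body = b"".join(
    b"100644 " + name.encode("ascii") + b"\0" + bytes.fromhex(blob_ids[name])
    for name in sorted(files)
)
tree_id = git_object_id("tree", tree_body)

# The commit object points at the root tree and carries the metadata.
who = "A Student <student@example.com> 0 +0000"   # hypothetical identity
commit_body = (
    f"tree {tree_id}\nauthor {who}\ncommitter {who}\n\nInitial commit\n"
).encode("ascii")
commit_id = git_object_id("commit", commit_body)

print("tree  ", tree_id)
print("commit", commit_id)
```

This first commit has no parent line; a later commit would add a line of the form "parent" followed by the previous commit's checksum, right after the tree line.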

  24. If you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it.

  25. A branch in Git is simply a lightweight, movable pointer to one of these commits. The default branch name in Git is master. As you start making commits, you're given a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.
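
The pointer idea is easy to model. In this toy sketch, commit ids are simple counters standing in for SHA-1 checksums, and the branch name testing and the commit messages are invented for the example.

```python
commits = {}                 # commit id -> {"parent": id or None, "message": str}
branches = {"master": None}  # branch name -> id of the commit it points to

def commit(branch, message):
    """Create a commit whose parent is the branch tip, then advance the branch."""
    new_id = f"c{len(commits) + 1}"          # stand-in for a SHA-1 checksum
    commits[new_id] = {"parent": branches[branch], "message": message}
    branches[branch] = new_id                # the pointer moves forward
    return new_id

first = commit("master", "initial commit")
second = commit("master", "fix typo")

# Creating a branch only copies a pointer; no files are duplicated.
branches["testing"] = branches["master"]
third = commit("testing", "try an idea")

print(branches)
```

After the last commit, master still points at the second commit while testing has moved ahead, which is all that "diverging from the main line" amounts to in this model.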
