Version Control with Git
Questions:
- What is version control and why is it useful?
- How can I use Git to version control my files?
- What is the difference between Git and Github?
Objectives:
- Describe the benefits of version control
- Initialise a Git repository from the command line or through the Github interface
- Use command line git to version control your work locally
- Use git and Github to store and share a remote copy of your work
Keypoints:
- Version control is the ultimate undo button for code
- Github is very widely used in academia and industry
- Use git init to initialise an empty git repository
- Use git status for a summary of your repository
- Git uses a two-step process for version control: git add and git commit
- See the changes made to a file using git diff
- Use git log to see a record of the commits that have been made
- You can add and commit multiple files
- Use git push and git pull to communicate with a remote repository
- For some tasks the Github web interface is a useful alternative
Github is very widely used in academia and industry
In this lesson we will be using a version control system called git. You can install and use git locally on your own computer (without any internet connection). However, there are several online services that will store remote copies of your git repositories. Remote copies are highly encouraged for two reasons:
- as a backup in case your computer dies
- to share your work with other people
In this lesson we will be using the most popular git-based tool: Github. This is also where all of the code for this website is stored. We will see later in the course that Github can also be used to host websites and automate tasks.
Use git init to initialise an empty git repository
First, let’s create a folder to hold the files we want to version control:
mkdir my_project
cd my_project
ls
Second, create a file. You can use the in-built terminal editor vim or nano, or any plain-text editor (such as Notepad). Whichever editor you use, you need to make sure you save the file in the my_project folder.
vim hello.py
Note: To start writing in vim type i
def hello_world():
    print("hello_world")

if __name__ == "__main__":
    hello_world()
Note: We are not writing the Python shebang as we are treating this as a Python module (with functions/code you can import) rather than a Python script that is run from top to bottom.
Note: To save and exit vim you type Esc, :wq, Enter.
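The note about treating hello.py as a module can be made concrete with a small experiment. The sketch below is only an illustration: it writes a copy of hello.py into a temporary folder (so it does not touch my_project), then loads it once as an import and once as a script. The helper names output_when_imported and output_when_run are made up for this example.

```python
import contextlib
import importlib.util
import io
import runpy
import tempfile
from pathlib import Path

# Same contents as hello.py in the lesson.
HELLO_SOURCE = (
    'def hello_world():\n'
    '    print("hello_world")\n'
    '\n'
    'if __name__ == "__main__":\n'
    '    hello_world()\n'
)


def output_when_imported(path):
    """Import the file as a module and return whatever it printed."""
    spec = importlib.util.spec_from_file_location("hello", str(path))
    module = importlib.util.module_from_spec(spec)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        spec.loader.exec_module(module)  # __name__ is "hello" here
    return buffer.getvalue()


def output_when_run(path):
    """Run the file with __name__ set to "__main__", as `python hello.py` does."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        runpy.run_path(str(path), run_name="__main__")
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as folder:
    hello_path = Path(folder) / "hello.py"
    hello_path.write_text(HELLO_SOURCE)
    imported_output = output_when_imported(hello_path)
    script_output = output_when_run(hello_path)

print(repr(imported_output))  # empty: the guarded call is skipped on import
print(repr(script_output))    # the guarded call ran
```

Importing the file defines hello_world() without calling it, which is why the same file can be reused by other modules later in the lesson.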
Third, create a git repository to version control this new file:
git init
The git repository is stored in a hidden directory, so to see it we need to use the command:
ls -a
. .. .git hello.py
Git stores all of the repository data in the .git directory. If you ever want to delete the repository you delete this hidden folder (do not run this now, as the rest of the lesson needs the repository):
rm -rf .git
Caution: Always take care using the command rm -rf. This permanently deletes a directory and everything within the directory - and it will not be available in the recycling bin!
Use git status for a summary of your repository
git status is a very useful command. It summarises the status of your git repository.
git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
hello.py
nothing added to commit but untracked files present (use "git add" to track)
Note: For a long time the default branch in most Git repositories was named master. Fortunately, many people have become aware that this terminology should be replaced with something more inclusive: main. If your branch is called master you can rename it using git branch -m master main.
The git outputs are generally quite helpful. Here we are told that there is “nothing added to commit but untracked files present” and git suggests that we use “git add” to track. What is this all about?
Git uses a two-step process for version control: git add and git commit
If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot, and makes a permanent record of it (as a commit).
First let’s add our new file:
git add hello.py
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.py
Second let’s commit our file to the repository history. We provide a short commit message, describing what the changes are and/or why they were made:
git commit -m "function to demonstrate the if __name__ == __main__ syntax"
[main (root-commit) 0f7a7b4] function to demonstrate the if __name__ == __main__ syntax
1 file changed, 5 insertions(+)
create mode 100644 hello.py
Info: Good commit messages often describe why a change was made. For example ‘fixed a bug that was breaking the unit tests’. Information about what changed can be obtained by asking git to compare different versions of the file (we’ll see this later in the lesson).
The two stage process is useful because it means you can carefully craft your commit snapshots. For example, I may make several changes to several files. I then want to version control the changes. Instead of being forced to use a single commit for changes that are unrelated I can split my changes into several smaller commits - for example, one for “implementing a new algorithm to find the minima” and one for “improved function docstrings”.
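To make the staging idea more tangible, here is a toy model in Python. It is a deliberately simplified sketch and not how Git is implemented: the class ToyRepository and the file names minimise.py and utils.py are invented for illustration. It shows two unrelated edits being split into two separate commits by staging one file at a time.

```python
class ToyRepository:
    """A simplified model of 'git add' followed by 'git commit'."""

    def __init__(self):
        self.working_files = {}  # file name -> current contents on disk
        self.staging_area = {}   # changes chosen for the next snapshot
        self.commits = []        # list of (message, snapshot) tuples

    def edit(self, name, contents):
        """Change a file in the working folder (nothing is recorded yet)."""
        self.working_files[name] = contents

    def add(self, name):
        """Like 'git add': stage the current contents of one file."""
        self.staging_area[name] = self.working_files[name]

    def commit(self, message):
        """Like 'git commit': record the previous snapshot plus staged changes."""
        if not self.staging_area:
            raise ValueError("nothing added to commit")
        previous = self.commits[-1][1] if self.commits else {}
        snapshot = {**previous, **self.staging_area}
        self.commits.append((message, snapshot))
        self.staging_area = {}  # staging area is empty again after a commit


repo = ToyRepository()

# Two unrelated edits made in the same working session.
repo.edit("minimise.py", "new algorithm")
repo.edit("utils.py", "better docstrings")

# Stage and commit them separately so each commit tells one story.
repo.add("minimise.py")
repo.commit("implement a new algorithm to find the minima")

repo.add("utils.py")
repo.commit("improve function docstrings")

for message, snapshot in repo.commits:
    print(message, sorted(snapshot))
```

The first commit only contains minimise.py even though utils.py had already been edited, because only staged changes go into a snapshot.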
See the changes made to a file using git diff
Now let’s make an edit to the file
vim hello.py
def hello_world():
    "function to greet the world"
    print("hello_world")

if __name__ == "__main__":
    hello_world()
We can see the difference between the latest version of the file and the version of the file stored in the git repository using git diff:
git diff
diff --git a/hello.py b/hello.py
index cd6a6ec..df4dfe3 100644
--- a/hello.py
+++ b/hello.py
@@ -1,4 +1,5 @@
 def hello_world():
+    "function to greet the world"
     print("hello_world")
 
 if __name__ == "__main__":
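If the diff format is unfamiliar, it can help to generate one by hand. The sketch below uses Python's standard difflib module, which is not what Git uses internally, to compare the two versions of hello.py from this lesson. Lines starting with + were added, lines starting with - were removed, and lines starting with a space are unchanged context.

```python
import difflib

# hello.py as it was in the first commit.
old_version = (
    'def hello_world():\n'
    '    print("hello_world")\n'
    '\n'
    'if __name__ == "__main__":\n'
    '    hello_world()\n'
)

# hello.py after adding the docstring.
new_version = (
    'def hello_world():\n'
    '    "function to greet the world"\n'
    '    print("hello_world")\n'
    '\n'
    'if __name__ == "__main__":\n'
    '    hello_world()\n'
)

diff_lines = list(
    difflib.unified_diff(
        old_version.splitlines(keepends=True),
        new_version.splitlines(keepends=True),
        fromfile="a/hello.py",
        tofile="b/hello.py",
    )
)
print("".join(diff_lines), end="")

# Pick out the changed lines, skipping the '---' and '+++' file headers.
added = [l for l in diff_lines if l.startswith("+") and not l.startswith("+++")]
removed = [l for l in diff_lines if l.startswith("-") and not l.startswith("---")]
print("added:", len(added), "removed:", len(removed))
```

Here exactly one line is reported as added and none as removed, matching the single + line in the git diff output.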
Commit this change to the repository:
git add hello.py
git commit -m "include docstring as per project guidelines"
git status
On branch main
nothing to commit, working tree clean
Use git log to see a record of the commits that have been made
git log
commit b4bfc23897e6dd3c8faed6f101b5438ff0cc98c1 (HEAD -> main)
Author: Lucy Whalley <l.whalley@northumbria.ac.uk>
Date:   Mon Nov 22 20:40:14 2021 +0000

    include docstring as per project guidelines

commit 0f7a7b4e03439cd9f854dec2f438a85ffbd31fd9
Author: Lucy Whalley <l.whalley@northumbria.ac.uk>
Date:   Mon Nov 22 20:24:32 2021 +0000

    function to demonstrate the if __name__ == __main__ syntax
You can add and commit multiple files
Create a new Python module that we will import into our “hello.py” module:
vim bonjour.py
def bonjour_le_monde():
    print("bonjour le monde!")
vim hello.py
import bonjour

def hello_world():
    "function to greet the world"
    print("hello_world")

if __name__ == "__main__":
    hello_world()
    bonjour.bonjour_le_monde()
python hello.py
hello_world
bonjour le monde!
git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: hello.py
Untracked files:
(use "git add <file>..." to include in what will be committed)
__pycache__/
bonjour.py
no changes added to commit (use "git add" and/or "git commit -a")
We can add and commit both files at the same time
git add hello.py bonjour.py
git commit -m "implement and import french translation"
[main 8fb9bff] implement and import french translation
2 files changed, 6 insertions(+), 1 deletion(-)
create mode 100644 bonjour.py
Use git push and git pull to communicate with a remote repository
Currently all of our files and changes are stored locally on our computer.
In practice, most programmers hold up-to-date copies of their files on a remote service such as Github. To create a remote repository on the Github servers follow these four steps:
- Log into Github
- Click on the “+” icon in the top right hand corner to create a new repository
- Provide a name and description
- Select “Add a README file” and “Choose a license”
Info: To decide which open source license you would like to use visit https://choosealicense.com/.
You now need to push your local repository to the remote server. To do so, copy the commands listed under “…or push an existing repository from the command line” into your terminal (you need to be in the my_project folder when you do this).
git remote add origin https://github.com/lucydot/my_project.git
git push -u origin main
Note: Github shows the “…or push an existing repository from the command line” commands only for a repository that was created empty. If you selected “Add a README file” or “Choose a license” when creating it, the remote repository already contains a commit that your local repository does not have, and git push will be rejected until the remote history has been merged into your local one.
If you make changes to the files on the remote Github repository, you can pull these changes to your local repository with
git pull
It is important to git push and git pull frequently so that your local and remote repositories stay up-to-date with one another.
For some tasks the Github web interface is a useful alternative
We have already seen how to create a repository on Github. You can also use the Github web interface (“drag and drop”) to add and commit files.
TASKS
Use Github to:
- Create a repository for holding the work done in this module
- Create a README.md and select an open source license
- Use the Github drag-and-drop interface to upload the script(s) you wrote in the previous lesson
Extension:
- Use the git command line to version control and upload the Jupyter Notebooks (or any other file) generated during this course