Intro to Git for Version Control
Overview
Version control is a powerful way to organize, back up, and share your research computing code with collaborators. A version control system keeps track of a set of files and saves snapshots (i.e. versions, commits) of the files at any point in time. Using version control allows you to confidently make changes to your code (and any other files), with the ability to roll back to any previous state. This helps avoid filling your directories with files that look like this:
my_code.py
my_code_version2.py
my_code_version2B-DMW-edit.py
my_code_FINAL_VERSION.py
my_code_THIS_IS_ACTUALLY_THE_FINAL_VERSION.py
Version control also allows you to share code with collaborators, make simultaneous edits, and merge your changes in a systematic, controlled way.
Version control has been used for a long time in software development. More recently, it has become an essential part of modern data and computational science. Our strong recommendation is that all of your research code be stored in a version control system.
The tool we will be using for version control is called Git. Git is incredibly powerful, but it also has a somewhat steep learning curve. Fortunately, in this class, we will only be using a small subset of what Git can do, avoiding the more complex aspects.
For a full-length tutorial, we recommend the article Version Control with Git on the Software Carpentry website. Here we simply enumerate the most common Git commands.
Summary of useful Git commands
Set up your username and email:
git config --global user.name "Dan Westervelt"
git config --global user.email "danielmw@ldeo.columbia.edu"
Create a new repository:
cd my_project
git init
Stage files for addition to the repository:
git add <filenames>
Commit staged files:
git commit -m "your brief commit message goes here"
Get information about your repository:
git status # tells you which files are staged, modified, or new
git log # view the commit log
git diff # view file content differences
Revert a file to an earlier version:
git checkout <commit hash> -- <filenames>
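The commands in this summary can be tried end to end in a throwaway directory. The following is only a minimal sketch: the scratch directory, the file name my_code.py, its contents, and the commit messages are made up for illustration, and the identity is set for this one repository (no --global) so that your real configuration is left alone.

```shell
#!/bin/sh
set -e  # stop at the first failing command

# Hypothetical scratch project; nothing outside this temp directory is touched.
project=$(mktemp -d)
cd "$project"

git init
# Placeholder identity, stored for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot.
echo 'print("version 1")' > my_code.py
git add my_code.py
git commit -m "Add first version of my_code.py"

# Edit the file, inspect the change, then take a second snapshot.
echo 'print("version 2")' > my_code.py
git status                      # my_code.py is listed as modified
git --no-pager diff             # shows the changed line (--no-pager avoids opening a pager)
git add my_code.py
git commit -m "Update my_code.py to version 2"

git --no-pager log --oneline    # two commits, newest first

# Revert my_code.py to its state in the first commit. HEAD~1 names the commit
# just before the latest one; a commit hash copied from git log works the same way.
git checkout HEAD~1 -- my_code.py
cat my_code.py                  # prints: print("version 1")
```

Note that the checkout only changes the file in your working directory (and stages it); the history still contains both commits, so nothing is lost.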
Using Git / GitHub from remote JupyterHub
The recommended way to move code in and out of a remote hub is via Git / GitHub. You should clone your project repo from the terminal and use git pull / git push to fetch and upload changes. In order to push changes to GitHub from the hub, you will need to set up GitHub authentication. To do this, we will generate an SSH key and connect it with GitHub. From a terminal in your home directory on Chopin, run:
ssh-keygen -t rsa -b 4096 -C dmw2166@Columbia.edu
but replace the email address with the one you use for Git.
We then need to give GitHub our public key: on github.com, go to Settings, then SSH and GPG keys, and add the key there.
You should do your GitHub work on the course JupyterHub on Chopin.
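To see what a key pair looks like without touching your real ~/.ssh directory, here is a small sketch that writes a key to a temporary directory. The -f (output file) and -N (passphrase, empty here) options and the placeholder email are additions for illustration only. When you run the command from the text without -f, ssh-keygen asks where to save the key and defaults to ~/.ssh/id_rsa, with the public half in ~/.ssh/id_rsa.pub.

```shell
#!/bin/sh
set -e

# Temporary directory so the demo key never mixes with your real keys.
keydir=$(mktemp -d)

# Same key type and size as in the text; -q suppresses the progress output.
ssh-keygen -q -t rsa -b 4096 -C "you@example.com" -f "$keydir/id_rsa" -N ""

ls "$keydir"              # id_rsa (private key) and id_rsa.pub (public key)

# The public key is a single line starting with "ssh-rsa" and ending with
# the comment given via -C. This line is what gets pasted into GitHub.
cat "$keydir/id_rsa.pub"
```

Only the .pub file should ever be shared. Once your real public key has been added on github.com, running ssh -T git@github.com from the hub is a common way to check that GitHub recognizes it.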
Collaborating with Git and GitHub
Create a new repository on GitHub
Follow the instructions and run
git remote add origin <repo url>
on your local repo.
Warning
In order to authenticate with your SSH key from the previous section, you need to use SSH-style GitHub URLs. When setting up a new repo, underneath where it says “Quick setup”, make sure you click the SSH box. The repo URL should look something like
git@github.com:rabernat/planets.git
(NOT https://github.com/rabernat/planets.git). If you already added a remote with an HTTPS URL, you can remove it by typing
git remote rm origin
(assuming the name of your remote is indeed origin).
Make your changes and stage them with
git add <filenames>
Commit your changes with
git commit -m "your brief commit message goes here"
Upload the changes to GitHub with
git push origin main
Update your local repo with
git pull origin main
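The push / pull cycle above can be rehearsed entirely on one machine. In the sketch below, a local bare repository stands in for GitHub, which is an assumption made purely so the example runs without a network connection or an account. The directory names, the planets.txt file, and the placeholder identities are hypothetical. With a real GitHub repo, the remote URL would be the SSH-style one described in the warning.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# A bare repository plays the role of the GitHub repo in this demo.
git init --bare "$work/remote.git"

# Your local repo: one commit, then connect it to the "remote" and push.
git init "$work/mine"
cd "$work/mine"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Mercury" > planets.txt
git add planets.txt
git commit -m "Add Mercury"
git branch -M main                          # make sure the branch is named main
git remote add origin "$work/remote.git"    # on GitHub: git@github.com:<user>/<repo>.git
git push origin main

# A collaborator clones the repo, commits a change, and pushes it.
git clone --branch main "$work/remote.git" "$work/theirs"
cd "$work/theirs"
git config user.name "Example Collaborator"
git config user.email "collaborator@example.com"
echo "Venus" >> planets.txt
git add planets.txt
git commit -m "Add Venus"
git push origin main

# Back in your repo, pull to receive the collaborator's commit.
cd "$work/mine"
git pull origin main
cat planets.txt    # now contains both Mercury and Venus
```

Here the two histories never diverge, so the pull is a simple fast-forward. When both people edit the same lines before pulling, Git asks you to resolve the conflict by hand, which is covered in the full-length tutorial.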
All of the above commands are available from our course’s cloud-based JupyterHub. This is an excellent way to move code in and out of your cloud-based environment while simultaneously backing it up.