Quick start on Git and GitHub
This post is a summary of what I learned in DATA 550: Data Science Toolkit. I want to recap the material, but also refine and expand on it, turning it into a clear, actionable workflow that I can rely on for future projects. The goal is a practical guide that I, and hopefully others, can follow with ease.
Git is a version control system that helps track changes in your code and collaborate with others. To start, you need to install it on your system.
sudo apt install git # Debian/Ubuntu
brew install git # macOS (Homebrew)
To confirm Git is installed, run:
git --version
After installing Git, set up your user identity so that Git can attribute changes to you:
git config --global user.name "Your Name"
git config --global user.email "your-email@example.com"
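The --global flag applies the identity to every repository on the machine. As a minimal sketch, assuming you sometimes want a different identity for one project, the same command without --global stores the value in that repository only. The directory is a throwaway one and the name and email are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the demo does not touch any real project.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"
git init --quiet

# Without --global, the setting is written to this repository's .git/config only.
git config user.name "Work Name"
git config user.email "work-email@example.com"

# --local restricts the lookup to the repository-level configuration.
repo_name="$(git config --local --get user.name)"
repo_email="$(git config --local --get user.email)"
echo "This repository commits as: $repo_name <$repo_email>"
```

A repository-level value takes precedence over the global one, so the global identity remains the default everywhere else.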
A Git repository (repo) is where all version history for a project is stored.
Navigate to the project folder (on your local computer) and initialize a Git repository:
git init
This creates a hidden .git folder that Git uses to track changes.
To start tracking files, add them to Git and make an initial commit:
git add .
git commit -m "Initial commit"
The add command tells Git which files to include in the next commit (this is called staging). git add . stages every new and modified file in the current directory. The commit command then takes a snapshot of the staged files and records it in the project history.
git add filename.txt # Add specific file
git add . # Add all changes
git commit -m "Commit message"
Check which files have been modified:
git status
To see previous commits:
git log --oneline --graph --decorate --all
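The commands so far can be tried end to end in a scratch directory. The following is a sketch under a few assumptions: the file name notes.txt and the "Demo User" identity are made up for the demo, and the identity is set through environment variables so the global configuration is left alone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Identity for this script only (placeholder values); global config is untouched.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

project_dir="$(mktemp -d)"
trap 'rm -rf "$project_dir"' EXIT
cd "$project_dir"

git init --quiet
echo "first draft" > notes.txt

git status --short            # "?? notes.txt": the file exists but is untracked
git add notes.txt             # stage the file
git status --short            # "A  notes.txt": staged, waiting for a commit
git commit --quiet -m "Initial commit"

echo "second draft" >> notes.txt
git status --short            # " M notes.txt": modified since the last commit
git add notes.txt
git commit --quiet -m "Revise notes"

commit_count="$(git rev-list --count HEAD)"
git log --oneline             # newest commit first
echo "Commits so far: $commit_count"
```

Running git status between steps is a cheap way to confirm what the next commit will contain.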
GitHub is a platform that lets you store Git repositories online and collaborate with others.
We will need SSH keys to build secure connections between our local computer and GitHub. The following command generates the key pair.
ssh-keygen -o -t rsa
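To see what the key pair looks like without touching ~/.ssh, a key can be generated into a temporary directory. This is only a sketch: the file name id_rsa_demo is hypothetical, and the empty passphrase is acceptable for a throwaway demo key but is not a recommendation for a real one.

```shell
#!/usr/bin/env bash
set -euo pipefail

key_dir="$(mktemp -d)"
trap 'rm -rf "$key_dir"' EXIT
key_path="$key_dir/id_rsa_demo"   # hypothetical file name for this demo

# -f sets the output file, -N "" means no passphrase (demo only),
# -C attaches a comment, -q suppresses the progress output.
ssh-keygen -q -t rsa -b 4096 -C "your-email@example.com" -N "" -f "$key_path"

ls "$key_dir"                        # id_rsa_demo (private) and id_rsa_demo.pub (public)
public_key="$(cat "$key_path.pub")"  # the public half is what gets pasted into GitHub
ssh-keygen -l -f "$key_path.pub"     # prints the key's bit length and fingerprint
echo "Public key type: ${public_key%% *}"
```

Only the .pub file is ever shared; the private key stays on your machine.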
After generating an SSH key, you need to add it to GitHub to connect your local machine with GitHub.
Here’s how:
Copy Your SSH Key. Run the following command to display your public key:
cat ~/.ssh/id_rsa.pub
This will display the key. Copy the entire output.
Add the Key to GitHub. On GitHub, open Settings, go to SSH and GPG keys, click New SSH key, paste the key, and save.
Verify the Connection. Run the following command to check whether GitHub recognizes your key:
ssh -T git@github.com
If successful, you should see a message like:
Hi your-username! You've successfully authenticated, but GitHub does not provide shell access.
You may see a message like “The authenticity of host ‘github.com’ can’t be established” when connecting to GitHub over SSH. It means that your SSH client (such as OpenSSH) does not yet have GitHub’s host key on record.
This message typically appears the first time you connect to a remote server using SSH. To proceed, type “yes” when prompted, ideally after comparing the fingerprint shown with the ones GitHub publishes in its documentation.
Create a repository on GitHub. It’s like your project’s home base. This is where all your Git history, changes, and progress will live, making it easy to keep track of everything as you work. Think of it as your project’s storybook, documenting every step along the way.
Connect your local repository to the one you just created on GitHub:
git remote add origin https://github.com/your-username/repo-name.git
This ties your local project to the remote repository, so you can easily push your changes and keep everything in sync. Think of it as building a bridge between your computer and GitHub.
git branch -M main # Rename the current branch to main if needed
git push -u origin main # First push; -u makes origin/main the upstream branch
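The remote above uses an HTTPS URL, which authenticates with a username and token rather than the SSH key set up earlier. If you would rather have pushes go over SSH, one option is to switch the remote to the SSH form of the URL. A small sketch, run in a throwaway repository with the same placeholder user and repository names:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo_dir="$(mktemp -d)"
trap 'rm -rf "$repo_dir"' EXIT
cd "$repo_dir"
git init --quiet

# Placeholder URLs: substitute your own user name and repository name.
git remote add origin https://github.com/your-username/repo-name.git
git remote -v                      # shows the HTTPS URL for fetch and push

# Point the same remote at the SSH form so pushes authenticate with the SSH key.
git remote set-url origin git@github.com:your-username/repo-name.git
remote_url="$(git remote get-url origin)"
echo "origin now points to: $remote_url"
```

Nothing is contacted over the network here; the commands only edit the repository's configuration.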
If you are working with a team, you need to pull the latest changes before making new edits:
git pull origin main
After making changes locally, push them to the remote repository:
git push origin main
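The push and pull cycle can be rehearsed offline. In the sketch below, a local bare repository stands in for GitHub and a second clone plays the teammate; the directory names, file name, and "Demo User" identity are all invented for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work_dir="$(mktemp -d)"
trap 'rm -rf "$work_dir"' EXIT

# A bare repository plays the role of the GitHub remote.
git init --quiet --bare "$work_dir/remote.git"

# Your local project: commit, connect to the remote, push.
git init --quiet "$work_dir/local"
cd "$work_dir/local"
echo "version 1" > report.txt
git add report.txt
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$work_dir/remote.git"
git push --quiet -u origin main

# A teammate clones the remote and gets the first commit.
git clone --quiet --branch main "$work_dir/remote.git" "$work_dir/teammate"

# You push a second commit...
echo "version 2" > report.txt
git commit --quiet -am "Update report"
git push --quiet origin main

# ...and the teammate pulls it before starting new edits.
cd "$work_dir/teammate"
git pull --quiet origin main
teammate_commits="$(git rev-list --count HEAD)"
teammate_content="$(cat report.txt)"
echo "Teammate has $teammate_commits commits; report.txt says: $teammate_content"
```

With a real GitHub repository, the only difference is that the remote URL points to github.com instead of a local path.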
Branches allow you to work on new features without affecting the main codebase.
git branch new-feature # Create a new branch
git checkout new-feature # Switch to it
Once a feature is complete, merge it back into the main branch:
git checkout main
git merge new-feature
After merging, you can delete the feature branch:
git branch -d new-feature
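Put together, the branch workflow might look like the following sketch. It runs in a scratch repository; the file names and the "Demo User" identity are assumptions made for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo_dir="$(mktemp -d)"
trap 'rm -rf "$repo_dir"' EXIT
cd "$repo_dir"

git init --quiet
echo "base" > analysis.txt
git add analysis.txt
git commit --quiet -m "Initial commit"
git branch -M main

# Do the feature work on its own branch.
git branch new-feature
git checkout --quiet new-feature
echo "feature work" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature file"

# Back on main, the feature file is absent until the merge.
git checkout --quiet main
ls                              # only analysis.txt is listed here
git merge --quiet new-feature   # fast-forward, since main has no new commits
git branch -d new-feature       # safe delete: refuses if the branch is unmerged

remaining_branches="$(git branch --format='%(refname:short)')"
echo "Branches left: $remaining_branches"
```

The lowercase -d is the cautious choice because Git declines to delete a branch whose commits have not been merged.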
Undo the last commit but keep its changes staged:
git reset --soft HEAD~1
Undo the last commit and discard all changes, including any uncommitted work (use with care):
git reset --hard HEAD~1
Discard unstaged changes to a specific file, restoring the version that was last staged or committed:
git checkout -- filename.txt
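The difference between these three commands is easiest to see side by side. Here is a sketch in a scratch repository, assuming a single made-up file data.txt and a placeholder identity:

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo_dir="$(mktemp -d)"
trap 'rm -rf "$repo_dir"' EXIT
cd "$repo_dir"

git init --quiet
echo "one" > data.txt
git add data.txt
git commit --quiet -m "First"
echo "two" >> data.txt
git commit --quiet -am "Second"

# --soft: the commit leaves the history, but its changes remain staged.
git reset --quiet --soft HEAD~1
staged_after_soft="$(git diff --cached --name-only)"
git commit --quiet -m "Second (redone)"

# --hard: the commit and its changes are both removed from the working tree.
git reset --quiet --hard HEAD~1
content_after_hard="$(cat data.txt)"

# checkout -- <file>: throw away an uncommitted edit to one file.
echo "scratch edit" >> data.txt
git checkout -- data.txt
content_after_checkout="$(cat data.txt)"

echo "Staged after soft reset: $staged_after_soft"
echo "data.txt after hard reset: $content_after_hard"
echo "data.txt after checkout: $content_after_checkout"
```

Both resets rewrite local history, so they are best kept to commits that have not been pushed yet.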
To stop tracking a file that was previously added to Git, without deleting it from your local disk:
git rm --cached filename.txt
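A quick way to convince yourself that the file survives on disk is to try it in a scratch repository. The file names below are hypothetical and the identity is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo_dir="$(mktemp -d)"
trap 'rm -rf "$repo_dir"' EXIT
cd "$repo_dir"

git init --quiet
echo "project notes" > README.txt
echo "machine-specific values" > local-settings.txt
git add README.txt local-settings.txt
git commit --quiet -m "Initial commit"

# Remove the file from the index only; the copy on disk is left alone.
git rm --quiet --cached local-settings.txt
git commit --quiet -m "Stop tracking local settings"

tracked_files="$(git ls-files)"
echo "Tracked files: $tracked_files"
ls    # local-settings.txt is still present in the directory
```

After this, the file shows up as untracked in git status unless it is also listed in .gitignore.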
Copy an existing repository from GitHub to your local machine:
git clone https://github.com/username/repo-name.git
Forking allows you to make changes to someone else’s repository without affecting the original.
Click Fork on the repository page.
Clone your fork:
git clone https://github.com/your-username/forked-repo.git
A pull request (PR) lets others review and merge your changes.
Push changes:
git push origin branch-name
On GitHub, go to Pull Requests and click New Pull Request.
Once approved, you can merge the PR into the main branch.
Keep your fork updated with changes from the original repository:
git remote add upstream https://github.com/original-owner/original-repo.git
git fetch upstream
git merge upstream/main
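The fork-sync steps can also be rehearsed locally. In this sketch, two bare repositories stand in for the original project and your fork on GitHub; every path, file name, and the "Demo User" identity are invented for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work_dir="$(mktemp -d)"
trap 'rm -rf "$work_dir"' EXIT

# upstream.git stands in for the original repository; make main its default branch.
git init --quiet --bare "$work_dir/upstream.git"
git --git-dir="$work_dir/upstream.git" symbolic-ref HEAD refs/heads/main

# The maintainer publishes a first commit to the original repository.
git init --quiet "$work_dir/maintainer"
cd "$work_dir/maintainer"
echo "v1" > README.txt
git add README.txt
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$work_dir/upstream.git"
git push --quiet -u origin main

# fork.git stands in for your fork; your-clone is your local copy of it.
git clone --quiet --bare "$work_dir/upstream.git" "$work_dir/fork.git"
git clone --quiet "$work_dir/fork.git" "$work_dir/your-clone"

# The original project moves ahead after the fork was made.
cd "$work_dir/maintainer"
echo "v2" > README.txt
git commit --quiet -am "Upstream update"
git push --quiet origin main

# In your clone: register the original as upstream, fetch, merge, update the fork.
cd "$work_dir/your-clone"
git remote add upstream "$work_dir/upstream.git"
git fetch --quiet upstream
git merge --quiet upstream/main
git push --quiet origin main

fork_head="$(git --git-dir="$work_dir/fork.git" rev-parse main)"
upstream_head="$(git --git-dir="$work_dir/upstream.git" rev-parse main)"
echo "fork: $fork_head"
echo "upstream: $upstream_head"
```

The final git push origin main is what brings the fork on GitHub up to date; the merge alone only updates the local clone.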
Specify files and folders that should not be tracked by Git. Create a .gitignore file and add:
output/*.rds
output/*.png
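To check that the patterns behave as intended, git check-ignore reports which paths Git would skip. A sketch in a scratch repository, with made-up file names under output/:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo_dir="$(mktemp -d)"
trap 'rm -rf "$repo_dir"' EXIT
cd "$repo_dir"
git init --quiet

# Hypothetical output files: two match the patterns, one does not.
mkdir -p output
touch output/model.rds output/plot.png output/summary.csv

# The same two patterns as in the text above.
printf 'output/*.rds\noutput/*.png\n' > .gitignore

# Untracked files Git still sees: .gitignore itself and output/summary.csv.
git status --short --untracked-files=all

# check-ignore prints only the paths that match an ignore pattern.
ignored="$(git check-ignore output/model.rds output/plot.png output/summary.csv)"
echo "Ignored paths:"
echo "$ignored"
```

The .gitignore file itself is normally committed so that everyone working on the project shares the same rules.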
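Git LFS (Large File Storage) keeps large files out of the regular Git history by storing lightweight pointers in the repository instead. With the Git LFS extension installed, the command below records the pattern in a .gitattributes file, which should be committed along with your other changes: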
git lfs track "*.dta"