About This Guide

This guide is an introduction to Git and GitHub, which are popular version control and collaboration tools. We will build confidence in using these tools, understand how they work and get the best out of them. No prior experience is required.

All tools covered are free to install and use!

1. What and why?

Git is version control software that lets you create snapshots of your project at specific moments. Like track changes in Word or saving progress in a video game, it lets you choose when and what to save from your work.

When you modify files, add new ones, or delete unwanted content, Git captures the entire project state. You can return to any snapshot, see exactly what changed between versions, and never lose work again. It’s infinite “undo” for your entire project, plus the ability to explore different approaches simultaneously.

While Git is the tool that works locally on your computer, GitHub is a website where you can share and back up your Git projects.

*Overview of the Git/GitHub pipeline - we’ll cover each step in more detail!*

Why should researchers care?

If you’ve ever accidentally deleted important files, struggled to share analysis with collaborators, or created one too many “final_FINAL_v2” versions, Git is the answer. It may seem intimidating initially, but it’s designed to make research safer and more collaborative:

Professional impact:

Open research receives more citations than closed research
GitHub profiles showcase technical skills to collaborators and employers
Funding agencies increasingly require open access to research code

Easy collaboration:

Multiple researchers work on projects without email attachments
Colleagues can build on your work easily, creating more collaboration opportunities
Contribute to others’ projects and expand your network

Industry standard:

Used by 90%+ of software developers and growing numbers of researchers
Essential for reproducible research and open science
Fluency in modern collaborative workflows

Peace of mind:

Permanent protection against file loss or accidental deletion
Complete change tracking with timestamps and explanations
Automatic backup and synchronisation across devices

Common fears (and why they’re wrong)

“It’s too complicated” -> We’ll start with the simplest possible workflow and build gradually.
“I’m not technical enough” -> Many successful researchers use Git with just 5 basic commands. You don’t need to learn everything.
“I’ll break something important” -> Git is designed to prevent data loss. Every change is tracked and reversible.
“I don’t have time to learn this” -> The time invested now will save dozens of hours later.
“My field doesn’t use Git” -> Perfect! Be the one who brings modern tools to your discipline.

What We’ll Cover

Personal version control: Track your work safely, maintain clean project history, never lose progress

Professional sharing: Make research discoverable and citable with proper documentation on GitHub

Basic collaboration: Download and use others’ repositories, understand collaborative workflow foundations

At the end of the guide there will be links to further learning resources.

2. Initial setup

This section covers:

Creating a GitHub account
Choosing between command line and GUI tools
Configuring Git
The basic Git workflow

Before we start: You’ll need a computer with internet access and about 10 minutes for account creation and software installation.

2.1 Creating your GitHub account

GitHub is where you’ll store and share your work. We’ll set up an account now and come back to actually using GitHub a little later on.

Visit github.com and click “Sign up”
Choose a username

- Good: john-smith, john-smith-projects, research-john-smith

- Avoid: coolcat58, slowsloth4, codechampion

Your GitHub username becomes part of URLs (github.com/your-username), so choose something you’ll be comfortable sharing professionally.
Enter your email (use an institutional email if possible, as some universities provide extra features)
Choose the free plan (it includes everything we need!)
Verify your email address

2.2 Choosing Your Tools

You can use Git however you feel comfortable and don’t need to stick to just one approach:

GitHub Desktop (Recommended for beginners): Point-and-click interface with a visual, colour-coded view of your changes. No command line needed.

Command Line: Five essential commands handle 90% of workflows. Works everywhere and preferred by many experienced users.

Tip

Git also integrates with most IDEs, such as VS Code, where editing and version control are combined in one place!

This guide will cover both GitHub Desktop and the Command Line Interface (CLI) but feel free to use your preferred tool - Git behaves the same.

2.3 Setting Up Git

Before Git can track your work, it needs to know who you are. This allows you to make changes on different devices and attributes your work to you.

If you are using GitHub Desktop:

Download GitHub Desktop from desktop.github.com
Install and open the application
Sign in with your GitHub.com credentials

The app handles most configuration automatically!

*GitHub Desktop - follow the steps to sign in*

If you are using the command line, first check whether Git is already installed:

$ git --version
git version 2.39.0  # (or similar - any version 2.x+ is fine)

If Git isn’t installed, or if you want a more up-to-date version:

- Mac: Install the Xcode Command Line Tools, which include Git: xcode-select --install

- Windows: Download from git-scm.com

- Linux: sudo apt install git (Ubuntu) or equivalent for your distribution

Configure your identity:

$ git config --global user.name "Your name"

$ git config --global user.email "your.email@address"

# Verify the configuration
$ git config --list
user.name=Your name
user.email=your.email@address

If you’re new to the command line, here are some useful commands for navigating your folder structure:

$ pwd # "Print working directory" - where am I? This will show the path to the current folder

$ ls # List the files/folders in the current folder

$ cd folder_name # Change directory (moves into specified folder)

$ cd .. # Move up one level (to parent folder)

$ cd ~ # Go to your home folder - the personal top-level folder for your user account

It may feel unfamiliar using these commands at first but you’ll get used to them in no time and they are super useful for quick navigation!

What we’ve covered: You’ve created a GitHub account, configured Git with your identity, and chosen your preferred tools. Git now knows who you are and you can begin tracking your files.

If something went wrong:

“Git command not found”: Git isn’t installed. Follow the installation steps for your operating system.
“Can’t sign into GitHub Desktop”: Check your internet connection and GitHub credentials.
“Unexpected terminal output”: Check the commands are exactly correct (try copy-pasting them in).

2.4 The Basic Git Workflow

Now that Git is installed, this four-step pattern is all that’s needed to go from making local changes to sharing your progress online:

Check which files have changed
Choose what to save
Save a snapshot
Share your progress

This becomes as natural as saving a document - but much more powerful.

Demo project folder

In this guide, we will follow an example research project:
demo_CLI
├── data.csv (some input data to share)
├── main.py (our main analysis code to share)
├── notes.txt (random notes to not share)
├── outputs.csv (some output data to not share)
└── result.png (results figure to share)

3. Your First Repository

What you’ll learn:

How to create a Git repository
How to understand what Git sees in your project folder

Before we start: Either use an existing project you’d like to start tracking (any folder with some files works) or create a new folder and add a few files to it for the later steps.

Creating Your Repository

If starting from a new project folder:

Open GitHub Desktop
Click “Create a New Repository on your Local Drive”
Fill in the details:
- Name: folder_name
- Description: Enter or leave blank - can edit this later
- Local path: Choose where you want the project folder
- Initialise with README: Leave this checked, or create the README later
- Git ignore: None (we’ll create this later)
- License: None (we’ll add this later)
Click “Create Repository”

GitHub Desktop creates the folder and sets up Git automatically!

If you have an existing project folder to start tracking:

Open a terminal window:

$ cd demo_desktop # Go to folder
$ git init        # Initialises the Git repository
Initialized empty Git repository in user/demo_desktop/.git/

Now, in GitHub Desktop:

Click “Add an Existing Repository from your Local Drive…”
Select the folder where Git was initialised

*GitHub Desktop starting repository (a folder git is tracking) selection*

In a terminal, navigate to your project folder (or create a new one):

$ cd demo_desktop # Go to folder

$ git init        # Initialises the Git repository
Initialized empty Git repository in user/demo_desktop/.git/

$ ls -a           # Example project files
.       .git        main.py     outputs.csv
..      data.csv    notes.txt   result.png

Initialising Git creates a hidden .git directory where all of Git’s tracking data and history are stored. Deleting this .git directory removes all Git functionality, including the saved history, from the folder.

Understanding Git’s View of Your Files

In GitHub Desktop, look at the “Changes” tab: you’ll see a list of all modified files (this will be empty if you created a new repository)
Each file shows its status:
- Green “+” means new file
- Yellow “M” means modified file
- Red “-” means deleted file
Notice the file count - “X changed files” tells you how much has changed

*This repository was just initialised so all these files are new*

At this point, Git sees your files and GitHub Desktop is set up to start tracking everything automatically. We can also view the contents of files and any changes.

On the command line, running git status shows what Git sees in the folder.

$ git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        main.py
        data.csv
        notes.txt
        result.png
        outputs.csv
        .DS_Store

nothing added to commit but untracked files present (use "git add" to track)

Understanding this output:

On branch main -> The default branch (can be a different name like ‘master’ depending on the Git configuration)
No commits yet -> No snapshots have been created
Untracked files -> Git sees these files but isn’t managing them yet; the hint shows how to start tracking them with git add

What just happened? You’ve created a Git repository! Git is now watching your project folder and can see all your files. You can see the status of each file and whether Git is actively tracking that file.

Success check:

GitHub Desktop users: You see your project with files listed under “Changes”
Command line users: git status runs without errors and shows your files as “untracked”
Your project folder now contains a hidden .git directory (you may need to show hidden files with ls -a to see it)

If something went wrong:

“Not a git repository”: Make sure you’re in the right folder and ran git init
“No files showing in GitHub Desktop”: Check that you’ve selected the correct repository and there are files in the folder
“Permission denied”: Make sure you have write access to the folder you’re trying to use

4. Staging Files: Choosing what to track and save

Understanding the Staging Area

Git doesn’t automatically save every change you make. Instead, it gives you control over exactly what gets saved and when. This happens through Git’s “staging area”, which acts like a preparation zone where you tell Git which file changes to include in the next snapshot.

Think of it like packing for a trip. You don’t throw everything from your closet into a suitcase - you carefully select what you need for this journey. Similarly, staging lets you choose which file changes to include in each commit.

Why does this matter?

Imagine you’re working on your analysis and you’ve:

Fixed a bug in your main script
Updated your research notes
Created some temporary test files
Modified your data processing code

Rather than saving all these changes as one messy snapshot, staging lets you create two focused commits: one for the bug fix and data processing (the code changes), and another for the updated notes. The temporary files stay unstaged and don’t clutter your project history.

Staging Files in Practice

GitHub Desktop makes staging visual and intuitive. When you open the app after making changes to your files, you’ll see them listed in the “Changes” tab.

*GitHub Desktop staging: You can also see the exact modifications in a file*

In this example, you can see:

data.csv has been modified (yellow “M” icon)
main.py is a new file that has not yet been committed (green “+” icon)
The other files are unchecked because they are output data, notes and images, none of which we want to commit

To control what gets staged:

Keep checked - files you want in your next snapshot
Uncheck - files you want to exclude (like system files)
Click on any file - to see exactly what changed in that file

You can see your changes and decide what belongs together before creating your snapshot.

Staging commands:


$ git add file_name # Stage a specified file/folder

$ git add -u        # Stage all modified files (but not untracked files)

$ git add .         # Stage everything, including new files (be careful!)

$ git restore --staged file_name # Unstage a file you added by accident (the file itself is untouched)

$ git rm --cached file_name # Unstage a file when the repository has no commits yet; only use it in that case

Let’s look at an example where we choose some files to track and others to not track. Then we will modify a tracked file and see what Git thinks about our current changes:

*Staging commands in practice. You can pause the video and copy/paste any commands directly from the terminal.*

Checking git status as we go tells us several important things:

“Untracked files” - Git sees all the files in the repository but is not tracking any of them yet
“Changes to be committed” - Git notices main.py and data.csv have been added and are ready to commit
“Changes not staged for commit” - Git has been tracking data.csv since it was added and notices new changes in the file, but these have not been staged. Saving a snapshot (commit) now would only save the earlier, staged version of the file. If an untracked file is modified, Git will not notice any changes.

Now main.py and data.csv are staged (ready to be saved), while our temporary files remain unstaged.
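For readers following along on the command line, the sequence described above can be reproduced in a throwaway folder. This is a minimal sketch that assumes Git is installed. The temporary folder and the file contents are placeholders, and git status --short is used here as a compact alternative to the full git status output.

```shell
# Work in a throwaway folder so no real project is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q

# Placeholder files standing in for the demo project
echo "print('analysis')" > main.py
echo "id,value" > data.csv
echo "scratch notes" > notes.txt

# Stage only the files we want in the next snapshot
git add main.py data.csv

# Modify a file after staging it: the new line is NOT staged yet
echo "1,42" >> data.csv

# Compact status: first column = staging area, second column = working folder
git status --short
```

In the short format, A in the first column means "staged as a new file", M in the second column means "modified since it was staged", and ?? means "untracked". data.csv should carry both A and M, because it was changed after being staged.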

The staging area is Git’s way of letting you craft deliberate, meaningful snapshots rather than chaotic “save everything” moments. Once you get comfortable with this concept, you’ll find it becomes second nature and your project history will tell a much clearer story of your research progress.

Next, we’ll learn how to turn these staged changes into permanent commits that become part of your project’s history.

5. Creating Commits: Saving changes

A commit is Git’s way of taking a permanent snapshot of your project at a specific moment in time. Think of it as creating a bookmark in your project’s story: a point you can always return to, no matter what happens later.

When you create a commit, Git doesn’t just save the files you’ve staged. It creates a complete record that includes:

The exact state of all your staged files
A unique identifier (like a fingerprint) for this specific snapshot
Your name and timestamp showing who made the change and when
A message describing what this snapshot represents

This is fundamentally different from just saving a file. When you save a document, you overwrite the previous version. When you commit in Git, you add a new entry to your project’s permanent history while keeping all previous versions intact.

Creating a Commit

The commit interface in GitHub Desktop keeps everything visual and straightforward:

Review your staged files - confirm the checked files are what you want to commit
Write your commit message in the “Summary” field at the bottom
Add details if helpful in the “Description” field (optional)
Click “Commit to main”

Once you commit, the staged files disappear from the Changes tab - they’re now part of your project’s permanent history. Any unstaged files will remain in the Changes tab.

Creating a commit from the command line is straightforward once your files are staged:

# Create a commit with a message
$ git commit -m "Initial analysis script and extra line of data added"

[main (root-commit) f7fde3a] Initial analysis script and extra line of data added
 4 files changed, 8 insertions(+)
 create mode 100644 analysis.py
 create mode 100644 data.csv
 create mode 100644 notes.md
 create mode 100644 results.txt

# Check the result
$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .DS_Store

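nothing added to commit but untracked files present

Understanding the commit output:

[main (root-commit) f7fde3a] - This is your first commit on the main branch; f7fde3a is the short form of its unique identifier
4 files changed, 8 insertions(+) - Four files were included in this snapshot, adding eight lines in total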

The previously staged files no longer appear under “Changes to be committed”

Tip

Get in the habit of running git status (or checking the Changes tab in GitHub Desktop) before and after every Git command. It keeps you aware of what’s happening and prevents surprises.

The commit message is your opportunity to explain what this snapshot represents and why it matters. Later on, when you’re trying to understand your project’s evolution, these messages become invaluable breadcrumbs.

Effective commit messages follow a simple pattern:

Start with an action verb - “Add”, “Fix”, “Update”, “Remove”, “Refactor”
Be specific about what changed - not just that something changed
Keep the first line concise - aim for 50 characters or less
Focus on the “what” and “why” - the code shows the “how”

Examples that tell a story:

“Add correlation analysis for housing price data”
“Fix missing value handling in data cleaning step”
“Update visualisation colours for accessibility”
Not just “Changes” or “Update stuff” or “Fixed it”

The difference is clear: good messages help you understand your project’s progression, while vague messages leave you guessing.
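On the command line, a short summary and a longer description can be supplied together by passing -m twice. The sketch below is illustrative only: it runs in a temporary folder, and the identity, file contents and message text are placeholders rather than part of the demo project.

```shell
# Throwaway repository with a placeholder identity (applies to this folder only)
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q
git config user.name "Demo Researcher"
git config user.email "demo@example.org"

echo "print('correlation')" > main.py
git add main.py

# The first -m is the summary line; the second -m becomes the description
git commit -q \
    -m "Add correlation analysis for housing prices" \
    -m "Compute pairwise correlations to decide which variables to keep in the model."

# Show only the summary line, then the full message
git log -1 --format=%s
git log -1 --format=%B
```

The summary is what appears in git log --oneline and in GitHub Desktop's History tab, so it is the part worth keeping short and specific.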

Before You Commit: Quick Checklist

Check your staged files - Use git status or review GitHub Desktop’s Changes tab. Are you committing exactly what you intend?

Avoid large files - Git works best with scripts and small binaries. Large datasets or high-resolution images can bloat your repository, and GitHub rejects individual files over 100MB. Consider whether these files belong in Git and need tracking.

Review your changes (optional) - Click on files in GitHub Desktop or use git diff --staged to see exactly what changes you’re about to commit. Catching mistakes now is easier than fixing them later.
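One way to check for oversized files before committing is a small helper built on the standard find command. This is a sketch, not part of Git: the function name find_large_files and its two arguments are made up for illustration, and the M size suffix is assumed to be supported by your version of find (it is in GNU and BSD/macOS find).

```shell
# List files larger than a size limit (in MiB), skipping Git's own .git folder.
# Usage: find_large_files [folder] [limit_in_MiB]
find_large_files() {
    search_dir="${1:-.}"
    limit_mib="${2:-100}"
    find "$search_dir" -path "$search_dir/.git" -prune -o \
        -type f -size +"${limit_mib}"M -print
}

# Small demonstration in a throwaway folder, using a 1 MiB limit to keep it fast
demo_dir="$(mktemp -d)"
dd if=/dev/zero of="$demo_dir/big_dataset.bin" bs=1024 count=2048 2>/dev/null
echo "id,value" > "$demo_dir/data.csv"
find_large_files "$demo_dir" 1
```

In a real project, running find_large_files . 100 from the project folder would list anything above roughly GitHub's per-file limit. Any file it reports is a candidate for .gitignore.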

Your Project’s Story: Viewing Commit History

Now that you’ve created your first commit, you’ve started writing your project’s permanent story. Each commit becomes a bookmark that documents not just what changed from the previous commit, but when and why it changed.

This history is a powerful tool that lets you understand how your research evolved, track down when specific changes were made, and see the logical progression of your work.

In GitHub Desktop, click the “History” tab to see your project’s timeline:

*GitHub Desktop history showing each commit, the files changed and changes in each file*

Each entry shows:
- The commit message - your description of what this snapshot represents
- When it was created - timestamp showing your project’s progression
- Your name - useful when collaborating with others
- A unique identifier - like a fingerprint for this exact snapshot

Click on any commit to see exactly which files were changed and how they were modified. This lets you review how each commit differs from the previous one.

On the command line, Git provides several ways to explore your project’s evolution:

# See your commit history
$ git log --oneline
f7fde3a Initial analysis script and extra line of data added

# More detailed view
$ git log
commit f7fde3a8b2c1d5e6789012345678901234567890
Author: demo_researcher <demo@research.com>
Date:   Fri Aug 01 14:30:22 2025 -0500

    Initial analysis script and extra line of data added

# See what actually changed in the most recent commit
$ git show

*Creating a commit and inspecting the log. You can pause the video and copy/paste any commands directly from the terminal.*
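To see how these history commands behave once there is more than one commit, the following sketch builds a tiny two-commit history in a temporary folder and then compares the two snapshots. The file contents, commit messages and identity are placeholders chosen for the example.

```shell
# Throwaway repository with a placeholder identity
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q
git config user.name "Demo Researcher"
git config user.email "demo@example.org"

# First snapshot
printf 'id,value\n1,10\n' > data.csv
git add data.csv
git commit -q -m "Add initial data file"

# Second snapshot: one extra row of data
printf '2,20\n' >> data.csv
git add data.csv
git commit -q -m "Add second measurement to data"

# One line per commit, newest first
git log --oneline

# Files touched by the most recent commit, with line counts
git show --stat --oneline HEAD

# Exact line-level differences between the previous commit and the latest one
git diff HEAD~1 HEAD -- data.csv
```

HEAD refers to the most recent commit and HEAD~1 to the one before it. In the diff output, lines starting with + were added and lines starting with - were removed.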

6. Connecting to GitHub: From Local to Public

Understanding GitHub’s Role

GitHub is essentially a hosting service for Git repositories, like a public library where you can store and share your research projects. While Git tracks your changes locally on your computer, GitHub provides a remote location where those same Git repositories can live online.

This connection transforms your personal version control system into a collaborative platform. Your local Git repository remains the primary workspace where you make changes and create commits, but GitHub becomes the public version that others can access, reference, and build upon.

Your local repository contains your actual work and Git history, while GitHub hosts an identical copy that stays synchronised through “pushing” (uploading) and “pulling” (downloading) changes.

Creating Your GitHub Repository

When you create a GitHub repository, you’re establishing the online counterpart to your local Git repository. This process involves creating the remote storage space and then connecting your local repository to it.

GitHub Desktop streamlines this entire process:

Ensure your repository is selected in GitHub Desktop
Click “Publish repository” in the toolbar
Configure the repository settings:
- Name: Usually matches your local folder name
- Description: Brief explanation of the research
- Keep code private: Leave unchecked for public repository
Click “Publish Repository”

GitHub Desktop creates the online repository and establishes the connection automatically. Your local commits are immediately uploaded to GitHub.

The command line approach requires two steps: creating the GitHub repository through the web interface, then connecting your local repository to it.

First, create the repository on GitHub:

Visit github.com and sign in
Click the “+” icon and select “New repository”
Configure the repository:
- Repository name: Match your local folder name
- Description: Briefly summarise the project
- Set to Public
- “Initialise with README”: Leave unchecked. The new repository needs to be empty so that your existing local commits can be pushed to it (we will cover writing a README later in more detail)
Click “Create repository”

Then connect your local repository:

# Add GitHub as a remote repository
$ git remote add origin https://github.com/yourusername/my-research-project.git

# Verify the connection
$ git remote -v
origin  https://github.com/yourusername/my-research-project.git (fetch)
origin  https://github.com/yourusername/my-research-project.git (push)

# Upload your commits to GitHub
$ git push -u origin main
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.15 KiB | 1.15 MiB/s, done.
To https://github.com/yourusername/my-research-project.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.

Understanding the Remote Connection

The git remote add command establishes a connection between your local repository and GitHub, using “origin” as a convenient shortcut name for the full GitHub URL. This is Git’s way of saying “when I refer to ‘origin’, I mean this specific GitHub repository.”

The first git push -u origin main command does two important things: it uploads all your local commits to GitHub, and it sets up “tracking” so that your local main branch knows it corresponds to the main branch on GitHub. After this initial setup, future pushes require only git push.
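The effect of -u can be explored without touching GitHub at all. In the sketch below, a bare repository in a temporary folder stands in for GitHub; this is an assumption made purely so the example runs offline. The folder names and identity are placeholders, and the branch name is read from Git rather than assumed to be main.

```shell
work="$(mktemp -d)"

# A bare repository on disk plays the role of GitHub in this demo
git init -q --bare "$work/remote.git"

# An ordinary local repository with one commit
mkdir "$work/local"
cd "$work/local" || exit 1
git init -q
git config user.name "Demo Researcher"
git config user.email "demo@example.org"
echo "print('analysis')" > main.py
git add main.py
git commit -q -m "Add analysis script"

# "origin" is just a nickname for the remote's address
git remote add origin "$work/remote.git"
git remote -v

# First push: upload the commits AND record which remote branch to track
branch="$(git symbolic-ref --short HEAD)"   # "main" or "master", depending on configuration
git push -q -u origin "$branch"

# The tracked (upstream) branch is now recorded, so later pushes need only "git push"
git rev-parse --abbrev-ref "@{upstream}"
```

The last command prints the remote branch that the local branch follows, in the form origin/branch-name. With a real GitHub repository, the only difference is that the address passed to git remote add is the repository's https://github.com/... URL.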

Important

Authentication requirement: GitHub requires a personal access token instead of your account password for command line access over HTTPS. If prompted for a password during git push, create a personal access token in your GitHub settings and enter it in place of the password.

Your Repository Goes Public

Once the connection is established and your first push completes, your research becomes publicly accessible. Anyone can visit your repository URL, browse your files, and see your complete commit history.

This public visibility means your work immediately gains several practical benefits: it’s automatically backed up in the cloud, accessible from any internet-connected device, and permanently citable with its GitHub URL. Other researchers can examine your methodology, reproduce your analysis, and reference your work in their own research.

The repository URL follows a simple pattern: github.com/yourusername/repositoryname. This becomes the permanent address for your research project.

Maintaining the Connection

From this point forward, your local Git workflow remains unchanged - you continue staging and committing changes as before. When you are ready, you push your commits to GitHub.

In GitHub Desktop, after making new commits locally, you’ll see a “Push origin” button appear. Clicking this uploads your recent commits to GitHub.

After creating new commits, use git push to upload them to GitHub:

# Check what needs to be pushed
$ git status
Your branch is ahead of 'origin/main' by 2 commits. # This means there are local changes compared to the online repository

# Push your new commits
$ git push

Your local repository remains the primary workspace, but GitHub now serves as both backup and public platform for your research. This dual setup - local for active work, remote for sharing and backup - forms the foundation of modern collaborative research workflows.

Tip

Get into the habit of pushing your commits to GitHub regularly. This ensures your public repository stays current and provides continuous backup of your work.

7. Essential Files for Public Repositories

Making Your Repository Professional

When you publish research code on GitHub, certain files transform it from a simple code dump into a professional, usable resource. These files serve as documentation that helps others understand, run, and properly credit your work.

The essential components are straightforward:

A README.md that explains your project and how to use it
A .gitignore that disables tracking of chosen files or filetypes
A licence that clarifies usage rights
Dependency information that enables reproducibility

Core Repository Files

Your README.md (written in Markdown) serves as the front page of your repository. It should quickly answer what your work does, how others can use it, and how to contact you.

Essential sections for research:

# Project Title

Brief description of your work and approach.

## Getting Started
- Prerequisites needed
- How to run the analysis

## Project Structure
Brief explanation of key files and folders

## Citation
How others should cite your work

## Contact
Your name, email, and institution

The key is clarity and brevity: visitors should understand your project within 30 seconds of reading. For markdown formatting help and adding items like images or graphs please see this link: https://github.com/darsaveli/Readme-Markdown-Syntax

The .gitignore file prevents sensitive or irrelevant files from being tracked by Git. It is essential for protecting personal data and keeping repositories clean.

Common exclusions:

.gitignore

# System files
.DS_Store
Thumbs.db

# Language-specific: Python cache and R project state
__pycache__/
.Rproj.user/

# Sensitive data
*.key
.env
config.ini

# Large files (include data/raw/ only if the datasets are large)
*.zip
data/raw/

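Remember: the file must be named exactly .gitignore (no .txt extension), and comments must sit on their own lines. .gitignore only affects files that Git is not yet tracking. Files that are already tracked must be removed from tracking manually with git rm --cached file_name. For more detail: https://www.freecodecamp.org/news/gitignore-file-how-to-ignore-files-and-folders-in-git/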
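The point about previously tracked files is easiest to see in a small experiment. The sketch below assumes a notes.txt file was committed by mistake before any ignore rule existed. It runs in a temporary folder with a placeholder identity, so the names are illustrative only.

```shell
# Throwaway repository with a placeholder identity
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q
git config user.name "Demo Researcher"
git config user.email "demo@example.org"

# notes.txt is committed by mistake, before .gitignore exists
echo "scratch notes" > notes.txt
echo "print('analysis')" > main.py
git add notes.txt main.py
git commit -q -m "Add analysis script"

# Listing the file in .gitignore is not enough on its own: Git keeps tracking it
echo "notes.txt" > .gitignore

# Remove it from tracking only; --cached leaves the file on disk
git rm -q --cached notes.txt
git add .gitignore
git commit -q -m "Stop tracking personal notes"

# Later edits to notes.txt no longer show up as changes
echo "more notes" >> notes.txt
git status --short
```

The final git status --short should print nothing, because notes.txt is now ignored. Note that the file is still present in the earlier commit, so this approach is not sufficient for removing secrets that have already been committed.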

A licence clarifies how others can use your work. For most research, permissive licences work well. MIT Licence (recommended for research code):

MIT License

[Standard MIT licence text]

Visit https://choosealicense.com/ for guidance on selecting appropriate licences.

Document what software others need to run your analysis. This enables reproducibility.

You could create a requirements.txt file that someone can use to install all required software at once
Include detail on the dependencies or third party software that may need to be installed and how
Include specific version numbers to ensure reproducibility - the more information the better

Adding these files to your repository

Create each file in your project folder using a text editor
Return to GitHub Desktop - new files appear in the Changes tab
Stage the files you want to commit (check the boxes)
Write a descriptive commit message
Commit and push to update your GitHub repository

# Create files in your project directory
$ touch README.md .gitignore LICENSE

# Edit files with your preferred editor
$ nano README.md

# Stage and commit new files
$ git add README.md .gitignore LICENSE
$ git commit -m "Add essential documentation files"

# Push to GitHub
$ git push

Important

Before pushing: Review your .gitignore file and ensure no large files or sensitive information (passwords, personal data, API keys) are tracked in your repository.

You don’t need perfect documentation immediately. Start with basic versions of these files and improve them as your project evolves.

8. Basic Collaboration: Using Others’ Work

Cloning: getting a complete copy

When you clone a repository, Git downloads everything: all files, all commits, all branches, and sets up a connection to the original repository for future updates.

Find the repository on GitHub you want to clone
Click the green “Code” button and select “Open with GitHub Desktop”
Choose a local folder for the repository
Click “Clone”

GitHub Desktop downloads the complete repository and establishes the connection automatically.

Alternatively, use File → Clone Repository in GitHub Desktop and paste the repository URL.

# Navigate to your preferred location
$ cd ~/Documents/research-projects

# Clone the repository
$ git clone https://github.com/researcher/interesting-project.git

Cloning into 'interesting-project'...
remote: Counting objects: 127, done.
Receiving objects: 100% (127/127), 2.85 MiB | 4.12 MiB/s, done.

# Move into the cloned repository
$ cd interesting-project

# Verify the clone worked
$ git status
On branch main
Your branch is up to date with 'origin/main'.

The cloned repository is now a fully functional Git repository on your computer. You can explore the files, examine the commit history, and understand how the research was developed.

Staying current: pulling updates

Research is ongoing, and the original authors may continue updating their repository. Since your clone maintains a connection to the original, you can download these updates whenever they occur. This is different from cloning - pulling updates an existing local repository rather than creating a new one.

Select the cloned repository in GitHub Desktop
Click “Fetch origin” to check for updates
If updates exist, click “Pull origin” to download them

GitHub Desktop shows you exactly what changed in the update, maintaining your awareness of how the project evolves.

# Check for updates (downloads the new commits but does not change your files yet)
$ git fetch

# See what's new
$ git log HEAD..origin/main --oneline
9f8e7d6 Add new visualisation options
6c5b4a3 Fix data preprocessing bug

# Apply the updates to your local files
$ git pull
Updating a1b2c3d..9f8e7d6
Fast-forward
 analysis.py | 15 +++++++++++++--
 README.md   |  3 +++
 2 files changed, 16 insertions(+), 2 deletions(-)
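The difference between fetching and pulling can be tried out locally. In this sketch a folder called original plays the authors' repository and my-copy plays your clone. Both names, the identity and the file contents are placeholders, and a local folder is used instead of a GitHub URL only so that the example works offline.

```shell
work="$(mktemp -d)"
cd "$work" || exit 1

# "original" plays the authors' repository
git init -q original
cd original || exit 1
git config user.name "Original Author"
git config user.email "author@example.org"
echo "v1" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"
cd ..

# Clone it, as you would clone from a GitHub URL
git clone -q original my-copy

# The authors publish a fix after you cloned
cd original || exit 1
echo "v2" > analysis.py
git commit -q -am "Fix data preprocessing bug"
cd ../my-copy || exit 1

# fetch downloads the new commit but leaves your files untouched
git fetch -q
cat analysis.py                          # still the old version
git log --oneline "HEAD..@{upstream}"    # commits you do not have yet

# pull applies them to your branch and working files
git pull -q
cat analysis.py                          # now the updated version
```

Here @{upstream} is Git's shorthand for the remote branch your local branch tracks, which avoids assuming the branch is called main.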

Practical Workflow for Using Others’ Research

Before cloning: Read the repository’s README and licence to understand what the project does and how you can use it.
After cloning: Install any required dependencies and run the provided examples to ensure everything works in your environment.
For ongoing use: Periodically pull updates to benefit from improvements and bug fixes the original authors make.
When building on others’ work: Always provide appropriate citation and consider reaching out to collaborate rather than working in isolation.

Important

Respecting licences: Check the repository’s licence file to understand what you can legally do with the code. Most research repositories use permissive licences, but always verify.

Understanding Your Boundaries

This basic collaboration workflow focuses on using others’ work rather than contributing back to it. You can clone any public repository, explore its contents, and pull updates as they become available.

More advanced collaboration, such as suggesting changes back to the original authors or handling conflicts when multiple people modify the same files, involves additional Git concepts we haven’t covered here. The Turing Way’s Guide to Version Control covers these further Git actions.

Being able to clone others’ work, complete with its history, and keep it up to date transforms research from isolated efforts into a connected ecosystem in which you can build on the best available tools and methods.

9. Common Issues and Quick Fixes

Diagnosing Problems

When Git behaves unexpectedly, start with these diagnostic commands to understand what’s happening:

# Check current repository state
$ git status

# See recent commits
$ git log --oneline -5

# Check remote connections
$ git remote -v

# See what changes are staged
$ git diff --staged

These commands reveal most issues quickly and help you understand the current state before attempting fixes.
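If you find yourself typing these four commands together often, they could be wrapped in a small shell function. This is only a sketch: `git_checkup` is a made-up name, not a Git command, and the section labels are arbitrary.

```shell
# git_checkup is a hypothetical helper name, not a built-in Git command.
git_checkup() {
    echo "== Status =="
    git status --short --branch
    echo "== Recent commits =="
    # A brand-new repository has no commits, so fall back to a note.
    git log --oneline -5 2>/dev/null || echo "(no commits yet)"
    echo "== Remotes =="
    git remote -v
    echo "== Staged changes =="
    git diff --staged --stat
}
```

After defining the function in your terminal session, run `git_checkup` from inside any repository folder.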

Quick Reference for Common Issues

Problem: Committed files you didn’t intend to include
Before pushing:

Command line

# Undo last commit, keep changes
$ git reset HEAD~1

Re-stage only desired files, then commit again

GitHub Desktop: History tab → right-click commit → “Undo this commit”
Prevention: Always use git status before committing
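The undo-and-recommit fix can be rehearsed safely in a throwaway repository. The sketch below uses made-up file names (results.txt, scratch-notes.txt) and a placeholder identity, and assumes Git 2.28 or newer for `--initial-branch`.

```shell
set -eu
# Placeholder identity so the throwaway commits below work on any machine.
export GIT_AUTHOR_NAME="Demo Researcher" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo Researcher" GIT_COMMITTER_EMAIL="demo@example.org"

repo=$(mktemp -d)
cd "$repo"
git init --quiet --initial-branch=main

echo "print('analysis')" > analysis.py
git add analysis.py
git commit --quiet -m "Add analysis script"

# The mistake: "git add ." also picks up a scratch file.
echo "tidy results" > results.txt
echo "rough notes" > scratch-notes.txt
git add .
git commit --quiet -m "Add results"

# Undo the commit. Both files stay on disk; they are just no longer committed.
git reset --quiet HEAD~1
git status --short

# Re-stage only the file that belongs in the project, then commit again.
git add results.txt
git commit --quiet -m "Add results"

# List what Git now tracks; the scratch file should not appear.
git ls-files
```

Because `git reset HEAD~1` never deletes anything from your working folder, the scratch file is still there afterwards; Git simply stops tracking it.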

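Problem: “File exceeds 100MB limit” when pushing to GitHub
Solution (if the large file was added in your most recent, unpushed commit):

# Remove large file from tracking (the file itself stays on your disk)
$ git rm --cached large-file.csv

Add to .gitignore

$ echo "large-file.csv" >> .gitignore
$ git add .gitignore

Rewrite the last commit so it no longer contains the large file

$ git commit --amend --no-edit

If the file was added in an earlier commit, it is still stored in your history and the push will keep failing. Removing it requires history rewriting, so seek help before going further.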

Check file sizes before committing (skipping Git’s own .git folder):

$ find . -path ./.git -prune -o -type f -size +50M -print

For ongoing large file needs: Consider Git LFS (https://git-lfs.com/) or external storage solutions
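The size check can also look only at what is staged for the next commit, rather than at every file in the folder. The following is a rough sketch: `check_staged_sizes` is a hypothetical helper name, the 50 MB default is an arbitrary safety margin below GitHub's 100MB limit, and the demo uses a tiny 1000-byte limit so it runs instantly.

```shell
set -eu

# check_staged_sizes is a hypothetical helper, not a Git command.
# It prints staged files larger than a limit (in bytes) and fails if any exist.
check_staged_sizes() {
    limit=${1:-50000000}   # default: roughly 50 MB
    # Staged files that were added, copied, or modified.
    git diff --cached --name-only --diff-filter=ACM |
    {
        found=0
        while IFS= read -r f; do
            # ":path" refers to the staged version; -s prints its size in bytes.
            size=$(git cat-file -s ":$f")
            if [ "$size" -gt "$limit" ]; then
                echo "$f: $size bytes"
                found=1
            fi
        done
        [ "$found" -eq 0 ]
    }
}

# Demo in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet --initial-branch=main
echo "id,value" > small.csv
dd if=/dev/zero of=large-file.csv bs=1000 count=2 2>/dev/null
git add small.csv large-file.csv

if report=$(check_staged_sizes 1000); then
    status=ok
else
    status=too_large
fi
echo "$report"
```

Run without an argument, the helper falls back to its 50 MB default. It succeeds quietly when nothing staged is over the limit, so it could be run just before `git commit`.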

Problem: “Updates were rejected” or “fetch first”
Solution:

Command line

# Pull remote changes first
$ git pull

# Then push your changes
$ git push

GitHub Desktop: Click “Pull origin” then “Push origin”
Prevention: Push changes regularly, pull before starting work sessions

Problem: Added wrong files to staging area
Solution:

# Unstage specific file
$ git restore --staged filename.py

# Unstage all files
$ git restore --staged .

GitHub Desktop: Uncheck the boxes next to files in Changes tab

Problem: Committed passwords, API keys, or personal data
If not yet pushed:

$ git reset HEAD~1
# Remove sensitive files, add to .gitignore, recommit

If already pushed: Stop immediately, treat any exposed password or key as compromised and change or revoke it, then seek expert help - removing the data requires history rewriting

Prevention: Use .gitignore from project start, review staged changes before committing
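Reviewing staged changes can be partly automated. The sketch below searches the lines you are about to commit for a few common credential-like words. `scan_staged_for_secrets` is a hypothetical helper and its pattern list is illustrative only; it will miss many real secrets and is no substitute for reading the diff yourself.

```shell
set -eu
# Placeholder identity so the throwaway commit below works on any machine.
export GIT_AUTHOR_NAME="Demo Researcher" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo Researcher" GIT_COMMITTER_EMAIL="demo@example.org"

# scan_staged_for_secrets is a hypothetical helper, not a Git command.
# It prints suspicious added lines and fails if it finds any.
scan_staged_for_secrets() {
    pattern='(password|passwd|secret|api[_-]?key|token)[[:space:]]*[:=]'
    # Keep only added lines from the staged diff, skipping "+++" file headers.
    if git diff --cached -U0 | grep -E '^\+[^+]' | grep -iE "$pattern"; then
        return 1
    fi
    return 0
}

# Demo in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet --initial-branch=main
echo "# Demo project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Stage a file containing something that looks like a credential.
echo 'API_KEY = "not-a-real-key"' > config.py
git add config.py
if scan_staged_for_secrets; then first=clean; else first=flagged; fi

# Unstage it, ignore it from now on, and scan again.
git restore --staged config.py
echo "config.py" >> .gitignore
git add .gitignore
if scan_staged_for_secrets; then second=clean; else second=flagged; fi

echo "first scan: $first, second scan: $second"
```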

Exploring Repository History

Understanding your repository’s history helps diagnose issues and find solutions:

# See all commits with details
$ git log

# Compact view of recent commits  
$ git log --oneline -10

# See what changed in a specific commit
$ git show commit-hash

# View all past actions (including "lost" commits)
$ git reflog

GitHub Desktop: Use the History tab to visually explore commits and changes.

Recovery Options

If you’ve lost work or made mistakes:

# Check for stashed changes
$ git stash list

# Recover stashed work
$ git stash pop

# Go back to a previous commit (temporarily)
$ git checkout commit-hash
# Return to latest: git checkout main

# See if "lost" work exists in reflog
$ git reflog
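To show what recovering "lost" work from the reflog can look like, here is a sketch that deliberately loses a commit in a throwaway repository and then brings it back. The branch name `recovered` is arbitrary, the identity is a placeholder, and the destructive `--hard` reset is there only to simulate the accident.

```shell
set -eu
# Placeholder identity so the throwaway commits below work on any machine.
export GIT_AUTHOR_NAME="Demo Researcher" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo Researcher" GIT_COMMITTER_EMAIL="demo@example.org"

repo=$(mktemp -d)
cd "$repo"
git init --quiet --initial-branch=main

echo "step 1" > notes.txt
git add notes.txt
git commit --quiet -m "First draft"

echo "step 2" >> notes.txt
git commit --quiet -am "Second draft"
lost=$(git rev-parse HEAD)

# Simulate the accident (throwaway repository only): discard the latest commit.
git reset --quiet --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog -3

# HEAD@{1} means "where HEAD was one move ago"; give that commit a branch name.
git branch recovered "HEAD@{1}"
recovered=$(git rev-parse recovered)

# The recovered branch contains the second draft again.
git show recovered:notes.txt
```

Creating a new branch at the old commit leaves `main` untouched, so you can inspect the recovered work before deciding what to do with it.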

Warning

Avoid destructive commands: Be cautious with commands containing --hard, --force, or that mention “destructive” in warnings. When in doubt, ask for help rather than risk losing work.

Getting Help

Modern AI assistance: AI chatbots (ChatGPT, Claude, GitHub Copilot) are excellent for Git troubleshooting. Copy your exact error message and ask for help, but always understand the suggested commands before running them.
GitHub documentation: Comprehensive official guides
Your institution’s IT support: Often have Git expertise
Git documentation locally: git help <command> for detailed command information

Tip

Error message strategy: Git’s error messages are usually helpful. Read them carefully as they often suggest the exact commands needed to fix the problem.

Further resources for continued learning

In-depth tutorials:

The Turing Way’s Guide to Version Control: Academic perspective on reproducible research
Software Carpentry Git Lessons: Step-by-step lessons with exercises
Pro Git Book: Comprehensive Git reference (free online)

Tools and utilities:

GitHub Desktop - GUI for Git operations
GitKraken - Advanced Git GUI with visualisation
GitHub CLI - Command-line tools for GitHub integration

Downloadable cheat sheet for Git/GitHub

[DOWNLOADABLE CHEAT SHEET PLACEHOLDER: PDF reference card with common commands, flags, and troubleshooting tips]

Summary

You now have the essential skills to manage research projects with Git and share your work professionally on GitHub. This foundation - staging, committing, pushing, and basic collaboration - handles the vast majority of research version control needs.

What we’ve covered

Personal version control: You can track changes to your research, create meaningful snapshots of your progress, and recover from mistakes without losing work.

Professional sharing: Your repositories now include proper documentation, clear commit histories, and appropriate licensing that makes your work discoverable and citable.

Basic collaboration: You can find and use others’ research repositories, keep them updated, and understand the foundations of collaborative workflows.

Problem-solving confidence: You know how to diagnose common issues, apply quick fixes, and when to seek help rather than risk losing work.

Building Confidence with Git

Git is designed to prevent data loss. Almost every mistake can be undone, and your commit history provides multiple recovery points. When encountering errors:

Read the error message carefully
Use diagnostic commands to understand the current state
Try the simplest solution first
Seek help for unfamiliar or destructive-looking commands

Each problem you resolve builds your understanding and confidence with the system.

Your Contribution to Open Science

Every time you share research openly, you contribute to more transparent and collaborative science. Your repositories might help another researcher solve a critical problem, get adapted for use in a different field, or become the foundation for breakthrough discoveries.

Git and GitHub aren’t just tools; they’re part of modern research infrastructure that enables reproducible, collaborative science.

Only the beginning…

Start with what you’ve learned here. As your research needs grow, you can explore advanced Git features like branching, detailed collaboration workflows, and specialised tools. The foundation you’ve built provides a solid base for any future Git learning.

Your research deserves to be shared with the world, and you now have the skills to do it professionally and confidently.


About This Guide - Author: Ritvik Bharamgoudar - Part of the Research Skills Toolkit - Funded by Cardiff University Research Culture Fund

Last updated: [01 Aug 2025] | Licensed under MIT
