Working with `git` and GitHub in Data Science

How to set up, configure, and work with git and GitHub in the practice of data science.

What is `git`?

Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers who are collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows. ^Wikipedia
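As a rough illustration of the "data integrity" goal mentioned above, the sketch below reproduces how Git derives an identifier for a file's contents when the repository uses the default SHA-1 object format. The function name `git_blob_id` and the sample strings are hypothetical and chosen only for this example; repositories configured for SHA-256 will produce different identifiers.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Assumes the default SHA-1 object format. Git hashes a short header
    ("blob <size>" followed by a NUL byte) together with the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents: the second differs by a single character.
original = b"hello world\n"
edited = b"hello world!\n"

print(git_blob_id(original))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_id(edited))    # a different ID, because the content changed
```

Because the identifier is computed from the content itself, any change to a tracked file, however small, yields a different ID. Running `git hash-object` on a file containing `hello world` followed by a newline should report the same first value.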

What is GitHub?

GitHub is a developer platform that allows developers to create, store, manage and share their code. It uses Git software, providing the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. It currently hosts work by approximately 100M developers. ^Wikipedia

Git & GitHub in Data Science

Data aggregation, data cleaning, pipelines, and ML models all rely on software to operate. Responsible software management depends on well-managed code, versioning, and the prioritization of bugs, features, and user issues. Further, modern platforms and infrastructure tend to favor code-driven tests, builds, deployment, and management.

In short, code is fundamental to our work, and it would be both risky and impractical not to use source control.

Contents

- Setup
  - Install and set up git
  - Authenticate git to GitHub
  - Basic configuration
  - Troubleshooting
- Creating and managing a repository
  - Create a repository locally
  - Create a repository in GitHub
  - Adding or removing collaborators
- Source control basics
  - Diff
  - Status
  - Add
  - Commit
  - Push/Pull
  - Fetch
  - Log
- Branches, Forks, and Merges
  - Branches
  - Forks
  - Fetch from Upstream
  - Merges and Pull Requests
- Issues
- Advanced Git/GitHub Features
  - Stash
  - Signing commits
  - Reset and Revert
  - Rebase
  - Cherry-pick
  - Renaming origin
  - Bonus
- GitHub Actions
  - About
  - Credentials & Secrets
  - Example 1 - Build software upon a push
  - Example 2 - Build and deploy a container

