diff --git a/97-git.Rmd b/97-git.Rmd index 27019bd..799f7ba 100644 --- a/97-git.Rmd +++ b/97-git.Rmd @@ -9,7 +9,7 @@ Git](https://swcarpentry.github.io/git-novice). In the past, students have reported the difficulty of collaborative work on R code and Rmd files. Each student work with RStudio on their local computer, and share the code, report and data files on shared -drive and/or send updates by email. Such a situaton may lead to a well +drive and/or send updates by email. Such a situation may lead to a well known situation, nicely caught by the 'notFinal.doc' PhD Comic: @@ -22,7 +22,7 @@ control', used by data scientists, bioinformaticians and programmers around to world to keep track of changes in code, Rmd reports, or any other files, and efficiently collaborate among large and small teams. -Although there exist many other verison control software, we will +Although many other version control systems exist, we will focus on [Git](https://git-scm.com/) and [GitHub](https://github.com/), and these two are widely used. @@ -81,17 +81,17 @@ for several people to work in parallel on the same set of files. Git is the automated version control software that we will be using. GitHub is a web interface to Git that allows to share a version control project over the internet, facilitates some operations to work collaboratively on-line and enables discussions (i.e. issues). Let's start by explaining some fundamental Git concepts. ### Git {-} In Git, the directory that contains all the files that need to be -tracked/version controlled is called a *repository*. This would be -equivalent to a RStudio project that one sets up before starting a new -analysis. And as a matter of fact, later, we will use Git to version -control an RStudio project. +tracked/version controlled is called a *repository* (often shortened +to repo).
This would be equivalent to an RStudio project that one sets +up before starting a new analysis. And as a matter of fact, later, we +will use Git to version control an RStudio project. When starting a new project that needs to be version controlled, the (typically) local directory needs to be *initiated* to use Git. Then @@ -123,7 +123,7 @@ but only the differences between each file. Knowing the difference between the current and previous states of a file is enough to reconstruct the previous version(s). -```{r git2, fig.cap="Version control with Git: modifiying files.", echo=FALSE, purl=FALSE, out.width='100%', fig.align='center'} +```{r git2, fig.cap="Version control with Git: modifying files.", echo=FALSE, purl=FALSE, out.width='100%', fig.align='center'} knitr::include_graphics("figs/git2.png") ``` @@ -153,14 +153,14 @@ others, and GitHub allows this. The repository that we created above is a *local* repository, as it lives on a user's local computer. Repositories can also live -elsewhere, on another user's computer, or a server; such a repositories -are called a *remote* repositories. +elsewhere, on another user's computer, or a server; such +repositories are called *remote* repositories. -Let's now start from an exisiting repository on GitHub. Creating a +Let's now start from an existing repository on GitHub. Creating a local copy of that repository and its content is done by an operation called *cloning*. Different users can of course clone the same remote. -At the time of cloning, the content of the local and remote repositories -are identical. +At the time of cloning, the contents of the local and remote +repositories are identical. The creation, modification and deletion of files is as described above: the user interacts with the files in their working directory @@ -178,9 +178,9 @@ pushing are handled automatically by Git, except in case of conflict.
A conflict happens when two users change the same part of a file: -- Alice and Bastien pull the latest changes and modify the empty `file1` - by adding their name at the beginning of the file. Alice's file - looks like this in her local repository: +- Alice and Bastien pull the latest changes and modify the empty + `file1` by adding their name at the beginning of the file. Alice's + file looks like this in her local repository: ``` Alice @@ -193,8 +193,9 @@ Bastien ``` - Alice commits locally and pushes to the remote repository. `file1` - in the remote and in Alice's local repo are now identical, and Git - recorded that the change was adding a new line containing `Alice`. + in the remote and in Alice's local repository are now identical, and + Git recorded that the change was adding a new line containing + `Alice`. - Bastien now also tries to push but gets an error because he pushes a file that contains a different first line that would override Alice's commit. His update are rejected because the remote contains @@ -215,7 +216,7 @@ Alice and Bastien `r msmbstyle::question_begin()` 1. Create an account on [GitHub](https://github.com/). Choose your -username wisely as you might want to reuse it later and will have to +user name wisely as you might want to reuse it later and will have to share it. 2. Create your own repository and open an issue on it. @@ -229,7 +230,7 @@ You have now been added to the repository. 1. Reply to the first issue to verify that you're able to access the - repo. + repository. 2. Create a new issue where you make a link to a file or a specific line of a file and assign it to yourself. @@ -257,12 +258,6 @@ Git needs to be install first on your computer for RStudio to use it. Installation is different depending on the operating system of your computer. 
- - - - - - #### Windows users {-} The easiest way is to install [Git for @@ -302,7 +297,7 @@ the following command will install `git`: sudo apt install git ``` -### Connect RStudio to your GitHub repo +### Connect RStudio to your GitHub repository To connect RStudio to a GitHub repository, a personal access token (PAT) is needed. This will act as an identifier for a specific GitHub @@ -321,8 +316,8 @@ Personal* *access tokens > Tokens (classic)* or directly it to no expiration if you see fit. We advise to put the expiration date to at least the end of the semester, in case you want to use it for your projects. -- What this PAT allows to do, it is recommended to select repo, user, - gist and workflow. +- For the scopes (what this PAT allows you to do), it is recommended + to select `repo`, `user`, `gist` and `workflow`. Once you've generated the token, it will appear. Be careful, it is the only time you'll be able to see this PAT so it is advised to copy it @@ -333,14 +328,14 @@ public. Once a PAT has been created, you can clone any GitHub repository on your local computer using RStudio. The easiest way of working with GitHub and RStudio is to have a GitHub repository first. The only -thing needed from GitHub is the cloning https address of your repo. To -get it, go the repo page, click the big green button that says "<> -Code" and copy the HTTPS URL address. +thing needed from GitHub is the HTTPS clone address of your +repository. To get it, go to the repository page, click the big green +button that says "<> Code" and copy the HTTPS URL. -To clone your GitHub repo on your computer, open RStudio and create a -new project. You'll then be able to chose to create a Version Control -project and choose Git. Then you just need to paste the HTTPS URL of -your GitHub repo and it will be cloned as an R project. +To clone your GitHub repository on your computer, open RStudio and +create a new project. You'll then be able to choose to create a Version +Control project and select Git.
Then you just need to paste the HTTPS +URL of your GitHub repository and it will be cloned as an R project. The project will open and a new Git tab will appear in the "environment" pane of RStudio. This is where you'll be able to manage @@ -349,7 +344,7 @@ modified from the GitHub version. Once a file is modified, you can click on "Diff" to open a new window that will show you what has been changed in the selected file. You can then check the staged box, write a commit message and commit these changes. You then click push to send -all that to GitHub. That's when you'll be asked to give your username +all that to GitHub. That's when you'll be asked to give your user name and PAT. RStudio locally tends to remember the PAT so you shouldn't have to put it again. @@ -364,7 +359,7 @@ not pushed to GitHub. This should usually contain your .Rproj but also all files bigger than 50 Mb as these cannot be pushed to GitHub. RStudio also allows you to have a look at the commit history, if you -want to see all the changes that have been done to the repo. +want to see all the changes that have been made to the repository. `r msmbstyle::question_begin()` @@ -374,19 +369,19 @@ want to see all the changes that have been done to the repo. 2. Create a Rmd file locally on RStudio then stage, commit and push it. -3. Modify that Rmd and knit it. When commiting, have a look at how the +3. Modify that Rmd and knit it. When committing, have a look at how the modification are showed to you. Push it on GitHub and go see the compiled result online. Also have a look at the commit online. -4. Modify the readme file on GitHub and pull it locally. +4. Modify the README file on GitHub and pull it locally. `r msmbstyle::question_end()` ## Git and command line -It is still interesting to know that, usually, all these things are done through -the command line. Here is how to do it : +It is still useful to know that all these operations can also be +done through the command line.
Here is how to do it: - Initialise a local repository @@ -440,14 +435,14 @@ Options > Terminal > New* *terminals open with* This exercise will illustrate a merge conflict. To do so, work in pairs (called Alice and Bastien below). -1. Alice creates a GitHub repository and adds `file1` to the repo, and +1. Alice creates a GitHub repository and adds `file1` to the repository, and adds Bastien as a collaborator with write access: Settings > Collaborators (in the left panel) > Add people > Search and add a GitHub user. 2. Both Alice and Bastien clone to remote repository and add their names to `file1`. -3. Alices commits her local changes and pushes them to the GitHub +3. Alice commits her local changes and pushes them to the GitHub repository. 4. Bastien commits his local changes and tries to push them to the GitHub repository. @@ -460,7 +455,7 @@ pairs (called Alice and Bastien below). `r msmbstyle::question_end()` Using Git and Github is mostly a quite sailing on a calm sea. The only -little annoying hickup are conflicts. It is best to avoid these by +little annoying hiccups are conflicts. It is best to avoid these by coordinate work by keep local and remote repositories in sync: - The easiest way to avoid conflicts is to always be in sync with the @@ -480,7 +475,7 @@ project: independently of the whole pipeline; - given that Git compares documents line by line to assess if it can merge them automatically, use shorter lines (i.e. split your - sentences and paragraphes over more lines) to reduce the risk of + sentences and paragraphs over more lines) to reduce the risk of conflicts. GitHub issues are a useful way to discuss any specific points and @@ -515,10 +510,10 @@ repository. `Alice/WSBIM2122-GitHub-training`. - She could now send a pull request to merge changes from `Alice/WSBIM2122-GitHub-training` back into - `UCLouvain-CBIO/WSBIM2122-GitHub-training`.
The PR can be inspected - and reviewed and merged by a member of the + `UCLouvain-CBIO/WSBIM2122-GitHub-training`. The pull request can be + inspected, reviewed and merged by a member of the `UCLouvain-CBIO/WSBIM2122-GitHub-training`. -- Pull requests (of shortened PR) are also useful when multiple +- Pull requests (often shortened to PR) are also useful when multiple members of the same repository want their contributions to be reviewed by other team members. Using PRs within a team is useful so as to keep everybody informed about the changes different members