To clone a GitHub repository in JupyterLab, we need to use the command line/Terminal. To open the Terminal, click the + button in the top left and then click on Terminal. You should see a command line prompt.
Before we clone the repository, it is worth knowing some basic commands which will allow you to navigate your directories and files in JupyterLab.
Use the commands below to create a folder/directory called training. Inside this directory, create a file called main.py.
- mkdir <directory_name>: create a new directory/folder
- cd <directory_name>: navigate into a directory
- touch <file_name>: create a file
- ls: list files in a directory
- pwd: show the current directory
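A minimal sketch of the exercise above, assuming you run it from wherever your Terminal opens (normally your home directory). The -p flag on mkdir is an addition so the commands can be re-run without an error if training already exists.

```shell
# Create a directory called training (-p: no error if it already exists)
mkdir -p training
# Move into it
cd training
# Create an empty file called main.py
touch main.py
# List the directory contents, then print the current location
ls
pwd
```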
Let's practice moving and deleting files. Use the commands below to move the main.py file to your home directory. Navigate back to the home directory and then remove the main.py file. Note that ~ can be used to refer to your home directory, and .. can be used to refer to the parent directory, one level up from the one you are in.
- rm <filename>: delete file(s)
- cp <filename> <new_location>: copy a file from its current location to a new one
- mv <filename> <new_location>: move a file from its current location to a new one
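One way to do the moving and deleting exercise, as a sketch: it assumes training/main.py sits in your home directory, and recreates them with mkdir -p and touch in case they don't exist yet.

```shell
# Start from the home directory; recreate training/main.py in case they don't exist
cd ~
mkdir -p training
touch training/main.py
# Move into training, then move main.py to the home directory (~)
cd training
mv main.py ~
# .. is the parent directory, which here is the home directory
cd ..
pwd
# Delete main.py from the home directory
rm main.py
ls
```

Note that mv, like cp, normally overwrites a file of the same name at the destination without asking.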
To delete a directory, use the -rf option with the rm command: -r removes the directory and everything inside it, and -f stops rm from asking for confirmation. Be very careful with this!
rm -rf training
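Because rm -rf never asks before deleting, a cautious habit (a suggestion, not one of the original steps) is to check where you are and what you are about to delete first. The mkdir -p line is only there so this sketch runs on its own.

```shell
cd ~
# Only here so the sketch runs on its own; skip it if training already exists
mkdir -p training
# Check your location and what is inside the directory before deleting it
pwd
ls training
# -r: remove the directory and everything in it; -f: never prompt for confirmation
rm -rf training
# training should no longer be listed
ls
```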
1. Clone this git repository! You can clone it using git clone git@github.com:moj-analytical-services/intro-to-python.git
   - Note: when you clone the repo you may get the message "The authenticity of host 'github.com (140.82.121.4)' can't be established. ECDSA key fingerprint is … Are you sure you want to continue connecting (yes/no/fingerprint)?". This is normal: just type yes and press Return.
2. Navigate into your newly cloned repository with the cd command. Tip: start typing the name of a file or directory and press the Tab key to autocomplete it.
3. Create a new branch to work on, using the command git checkout -b <your_branch_name> (name the branch with your name or initials, or something unique!). We'll revisit some other git commands later in the training.
git clone git@github.com:moj-analytical-services/intro-to-python.git
cd intro-to-python
git checkout -b my_branch
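To confirm which branch you are on after git checkout -b, git status reports it on its first line. So that this sketch runs without network access or SSH keys, it uses a throwaway repository created with git init in place of the real clone; my_branch is just the example name used above.

```shell
# Throwaway local repository in a temporary directory (stands in for the real clone)
cd "$(mktemp -d)"
git init -q
# Create a new branch and switch to it
git checkout -b my_branch
# The first line of the output should read "On branch my_branch"
git status
```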
If you search for bash or Linux command line, you can find tutorials giving more details about the terminal. The Analytical Platform uses Ubuntu Linux, so a good place to start might be Ubuntu's own command line tutorial.
If this course is time-limited, we can use uv to speed things up so we don't spend half the course waiting for packages to install. uv is a tool that greatly speeds up Python packaging tasks, but it is not (yet?) the recommended way of doing things, so the recommended way is also described at each step.
To install uv, run
curl -LsSf https://astral.sh/uv/install.sh | sh
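and then
source $HOME/.cargo/env
(Newer versions of the installer put uv in ~/.local/bin rather than ~/.cargo/bin; if yours does, run the source command the installer prints at the end instead.)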
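A quick sketch for checking the install worked: command -v reports whether uv can be found on your PATH. If it isn't found, opening a new Terminal or re-running the source command is a reasonable first thing to try.

```shell
# Check whether the uv command can be found on your PATH
if command -v uv >/dev/null 2>&1; then
    uv_status="installed"
    # Print the installed uv version
    uv --version
else
    uv_status="missing"
    echo "uv not found: try opening a new Terminal or re-running the source command"
fi
```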
Virtual environments keep your projects separate, so you don't have clashes between package versions. You may want to refer to this guidance on how to create them and use them in JupyterLab. You will create virtual environments on the command line.
To create a virtual environment, type one of the following within your project directory (the final venv is the name of the directory the environment is created in):
uv venv venv
or
python -m venv venv
This can then be activated with
source venv/bin/activate
The virtual environment can be deactivated with the command
deactivate
but don't do that just yet.
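To check that the environment is active, a minimal sketch: the activate script sets the VIRTUAL_ENV variable and puts venv/bin first on your PATH, so python should now resolve to the copy inside venv. The sketch works in a temporary directory so it doesn't touch your project; in practice you would only run the last two commands, in your project directory with venv already activated.

```shell
# Scratch directory so this doesn't touch your project
cd "$(mktemp -d)"
# Create and activate a virtual environment called venv
python -m venv venv
source venv/bin/activate
# Should print a path ending in venv/bin/python
command -v python
# The activate script sets VIRTUAL_ENV to the environment's directory
echo "$VIRTUAL_ENV"
```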
Tip: if your venv gets messed up, you can remove it with rm -rf venv and start again.
Packages make your life easier when coding in Python. You can use them to do things that would be very time-consuming to do in base Python, so it is worth understanding how to install them early on.
Within this repo is an existing file, requirements.txt, which lists the packages that should be installed so that everyone using the project is using the same environment.
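A requirements file is plain text with one package per line, optionally pinned to a version (for example pandas==<version>). Purely as an illustration, this sketch writes a hypothetical requirements-example.txt listing the packages from the fallback install command, without pinned versions; it is not the repo's actual file.

```shell
# Write a hypothetical example requirements file (one package name per line)
cat > requirements-example.txt <<'EOF'
pandas
matplotlib
requests
smart_open
pydbtools
EOF
# Display the file's contents
cat requirements-example.txt
```

You could install from it with uv pip install -r requirements-example.txt (or python -m pip install -r requirements-example.txt), exactly as for the real file.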
Install packages from the requirements file with
uv pip install -r requirements.txt
or
python -m pip install -r requirements.txt
If this fails, run
uv pip install pandas matplotlib requests smart_open pydbtools
or
pip install pandas matplotlib requests smart_open pydbtools
The requirements file includes pandas, a data analysis package which you are likely to use a lot when coding in Python, and a few other useful packages. However, to access files on the Analytical Platform we need an additional package, s3fs.
We use pip via the command line to install a package, using (uv) pip install <package_name>. Install s3fs now.
Now that we have a virtual environment, we have to let JupyterLab know about it so we can use it in notebooks. To do this we need to install the ipykernel package.
Install ipykernel now.
We now need to install the kernel with python -m ipykernel install --name "<short_project_name_without_spaces>" --display-name "<Longer name for display>" --user. Run this with the virtual environment activated, so that the kernel uses the environment's Python.
python -m ipykernel install --name "intro-to-python" --display-name "Python training" --user
You can list your installed kernels with
jupyter kernelspec list
If you're not sure which environment a kernel relates to, you can check the kernel.json file in the listed directory using the cat command, which displays a file's contents. For example:
cat /home/jovyan/.local/share/jupyter/kernels/intro-to-python/kernel.json
If any are no longer needed, remove them with jupyter kernelspec uninstall <kernel name>.
IMPORTANT: JupyterLab notebooks, unlike, for example, R Markdown or Quarto files, store their results (cell outputs) by default. This means that if they're pushed to GitHub, any sensitive data in those outputs can cause a security breach, as it will remain permanently in GitHub's history even if you later remove the notebook from your branch. You do not want this to happen, as you'll have to purge the file from GitHub's history and report a security incident, neither of which is fun.
nbstripout automatically removes results from Jupyter notebooks and should be installed with
uv pip install nbstripout
or
pip install nbstripout
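Then, from inside the repository, run
nbstripout --install
and
nbstripout --install --attributes .gitattributes
The first command sets nbstripout up as a git filter for this repository; the second also records the filter in a .gitattributes file, which gets committed along with your other changes.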
Now that we have more packages in our environment, we can record them with pip freeze. Run
uv pip freeze > requirements.txt
or
pip freeze > requirements.txt
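pip freeze prints one line per installed package, usually in the form name==version, so the new requirements.txt pins exact versions. As a sketch, with a quick extra check (not in the original steps) that the packages installed during this session were recorded:

```shell
# Record the exact versions of everything installed in the active environment
python -m pip freeze > requirements.txt
# (with uv: uv pip freeze > requirements.txt)
# Check that packages installed in this session were captured (case-insensitive)
grep -i -E '^(s3fs|ipykernel|nbstripout)' requirements.txt
```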
Now that we have made changes to the packages and the git setup, we will want to reflect them in our git branch.
git status will show which files have been changed or added, together with the current branch. git add <file> will add the file to the next commit, and git commit -m "<commit message>" will create the commit. git push then updates GitHub with the latest changes. The first time you push to a branch you will need to tell GitHub about it with git push --set-upstream origin <branch name>.
Add, commit and push the changes to requirements.txt and .gitattributes.
git add requirements.txt .gitattributes
git commit -m "Updated packages, installed nbstripout"
git push --set-upstream origin MY_BRANCH_NAME
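To practise the add and commit steps without touching your real branch, here is a sketch in a throwaway local repository. The file contents are placeholders, the commit author is set inline with git -c because a fresh environment may have no git identity configured, and there is no remote, so the push step is left out.

```shell
# Throwaway repository in a temporary directory
cd "$(mktemp -d)"
git init -q
git checkout -q -b my_branch
# Placeholder versions of the two files from the exercise
echo "pandas" > requirements.txt
echo "*.ipynb filter=nbstripout" > .gitattributes
# "??" in the short status marks files git is not yet tracking
git status --short
# Stage both files, then commit them (identity given inline for this sketch only)
git add requirements.txt .gitattributes
git -c user.name="Example User" -c user.email="example@example.com" commit -q -m "Updated packages, installed nbstripout"
# Show the new commit
git log --oneline
```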
There are links to more information about git in the Analytical Platform user guidance.
Using venv and pip is the standard way to manage Python projects, but it is not the only option.
- Poetry is widely used, particularly when developing packages such as pydbtools.
- uv can be used for much more than just a replacement for the built-in pip and venv.
- There are many others...
In JupyterLab, go to the file navigator on the left of the screen and click on intro-to-python -> Part_2_Python.ipynb. This should open a Jupyter notebook with the next part of this training session's content.