Commit a65d677: Add note about backups in the documentation
ml-evs committed Jan 18, 2024 (1 parent: 5c67438)
pydatalab/docs/config.md: 28 additions and 0 deletions

Currently, there are two mechanisms for accessing remote files:
1. You can mount the filesystem locally and provide the path in your datalab config file. For example, for Cambridge Chemistry users, you will have to (connect to the ChemNet VPN and) mount the Grey Group backup servers on your local machine, then define these folders in your config.
2. Access over SSH: alternatively, you can set up passwordless `ssh` access to a machine (e.g., using `citadel` as a proxy jump), and paths on that remote machine can be configured as separate filesystems. The filesystem metadata will be synced periodically, and any files attached in `datalab` will be downloaded and stored locally on the `pydatalab` server (the cached copy is refreshed on access whenever it is more than 1 hour old).
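As a sketch only, a remote filesystem entry in the server config might look like the following; the field names and values here are illustrative assumptions, so consult the Config API Reference below for the authoritative schema:

```json
{
  "REMOTE_FILESYSTEMS": [
    {
      "name": "grey-backups",
      "hostname": "remote.example.com",
      "path": "/data/group-backups"
    }
  ]
}
```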


## General Server administration

Currently, most administration tasks must be handled directly inside the Python API container.
It relies on the Excel export feature of ChemInventory and is achieved with an `invoke` admin task.
If a future export is made and reimported, the old entries will be kept and updated, rather than overwritten.
*datalab* currently has no functionality for chemical inventory management itself; if you wish to support importing from another inventory system, please [raise an issue](https://github.com/the-grey-group/datalab/issues/new).

### Backups

*datalab* provides a way to configure and create snapshot backups of the database and filestore.
The option [`BACKUP_STRATEGIES`][pydatalab.config.ServerConfig.BACKUP_STRATEGIES] allows you to list strategies for scheduled backups, each with its frequency, storage location (local or remote), and retention policy.
These backups are only performed when scheduled externally (e.g., via `cron` on the hosting server) or when triggered manually using the `invoke admin.create-backup` task.
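For illustration, a daily strategy could be configured along these lines; the field names (a cron-style `frequency`, a `location`, and a `retention` count) are assumptions for the sketch, not the authoritative schema, which is documented in the Config API Reference below:

```json
{
  "BACKUP_STRATEGIES": {
    "daily-snapshots": {
      "frequency": "5 4 * * *",
      "location": "/backups/datalab/",
      "retention": 7
    }
  }
}
```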

The simplest way to create a backup is to run `invoke admin.create-backup --output-path /tmp/backup.tar.gz`, which will create a compressed backup archive.
This should be run from the API server or container, and will use the server config to connect to the database and file store.
Note that this approach does not apply any retention strategy.

Alternatively, you can create a backup given the strategy name defined in the server config, using the same task:

```shell
invoke admin.create-backup --strategy-name daily-snapshots
```

This will apply the retention strategy and any copying to remote resources as configured.

When scheduling backups externally, it is recommended that you do not run `cron` inside the server's Docker container.
Instead, schedule a job on the host that calls, for example:

```shell
docker compose exec api pipenv run invoke admin.create-backup --strategy-name daily-snapshots
```
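For example, a daily entry in the host's crontab could look like the following; the repository path and schedule here are placeholders:

```
# Run the daily backup at 04:00; -T disables TTY allocation for non-interactive use
0 4 * * * cd /path/to/datalab && docker compose exec -T api pipenv run invoke admin.create-backup --strategy-name daily-snapshots
```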

In the future, this may be integrated directly into the *datalab* server using a Python-based scheduler.
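Such a scheduler does not exist in *datalab* yet; purely as a sketch of the idea, a minimal stdlib-only loop could wrap the same `invoke` task like this (the command assembly mirrors the manual call above, and the fixed 24-hour sleep stands in for a real scheduling library):

```python
import subprocess
import time


def backup_command(strategy_name: str) -> list[str]:
    """Build the `invoke` command line for a named backup strategy."""
    return [
        "pipenv", "run", "invoke",
        "admin.create-backup",
        "--strategy-name", strategy_name,
    ]


def run_daily(strategy_name: str = "daily-snapshots") -> None:
    """Run the backup task roughly once per day (illustrative sketch only)."""
    while True:
        subprocess.run(backup_command(strategy_name), check=False)
        time.sleep(24 * 60 * 60)
```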

## Config API Reference

::: pydatalab.config.ServerConfig