Add contribution guidance for Tools page #153

Merged
18 changes: 12 additions & 6 deletions README.md
@@ -9,19 +9,23 @@ This document forms part of the Quality Guidance, published by the Quality and I
We welcome all constructive feedback and contributions.

To provide feedback or request new content, you can [create an issue](https://github.com/best-practice-and-impact/qa-of-code-guidance/issues) on this book's repository.
Alternatively, you can always drop us an [email](mailto:Analysis.Function@ons.gov.uk).
Alternatively, you can always drop us an [email](mailto:ASAP@ons.gov.uk).

If you'd like to contribute, please also
[create or comment on an issue](https://github.com/best-practice-and-impact/qa-of-code-guidance/issues)
to describe the changes that you'd like to make.
This will allow discussion around whether content is suitable for this book, before you put the hard work into implementing it.
This will allow discussion around whether content is suitable for the book, before you put the hard work into implementing it.


### Getting started

To start contributing, you'll need python installed.
Minor text edits can be submitted as a Pull Request using the "Suggest edit" button under the GitHub logo at the top of the page you would like to change.

For changes to anything other than lines of text, you should follow these steps to make the changes locally:

To start contributing, you'll need Python installed.
If you sit outside the Quality and Improvement Division, then you'll need to [create a Fork of this repository to make changes](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/working-with-forks).
Once forked, you should clone the fork repository to get a copy of the book. Then install it's Python dependencies like so:
Once forked, you should clone the fork repository to get a copy of the book. Then install its Python dependencies like so:

```{none}
git clone https://github.com/<your-username>/qa-of-code-guidance.git
@@ -36,15 +40,17 @@ jb build book
```

Jupyter book will write the book's `HTML` content to `book/_build/html/`, so you can open `index.html` from there to view the local build.
Run the build command after making a change to the text to update the HTML that you view here.

All content for the book is currently written in
[Markedly Structured Text](https://myst-parser.readthedocs.io/en/latest/),
which is based on standard Markdown (`.md`) but allows use of "directives" for generating content.

We also require developers to conform to our style guide. You can do this by installing our pre-commit `pymarkdownlnt`:
We also require developers to conform to a specific Markdown style.
You can do this by installing our `pymarkdownlnt` pre-commit hook:

```{none}
pip install -r dev-dependencies.txt
pip install -r dev-requirements.txt
pre-commit install
```

149 changes: 67 additions & 82 deletions book/tools.md
@@ -1,67 +1,101 @@
# Tools

This section points you to useful tools that support reproducible analysis.
This section highlights tools that support reproducible analysis and research.
This includes tools for general software development and bespoke packages that have been developed for government analysis.
Those developed or contributed to within government are marked with the abbreviation (gov).

If you have developed a package for use in analysis or recommend any that are not included here, please add them to the list.
You can request a new tool to be added to the list by [creating an issue on GitHub](https://github.com/best-practice-and-impact/qa-of-code-guidance/issues/new/choose)
or [contacting us by email](mailto:[email protected]?subject=Duck%20Book%20Tools).
Alternatively, you can add it directly to the project by creating a Pull Request.
You can do this using the "Suggest edit" link under the GitHub logo at the top of this page.
Please include a link and brief description when requesting a new tool to be added.

## Testing
The tools included on this page will in general follow the good quality assurance practices described in this guidance.
However, as with any software there is a chance that they may still contain bugs or limitations.
Please apply your own judgement when using them.
If you feel a tool should no longer be included in this list, please suggest an edit or get in touch.

Implementing automated code testing.
## Data manipulation and analysis

Manipulating and analysing data.

### Python

* [pytest](https://docs.pytest.org/en/stable/)
* [hypothesis](https://hypothesis.readthedocs.io/en/latest/)
* [unittest](https://docs.python.org/3/library/unittest.html)
* [tox](https://tox.readthedocs.io/en/latest/)
* [nox](https://nox.thea.codes/en/stable/)
* [chispa (PySpark)](https://pypi.org/project/chispa/)

* [pandas](https://pandas.pydata.org/) - common data analysis and manipulation
* [Polars](https://www.pola.rs/) - high performance data manipulation
* [PySpark](https://spark.apache.org/docs/latest/api/python/) - data manipulation for distributed (large) data
* [Splink](https://moj-analytical-services.github.io/splink/) (gov) - probabilistic data linkage

### R

* [testthat](https://testthat.r-lib.org/)
* [assertr](https://docs.ropensci.org/assertr/)
* [patrick](https://github.com/google/patrick)
* [rhub](https://r-hub.github.io/rhub/)
* [dplyr](https://dplyr.tidyverse.org/) - common data analysis and manipulation
* [sparklyr](https://spark.rstudio.com/) - for distributed (large) data

## Publishing

## Test Coverage
* [Quarto](https://quarto.org/) - reproducible documents for Python and R
* [a11ytables (R only)](https://co-analysis.github.io/a11ytables/index.html) (gov) - creating reproducible, accessible spreadsheets
* [gptables (Python and R)](https://gptables.readthedocs.io/en/latest/index.html) (gov) - creating reproducible, accessible spreadsheets

Identifying which parts of your code are covered by existing tests.
## Testing

Tools for implementing automated code testing.

### Python

* [coverage](https://coverage.readthedocs.io/en/coverage-5.3/)
* [pytest](https://docs.pytest.org/en/stable/) - common testing framework
* [unittest](https://docs.python.org/3/library/unittest.html) - common testing framework
* [hypothesis](https://hypothesis.readthedocs.io/en/latest/) - property-based testing
* [chispa (PySpark)](https://pypi.org/project/chispa/) - helper for testing PySpark code
* [coverage](https://coverage.readthedocs.io/en/coverage-5.3/) - measuring test coverage
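
As a rough illustration of what the Python testing frameworks above automate, here is a minimal sketch of pytest-style tests. The `standardise_postcode` function is hypothetical and is not part of any tool listed here. It assumes pytest's default behaviour of collecting functions whose names start with `test_`, so the tests themselves need no imports.

```python
def standardise_postcode(raw: str) -> str:
    """Hypothetical helper: upper-case a UK postcode and normalise its spacing."""
    # Remove all whitespace, then upper-case the remaining characters.
    compact = "".join(raw.split()).upper()
    # The inward code is always the final three characters.
    return f"{compact[:-3]} {compact[-3:]}"


def test_standardise_postcode_normalises_case_and_spacing():
    assert standardise_postcode(" np10  8xg ") == "NP10 8XG"


def test_standardise_postcode_leaves_clean_input_unchanged():
    assert standardise_postcode("SW1A 2AA") == "SW1A 2AA"
```

If this were saved in a file such as `test_postcode.py` (an illustrative name), running `pytest` from the same directory should collect and run both tests.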


### R

* [covr](https://covr.r-lib.org/)
* [testthat](https://testthat.r-lib.org/) - common testing framework
* [assertr](https://docs.ropensci.org/assertr/) - assertions for verifying data in analysis pipelines
* [patrick](https://github.com/google/patrick) - parameterised testing extension for `testthat`
* [covr](https://covr.r-lib.org/) - measuring test coverage

## Dependency management

## Code Linters
* [venv (Python)](https://docs.python.org/3/library/venv.html) - manage packages using virtual environments
* [pyenv (Python)](https://github.com/pyenv/pyenv) - manage independent Python versions for different projects
* [renv (R)](https://rstudio.github.io/renv/articles/renv.html) - virtual environments for managing packages
* [conda](https://docs.conda.io/en/latest/) - manage language versions and packages for most languages

Analyse code for stylistic errors, and sometimes bugs.
## Version control

* [Git](https://git-scm.com/) - common open source version control system
* [pre-commit](https://pre-commit.com/) - trigger checks (e.g. linters and formatters) before Git commits are created

### Python
## Project templates

* [pylint](https://www.pylint.org/)
* [Pyflakes](https://pypi.org/project/pyflakes/)
* [flake8](https://flake8.pycqa.org/en/latest/)
* [Bandit](https://bandit.readthedocs.io/en/latest/)
* [govcookiecutter (Python)](https://github.com/best-practice-and-impact/govcookiecutter) (gov) - template project for reproducible analysis
* [Rgovcookiecutter (R)](https://github.com/best-practice-and-impact/Rgovcookiecutter) (gov) - template project for reproducible analysis

## Code Linters

Analysing code for stylistic errors, and sometimes bugs.

### Python

* [pylint](https://www.pylint.org/) - check coding style and identify some logical errors
* [flake8](https://flake8.pycqa.org/en/latest/) - check code style
* [Bandit](https://bandit.readthedocs.io/en/latest/) - check for common security issues
* [mypy](https://mypy.readthedocs.io/en/stable/) - check static types
* [Radon](https://radon.readthedocs.io/en/latest/) - check code complexity
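
To give a sense of the kind of problem a static type checker can catch, the sketch below uses a hypothetical `mean` function (not taken from any tool listed here). The code runs without error, but a checker such as mypy would be expected to flag the call, because a tuple is passed where the annotation asks for a list.

```python
from __future__ import annotations


def mean(values: list[float]) -> float:
    """Hypothetical helper: arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# This works at runtime, because tuples support both sum() and len().
# A static type checker should still report the tuple argument as
# incompatible with the declared parameter type `list[float]`.
result = mean((2.0, 4.0))
print(result)
```

Assuming mypy is installed, this kind of mismatch would normally be reported by running `mypy` on the file, without executing the code.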

### R

* [lintr](https://github.com/jimhester/lintr)
* [lintr](https://github.com/jimhester/lintr) - check code style


## Code Formatters

Automated code format repair.
Automated code formatters.
These check code style, like linters, but also actively make changes to your code to meet a particular style.


### Python
@@ -76,48 +110,24 @@ Automated code format repair.
* [styler](https://styler.r-lib.org/)


## Code Complexity


### Python

* [wily](https://pypi.org/project/wily/)
* [radon](https://radon.readthedocs.io/en/latest/)


## Packaging Code

Creating and releasing code as a package.

### Python

* [twine](https://pypi.org/project/twine/)

* [twine](https://pypi.org/project/twine/) - utility for publishing Python packages to [the Python Package Index (PyPI)](https://pypi.org/)

### R

* [goodpractice](http://mangothecat.github.io/goodpractice/)
* [fusen](https://thinkr-open.github.io/fusen/)


## Static Type Checking


### Python

* [mypy](https://mypy.readthedocs.io/en/stable/)
* [pyright](https://github.com/microsoft/pyright)
* [goodpractice](http://mangothecat.github.io/goodpractice/) - gives advice on the quality of your R packages
* [fusen](https://thinkr-open.github.io/fusen/) - builds R packages from R Markdown file specifications


## Pipeline Orchestration

* [Apache Airflow](https://airflow.apache.org/)
* [targets R package](https://wlandau.github.io/targets-manual/)
* [Hamilton (Python)](https://hamilton-docs.gitbook.io/docs/)


## Dependency management

* [renv (R)](https://rstudio.github.io/renv/articles/renv.html)
* [Apache Airflow](https://airflow.apache.org/) - workflow management platform
* [targets (R)](https://wlandau.github.io/targets-manual/) - defining and executing pipelines in R


(CI-tools)=
@@ -128,28 +138,3 @@ Automated code format repair.
* [Travis](https://travis-ci.org/)
* [Jenkins](https://www.jenkins.io/)
* [Coveralls](https://coveralls.io/)


## Git hook automation

* [pre-commit](https://pre-commit.com/)


## Data processing, analysis, and publishing


### Data linkage

* [Splink (Python)](https://www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale#introduction)


### Data engineering

* [Polars (Python)](https://www.pola.rs/)


### Publishing

* [Quarto](https://quarto.org/)
* [a11tables (R only)](https://co-analysis.github.io/a11ytables/index.html)
* [gptables (Python)](https://gptables.readthedocs.io/en/latest/index.html#)