Merge pull request #24 from phac-nml/add_sample_name
Enhanced pipeline logic to support user-defined `sample_name` input
kylacochrane authored Sep 25, 2024
2 parents 2b8da30 + f6e6b0f commit 513b58b
Showing 18 changed files with 505 additions and 115 deletions.
19 changes: 9 additions & 10 deletions .github/workflows/linting.yml
@@ -14,13 +14,12 @@ jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
- uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- name: Set up Python 3.11
uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
- name: Set up Python 3.12
uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
python-version: 3.11
cache: "pip"
python-version: "3.12"

- name: Install pre-commit
run: pip install pre-commit
@@ -32,14 +31,14 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out pipeline code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
uses: nf-core/setup-nextflow@v2

- uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
- uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
python-version: "3.11"
python-version: "3.12"
architecture: "x64"

- name: Install dependencies
@@ -60,7 +59,7 @@ jobs:

- name: Upload linting log file artifact
if: ${{ always() }}
uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4
with:
name: linting-logs
path: |
2 changes: 1 addition & 1 deletion .github/workflows/linting_comment.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download lint results
uses: dawidd6/action-download-artifact@f6b0bace624032e30a85a8fd9c1a7f8f611f5737 # v3
uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3
with:
workflow: linting.yml
workflow_conclusion: completed
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,15 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Development

### `Changed`

- Added the ability to include a `sample_name` column in the input samplesheet.csv, allowing compatibility with the IRIDA-Next input configuration [PR24](https://github.com/phac-nml/speciesabundance/pull/24).
- Special characters in `sample_name` will be replaced with `"_"`.
- If no `sample_name` is supplied in the column, `sample` will be used.
- To avoid repeated values, all `sample_name` values will be suffixed with the unique `sample` value from the input file.
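The three rules in this entry can be illustrated with a short Python sketch. The helper `resolve_sample_names` is hypothetical and is not the pipeline's Nextflow code. The allowed characters (letters, digits, `_`, `-`, `.`) are taken from the README, and the sketch assumes the `sample` suffix is added only when a name has already been seen, which is how the README phrases it; this entry could also be read as suffixing every value.

```python
import re

# Assumption: letters, digits, "_", "-" and "." are kept; anything else
# becomes "_" (character set taken from the README text).
DISALLOWED = re.compile(r"[^A-Za-z0-9_.\-]")


def resolve_sample_names(rows):
    """Hypothetical helper: map (sample, sample_name) rows to output IDs."""
    seen = set()
    ids = []
    for sample, sample_name in rows:
        # Fall back to `sample` when `sample_name` is missing or empty,
        # then replace disallowed characters with "_".
        name = DISALLOWED.sub("_", sample_name or sample)
        # Assumption: only a repeated name gets the unique `sample` suffix.
        if name in seen:
            name = f"{name}_{sample}"
        seen.add(name)
        ids.append(name)
    return ids


rows = [
    ("SAMPLE1", "A1"),
    ("SAMPLE2", "my sample"),
    ("SAMPLE3", "my sample"),
    ("SAMPLE4", ""),
]
print(resolve_sample_names(rows))
# ['A1', 'my_sample', 'my_sample_SAMPLE3', 'SAMPLE4']
```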

## 2.1.1 - 2024/05/02

### `Changed`
@@ -36,3 +45,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### `Dependencies`

### `Deprecated`

[2.0.0]: https://github.com/phac-nml/speciesabundance/releases/tag/2.0.0
[2.1.0]: https://github.com/phac-nml/speciesabundance/releases/tag/2.1.0
[2.1.1]: https://github.com/phac-nml/speciesabundance/releases/tag/2.1.1
18 changes: 18 additions & 0 deletions README.md
@@ -14,8 +14,26 @@ The input to the pipeline is a standard sample sheet (passed as `--input samples
| ------- | --------------- | --------------- |
| SampleA | file_1.fastq.gz | file_2.fastq.gz |

An [example samplesheet](assets/samplesheet_minimal.csv) has been provided with the pipeline.

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Validation of the sample sheet is performed by [nf-validation](https://nextflow-io.github.io/nf-validation/).

## IRIDA-Next Optional Input Configuration

`speciesabundance` accepts the [IRIDA-Next](https://github.com/phac-nml/irida-next) format for samplesheets, which can contain an additional column: `sample_name`.

`sample_name`: An **optional** column that overrides `sample` for outputs (filenames and sample names) and reference assembly identification.

`sample_name` allows more flexibility in naming output files and identifying samples. Unlike `sample`, `sample_name` is not required to contain unique values. Because `Nextflow` requires unique sample names, when `sample_name` values are repeated, `sample` will be appended as a suffix to `sample_name`. Non-alphanumeric characters (excluding `_`, `-`, `.`) will be replaced with `"_"`.

The sample sheet, when including the optional `sample_name` column, should look like:

| sample | sample_name | fastq_1 | fastq_2 |
| ------- | ----------- | --------------- | --------------- |
| SampleA | A1 | file_1.fastq.gz | file_2.fastq.gz |

An [example samplesheet](tests/data/samplename_samplesheet.csv) that includes the `sample_name` column has been provided with the pipeline.

# Parameters

## Mandatory
8 changes: 4 additions & 4 deletions assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2
SAMPLE1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R2.fastq.gz
SAMPLE2,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
SAMPLE3,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,
sample,sample_name,fastq_1,fastq_2
SAMPLE1,A1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R2.fastq.gz
SAMPLE2,B2,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
SAMPLE3,C3,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,
4 changes: 4 additions & 0 deletions assets/samplesheet_minimal.csv
@@ -0,0 +1,4 @@
sample,fastq_1,fastq_2
SAMPLE1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R2.fastq.gz
SAMPLE2,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
SAMPLE3,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,
7 changes: 6 additions & 1 deletion assets/schema_input.json
@@ -10,10 +10,15 @@
"sample": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["id"],
"meta": ["irida_id"],
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
},
"sample_name": {
"type": "string",
"meta": ["id"],
"errorMessage": "Optional. Used to override sample when used in tools like IRIDA-Next."
},
"fastq_1": {
"type": "string",
"pattern": "^\\S+\\.f(ast)?q(\\.gz)?$",
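With this schema change, `sample` is routed into the meta map as `irida_id` and `sample_name` as `id`. This is presumably why `TOP_N` passes `meta.irida_id` to `-s` and why `conf/iridanext.config` sets `idkey = "irida_id"`. The Python sketch below is a simplified, hypothetical model of that routing: the function name `row_to_channel_item` is invented, and nf-validation's real handling of empty values and types may differ.

```python
# Simplified model of how the schema's "meta" keys route samplesheet columns
# into a meta map; this only loosely mirrors nf-validation (assumption).
SCHEMA_META = {
    "sample": "irida_id",   # "meta": ["irida_id"]
    "sample_name": "id",    # "meta": ["id"]
}


def row_to_channel_item(row):
    """Hypothetical helper: split a row into (meta, fastq_1, fastq_2)."""
    meta = {}
    for column, meta_key in SCHEMA_META.items():
        value = row.get(column)
        # In this sketch, an empty optional column is left out of the meta map.
        if value:
            meta[meta_key] = value
    # Non-meta columns become separate elements; an empty fastq_2 becomes None.
    return meta, row.get("fastq_1"), row.get("fastq_2") or None


row = {
    "sample": "SAMPLE1",
    "sample_name": "A1",
    "fastq_1": "sample1_R1.fastq.gz",
    "fastq_2": "sample1_R2.fastq.gz",
}
print(row_to_channel_item(row))
```

Keeping `irida_id` separate from `id` lets the output filenames follow `sample_name` while IRIDA-Next can still match results back to the original `sample` value.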
1 change: 1 addition & 0 deletions conf/iridanext.config
@@ -4,6 +4,7 @@ iridanext {
path = "${params.outdir}/iridanext.output.json.gz"
overwrite = true
files {
idkey = "irida_id"
global = [
"**/failure/failures_report.csv"
]
2 changes: 1 addition & 1 deletion conf/test.config
@@ -20,6 +20,6 @@ params {
max_time = '1.h'

// Input data
input = 'https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/assets/samplesheet.csv'
input = "${projectDir}/assets/samplesheet.csv"
database = "${projectDir}/tests/data/minidb"
}
34 changes: 25 additions & 9 deletions docs/usage.md
@@ -15,24 +15,40 @@ You will need to create a samplesheet with information about the samples you wou
### Full samplesheet

The input samplesheet must contain three columns: `sample`, `fastq_1`, `fastq_2`. The sample IDs within a samplesheet should be unique. All other columns will be ignored.
This pipeline does not support the processing of long-read sequencing data (Nanopore or PacBio).

A final samplesheet file consisting of both single- and paired-end Illumina short read data may look something like the one below.
This pipeline does not support the processing of long-read sequencing data (Nanopore or PacBio).

```csv title="samplesheet.csv"
```csv title="samplesheet_minimal.csv"
sample,fastq_1,fastq_2
SAMPLE1,sample1_R1.fastq.gz,sample1_R2.fastq.gz
SAMPLE2,sample2_R1.fastq.gz,sample2_R2.fastq.gz
SAMPLE3,sample1_R1.fastq.gz,
SAMPLE3,sample3_R1.fastq.gz,
```

An [example samplesheet](../assets/samplesheet_minimal.csv) has been provided with the pipeline.

### IRIDA-Next Optional Samplesheet Configuration

`speciesabundance` accepts the [IRIDA-Next](https://github.com/phac-nml/irida-next) format for samplesheets which contain the following columns: `sample`, `sample_name`, `fastq_1`, and `fastq_2`. The sample IDs within a samplesheet should be unique.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below.

```csv title="samplesheet.csv"
sample,sample_name,fastq_1,fastq_2
SAMPLE1,A1,sample1_R1.fastq.gz,sample1_R2.fastq.gz
SAMPLE2,B2,sample2_R1.fastq.gz,sample2_R2.fastq.gz
SAMPLE3,C3,sample3_R1.fastq.gz,
```
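A quick pre-flight check of a samplesheet in the format above could look like the following Python sketch. The regular expressions are copied from `assets/schema_input.json` in this commit; the function `check_samplesheet` is hypothetical, and treating `fastq_1` as mandatory while allowing an empty `fastq_2` is an assumption based on the single-end `SAMPLE3` row. Validation in the pipeline itself is performed by nf-validation, not by this code.

```python
import csv
import io
import re

# Patterns copied from assets/schema_input.json in this commit.
SAMPLE_RE = re.compile(r"^\S+$")
FASTQ_RE = re.compile(r"^\S+\.f(ast)?q(\.gz)?$")
# Header columns listed in docs/usage.md; sample_name is optional.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def check_samplesheet(text):
    """Hypothetical helper: return a list of problems (empty list = OK)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems, seen = [], set()
    # Data rows start on line 2 (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        sample = row.get("sample") or ""
        if not SAMPLE_RE.match(sample):
            problems.append(f"line {line_no}: sample must be non-empty without spaces")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample {sample!r}")
        seen.add(sample)

        # Assumption: fastq_1 is mandatory; fastq_2 may be empty (single-end).
        if not row.get("fastq_1"):
            problems.append(f"line {line_no}: fastq_1 is empty")
        for col in ("fastq_1", "fastq_2"):
            path = row.get(col) or ""
            if path and not FASTQ_RE.match(path):
                problems.append(f"line {line_no}: {col} is not a FastQ path: {path!r}")
    return problems


sheet = """sample,sample_name,fastq_1,fastq_2
SAMPLE1,A1,sample1_R1.fastq.gz,sample1_R2.fastq.gz
SAMPLE2,B2,sample2_R1.fastq.gz,sample2_R2.fastq.gz
SAMPLE3,C3,sample3_R1.fastq.gz,
"""
print(check_samplesheet(sheet))  # []
```

Note that the check does not require `sample_name` to be unique, matching the README's statement that only `sample` must hold unique values.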

| Column | Description |
| --------- | -------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. Samples should be unique within a samplesheet. |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| Column | Description |
| ------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. Samples should be unique within a samplesheet. |
| `sample_name` | Sample name used in outputs (filenames and sample names) |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |

An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
An [example samplesheet](../tests/data/samplename_samplesheet.csv) that includes the `sample_name` column has been provided with the pipeline.

## Running the pipeline

2 changes: 1 addition & 1 deletion modules/local/topN/main.nf
@@ -38,7 +38,7 @@ process TOP_N {
${abundances} \\
${args} \\
-n ${top_n} \\
-s ${meta.id} \\
-s ${meta.irida_id} \\
> ${meta.id}_${taxonomic_level}_top_${top_n}.csv
cat <<-END_VERSIONS > versions.yml
10 changes: 5 additions & 5 deletions tests/data/error_samplesheet.csv
@@ -1,5 +1,5 @@
sample,fastq_1,fastq_2
SAMPLE1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
SAMPLE2,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-kraken_R1_001.fastq.gz,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-kraken_R2_001.fastq.gz
SAMPLE3,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-bracken_R1_001.fastq.gz,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-bracken_R2_001.fastq.gz
SAMPLE4,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R2.fastq.gz
sample,sample_name,fastq_1,fastq_2
SAMPLE1,A1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
SAMPLE2,B2,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-kraken_R1_001.fastq.gz,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-kraken_R2_001.fastq.gz
SAMPLE3,C3,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-bracken_R1_001.fastq.gz,https://raw.githubusercontent.com/phac-nml/speciesabundance/dev/tests/data/fastq/test-bracken_R2_001.fastq.gz
SAMPLE4,D4,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R2.fastq.gz
4 changes: 2 additions & 2 deletions tests/data/fail_samplesheet.csv
@@ -1,2 +1,2 @@
sample,fastq_1,fastq_2
SAMPLE1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz
sample,sample_name,fastq_1,fastq_2
SAMPLE1,A1,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/mag/test_data/test_minigut_sample2_R2.fastq.gz