diff --git a/.editorconfig b/.editorconfig new file mode 100644 index 00000000..b6b31907 --- /dev/null +++ b/.editorconfig @@ -0,0 +1,24 @@ +root = true + +[*] +charset = utf-8 +end_of_line = lf +insert_final_newline = true +trim_trailing_whitespace = true +indent_size = 4 +indent_style = space + +[*.{md,yml,yaml,html,css,scss,js}] +indent_size = 2 + +# These files are edited and tested upstream in nf-core/modules +[/modules/nf-core/**] +charset = unset +end_of_line = unset +insert_final_newline = unset +trim_trailing_whitespace = unset +indent_style = unset +indent_size = unset + +[/assets/email*] +indent_size = unset diff --git a/.gitattributes b/.gitattributes index 7fe55006..050bb120 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1 +1,3 @@ *.config linguist-language=nextflow +modules/nf-core/** linguist-generated +subworkflows/nf-core/** linguist-generated diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml new file mode 100644 index 00000000..191fabd2 --- /dev/null +++ b/.github/.dockstore.yml @@ -0,0 +1,6 @@ +# Dockstore config version, not pipeline version +version: 1.2 +workflows: + - subclass: nfl + primaryDescriptorPath: /nextflow.config + publish: True diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 260f8fd6..9ae9e986 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -15,11 +15,11 @@ Contributions to the code are even more welcome ;) If you'd like to write some code for nf-core/rnafusion, the standard workflow is as follows: -1. Check that there isn't already an issue about your idea in the [nf-core/rnafusion issues](https://github.com/nf-core/rnafusion/issues) to avoid duplicating work - * If there isn't one already, please create one so that others know you're working on this +1. Check that there isn't already an issue about your idea in the [nf-core/rnafusion issues](https://github.com/nf-core/rnafusion/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/rnafusion repository](https://github.com/nf-core/rnafusion) to your GitHub account -3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) +4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -30,14 +30,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t There are typically two types of tests that run: -### Lint Tests +### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. 
This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. -### Pipeline Tests +### Pipeline tests Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code. @@ -46,12 +46,58 @@ ## Patch -: warning: Only in the unlikely and regretful event of a release happening with a bug. +:warning: Only in the unlikely and regretful event of a release happening with a bug. -* On your own fork, make a new branch `patch` based on `upstream/master`. -* Fix the bug, and bump version (X.Y.Z+1). -* A PR should be made on `master` from patch to directly this particular bug. +- On your own fork, make a new branch `patch` based on `upstream/master`. +- Fix the bug, and bump version (X.Y.Z+1). +- A PR should be made on `master` from patch to directly address this particular bug. ## Getting help -For further information/help, please consult the [nf-core/rnafusion documentation](https://nf-co.re/nf-core/rnafusion/docs) and don't hesitate to get in touch on the nf-core Slack [#rnafusion](https://nfcore.slack.com/channels/rnafusion) channel ([join our Slack here](https://nf-co.re/join/slack)). +For further information/help, please consult the [nf-core/rnafusion documentation](https://nf-co.re/rnafusion/usage) and don't hesitate to get in touch on the nf-core Slack [#rnafusion](https://nfcore.slack.com/channels/rnafusion) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Pipeline contribution conventions + +To make the nf-core/rnafusion code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. + +### Adding a new step + +If you wish to contribute a new step, please use the following coding standards: + +1. Define the corresponding input channel into your new process from the expected previous process channel +2. Write the process block (see below). +3. Define the output channel if needed (see below). +4. Add any new parameters to `nextflow.config` with a default (see below). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +6. Add sanity checks and validation for all relevant parameters. +7. Perform local tests to validate that the new code works as expected. +8. If applicable, add a new test command in `.github/workflow/ci.yml`. +9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://multiqc.info/) module. +10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`. + +### Default values + +Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. + +Once there, use `nf-core schema build` to add to `nextflow_schema.json`. + +### Default processes resource requirements + +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. 
These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. + +The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. + +### Naming schemes + +Please use the following naming schemes, to make it easy to understand what is going where. + +- initial process channel: `ch_output_from_<process>` +- intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>` + +### Nextflow version bumping + +If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` + +### Images and figures + +For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md deleted file mode 100644 index 45a4d208..00000000 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ /dev/null @@ -1,42 +0,0 @@ -# nf-core/rnafusion bug report - -Hi there! - -Thanks for telling us about a problem with the pipeline. -Please delete this text and anything that's not relevant from the template below: - -## Describe the bug - -A clear and concise description of what the bug is. - -## Steps to reproduce - -Steps to reproduce the behaviour: - -1. Command line: `nextflow run ...` -2. See error: _Please provide your error message_ - -## Expected behaviour - -A clear and concise description of what you expected to happen. - -## System - -- Hardware: -- Executor: -- OS: -- Version - -## Nextflow Installation - -- Version: - -## Container engine - -- Engine: -- version: -- Image tag: - -## Additional context - -Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml new file mode 100644 index 00000000..a172d421 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -0,0 +1,50 @@ +name: Bug report +description: Report something that is broken or incorrect +labels: bug +body: + - type: markdown + attributes: + value: | + Before you post this issue, please check the documentation: + + - [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) + - [nf-core/rnafusion pipeline documentation](https://nf-co.re/rnafusion/usage) + + - type: textarea + id: description + attributes: + label: Description of the bug + description: A clear and concise description of what the bug is. + validations: + required: true + + - type: textarea + id: command_used + attributes: + label: Command used and terminal output + description: Steps to reproduce the behaviour. Please paste the command you used to launch the pipeline and the output from your terminal. + render: console + placeholder: | + $ nextflow run ... + + Some output where something broke + + - type: textarea + id: files + attributes: + label: Relevant files + description: | + Please drag and drop the relevant files here. Create a `.zip` archive if the extension is not allowed. 
+ Your verbose log file `.nextflow.log` is often useful _(this is a hidden file in the directory where you launched the pipeline)_ as well as custom Nextflow configuration files. + + - type: textarea + id: system + attributes: + label: System information + description: | + * Nextflow version _(eg. 21.10.3)_ + * Hardware _(eg. HPC, Desktop, Cloud)_ + * Executor _(eg. slurm, local, awsbatch)_ + * Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter or Charliecloud)_ + * OS _(eg. CentOS Linux, macOS, Linux Mint)_ + * Version of nf-core/rnafusion _(eg. 1.1, 1.5, 1.8.2)_ diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..69a065e7 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,7 @@ +contact_links: + - name: Join nf-core + url: https://nf-co.re/join + about: Please join the nf-core community here + - name: "Slack #rnafusion channel" + url: https://nfcore.slack.com/channels/rnafusion + about: Discussion about the nf-core/rnafusion pipeline diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md deleted file mode 100644 index 167d7f93..00000000 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ /dev/null @@ -1,24 +0,0 @@ -# nf-core/rnafusion feature request - -Hi there! - -Thanks for suggesting a new feature for the pipeline! -Please delete this text and anything that's not relevant from the template below: - -## Is your feature request related to a problem? Please describe - -A clear and concise description of what the problem is. - -Ex. I'm always frustrated when [...] - -## Describe the solution you'd like - -A clear and concise description of what you want to happen. - -## Describe alternatives you've considered - -A clear and concise description of any alternative solutions or features you've considered. - -## Additional context - -Add any other context about the feature request here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml new file mode 100644 index 00000000..2c388d2f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.yml @@ -0,0 +1,11 @@ +name: Feature request +description: Suggest an idea for the nf-core/rnafusion pipeline +labels: enhancement +body: + - type: textarea + id: description + attributes: + label: Description of feature + description: Please describe your suggestion for a new feature. It might help to describe a problem or use case, plus any alternatives that you have considered. + validations: + required: true diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index e504e427..bc52a0a2 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,3 +1,4 @@ + + ## PR checklist -- [ ] This comment contains a description of changes (with reason) +- [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] If necessary, also make a PR on the [nf-core/rnafusion branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/rnafusion) -- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). -- [ ] Make sure your code lints (`nf-core lint .`). 
-- [ ] Documentation in `docs` is updated -- [ ] `CHANGELOG.md` is updated -- [ ] `README.md` is updated - -**Learn more about contributing:** [CONTRIBUTING.md](https://github.com/nf-core/rnafusion/tree/master/.github/CONTRIBUTING.md) \ No newline at end of file + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/rnafusion/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/rnafusion _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. +- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`). +- [ ] Usage Documentation in `docs/usage.md` is updated. +- [ ] Output Documentation in `docs/output.md` is updated. +- [ ] `CHANGELOG.md` is updated. +- [ ] `README.md` is updated (including new tool citations and authors/contributors). diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml deleted file mode 100644 index 96b12a70..00000000 --- a/.github/markdownlint.yml +++ /dev/null @@ -1,5 +0,0 @@ -# Markdownlint configuration file -default: true, -line-length: false -no-duplicate-header: - siblings_only: true diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 00000000..780ae9f8 --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,42 @@ +name: nf-core AWS full size tests +# This workflow is triggered on published releases. +# It can be additionally triggered manually with the GitHub actions workflow dispatch button. +# It runs the -profile 'test_full' on AWS batch + +on: + release: + types: [published] + workflow_dispatch: +jobs: + run-tower: + name: Run AWS full tests + if: github.repository == 'nf-core/rnafusion' + runs-on: ubuntu-latest + steps: + - name: Launch build workflow via tower + uses: nf-core/tower-action@v3 + with: + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + access_token: ${{ secrets.TOWER_ACCESS_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}" + } + profiles: test_full_build,aws_tower + nextflow_config: | + process.errorStrategy = 'retry' + process.maxRetries = 3 + - name: Launch workflow via tower + uses: nf-core/tower-action@v3 + with: + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + access_token: ${{ secrets.TOWER_ACCESS_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}" + } + profiles: test_full,aws_tower diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 00000000..5a04305d --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,25 @@ +name: nf-core AWS test +# This workflow can be triggered manually with the GitHub actions workflow dispatch button. 
+# It runs the -profile 'test' on AWS batch + +on: + workflow_dispatch: +jobs: + run-tower: + name: Run AWS tests + if: github.repository == 'nf-core/rnafusion' + runs-on: ubuntu-latest + steps: + # Launch workflow using Tower CLI tool action + - name: Launch workflow via tower + uses: nf-core/tower-action@v3 + with: + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + access_token: ${{ secrets.TOWER_ACCESS_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-test-${{ github.sha }}" + } + profiles: test,aws_tower diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index ed654eae..ccb7211b 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -2,15 +2,43 @@ name: nf-core branch protection # This workflow is triggered on PRs to master branch on the repository # It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` on: - pull_request: - branches: - - master + pull_request_target: + branches: [master] jobs: test: - runs-on: ubuntu-18.04 + runs-on: ubuntu-latest steps: - # PRs are only ok if coming from an nf-core `dev` branch or a fork `patch` branch + # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches - name: Check PRs + if: github.repository == 'nf-core/rnafusion' run: | - { [[ $(git remote get-url origin) == *nf-core/rnafusion ]] && [[ ${GITHUB_HEAD_REF} = "dev" ]]; } || [[ ${GITHUB_HEAD_REF} == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/rnafusion ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + + # If the above check failed, post a comment on the PR explaining the failure + # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## This PR is against the `master` branch :x: + + * Do not close this PR + * Click _Edit_ and change the `base` to `dev` + * This CI test will remain failed until you push a new commit + + --- + + Hi @${{ github.event.pull_request.user.login }}, + + It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch. + The `master` branch on nf-core repositories should always contain code from the latest release. + Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch. + + You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page. + Note that even after this, the test will continue to show as failing until you push a new commit. + + Thanks again for your contribution! + repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index a33e998d..750af29d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,31 +1,56 @@ name: nf-core CI -# This workflow is triggered on pushes and PRs to the repository. 
-# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors -on: [push, pull_request] +# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: + push: + branches: + - dev + pull_request: + release: + types: [published] + +env: + NXF_ANSI_LOG: false + CAPSULE_LOG: none jobs: test: - env: - NXF_VER: ${{ matrix.nxf_ver }} - NXF_ANSI_LOG: false + name: Run pipeline with test data + # Only run on push if this is the nf-core dev branch (merged PRs) + if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/rnafusion') }}" runs-on: ubuntu-latest strategy: matrix: - # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['19.10.0', ''] + # Nextflow versions + include: + # Test pipeline minimum Nextflow version + - NXF_VER: "21.10.3" + NXF_EDGE: "" + # Test latest edge release of Nextflow + - NXF_VER: "" + NXF_EDGE: "1" steps: - - uses: actions/checkout@v2 + - name: Check out pipeline code + uses: actions/checkout@v2 + - name: Install Nextflow + env: + NXF_VER: ${{ matrix.NXF_VER }} + # Uncomment only if the edge release is more recent than the latest stable release + # See https://github.com/nextflow-io/nextflow/issues/2467 + # NXF_EDGE: ${{ matrix.NXF_EDGE }} run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - name: Pull docker image - run: | - docker pull nfcore/rnafusion:dev - docker tag nfcore/rnafusion:dev nfcore/rnafusion:1.2.0 - - name: Run pipeline + + - name: Test building pizzly references run: | - nextflow run ${GITHUB_WORKSPACE} --help - - name: Run pipeline for downloading references + nextflow run ${GITHUB_WORKSPACE} -profile test,docker \ + --outdir /home/runner/work/rnafusion/rnafusion/results --pizzly --fusionreport \ + --build_references --genomes_base /home/runner/work/rnafusion/rnafusion/results/references \ + --cosmic_username ${{ secrets.COSMIC_USERNAME }} --cosmic_passwd ${{ secrets.COSMIC_PASSWD }} + + - name: Test running pizzly references run: | - nextflow run ${GITHUB_WORKSPACE}/download-references.nf --help + nextflow run ${GITHUB_WORKSPACE} -profile test,docker \ + --outdir /home/runner/work/rnafusion/rnafusion/results --pizzly \ + --genomes_base /home/runner/work/rnafusion/rnafusion/results/references -stub diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml new file mode 100644 index 00000000..2ffbb6ea --- /dev/null +++ b/.github/workflows/fix-linting.yml @@ -0,0 +1,55 @@ +name: Fix linting from a comment +on: + issue_comment: + types: [created] + +jobs: + deploy: + # Only run if comment is on a PR with the main repo, and if it contains the magic keywords + if: > + contains(github.event.comment.html_url, '/pull/') && + contains(github.event.comment.body, '@nf-core-bot fix linting') && + github.repository == 'nf-core/rnafusion' + runs-on: ubuntu-latest + steps: + # Use the @nf-core-bot token to check out so we can push later + - uses: actions/checkout@v3 + with: + token: ${{ secrets.nf_core_bot_auth_token }} + + # Action runs on the issue comment, so we don't get the PR by default + # Use the gh cli to check out the PR + - name: Checkout Pull Request + run: gh pr checkout ${{ github.event.issue.number }} + env: + GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} + + - uses: actions/setup-node@v2 + + - name: Install Prettier + run: npm install -g prettier @prettier/plugin-php + + # Check that we actually need to fix something + - name: 
Run 'prettier --check' + id: prettier_status + run: | + if prettier --check ${GITHUB_WORKSPACE}; then + echo "::set-output name=result::pass" + else + echo "::set-output name=result::fail" + fi + + - name: Run 'prettier --write' + if: steps.prettier_status.outputs.result == 'fail' + run: prettier --write ${GITHUB_WORKSPACE} + + - name: Commit & push changes + if: steps.prettier_status.outputs.result == 'fail' + run: | + git config user.email "core@nf-co.re" + git config user.name "nf-core-bot" + git config push.default upstream + git add . + git status + git commit -m "[automated] Fix linting with Prettier" + git push diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1e0827a8..77358dee 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,7 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines +# It runs the `nf-core lint` and markdown lint tests to ensure +# that the code meets the nf-core guidelines. on: push: pull_request: @@ -8,43 +9,72 @@ on: types: [published] jobs: - Markdown: + EditorConfig: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - - uses: actions/setup-node@v1 - with: - node-version: '10' - - name: Install markdownlint - run: npm install -g markdownlint-cli - - name: Run Markdownlint - run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml - YAML: + + - uses: actions/setup-node@v2 + + - name: Install editorconfig-checker + run: npm install -g editorconfig-checker + + - name: Run ECLint check + run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile') + + Prettier: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v1 - - uses: actions/setup-node@v1 - with: - node-version: '10' - - name: Install yaml-lint - run: npm install -g yaml-lint - - name: Run yaml-lint - run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + - uses: actions/checkout@v2 + + - uses: actions/setup-node@v2 + + - name: Install Prettier + run: npm install -g prettier + + - name: Run Prettier --check + run: prettier --check ${GITHUB_WORKSPACE} + nf-core: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2 + - name: Check out pipeline code + uses: actions/checkout@v2 + - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - uses: actions/setup-python@v1 + + - uses: actions/setup-python@v3 with: - python-version: '3.6' - architecture: 'x64' + python-version: "3.6" + architecture: "x64" + - name: Install dependencies run: | python -m pip install --upgrade pip pip install nf-core + - name: Run nf-core lint - run: nf-core lint ${GITHUB_WORKSPACE} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + + - name: Save PR number + if: ${{ always() }} + run: echo ${{ github.event.pull_request.number }} > PR_number.txt + + - name: Upload linting log file artifact + if: ${{ always() }} + uses: actions/upload-artifact@v2 + with: + name: linting-logs + path: | + lint_log.txt + lint_results.md + PR_number.txt diff --git 
a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml new file mode 100644 index 00000000..04758f61 --- /dev/null +++ b/.github/workflows/linting_comment.yml @@ -0,0 +1,28 @@ +name: nf-core linting comment +# This workflow is triggered after the linting action is complete +# It posts an automated comment to the PR, even if the PR is coming from a fork + +on: + workflow_run: + workflows: ["nf-core linting"] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - name: Download lint results + uses: dawidd6/action-download-artifact@v2 + with: + workflow: linting.yml + workflow_conclusion: completed + + - name: Get PR number + id: pr_number + run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)" + + - name: Post PR comment + uses: marocchino/sticky-pull-request-comment@v2 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + number: ${{ steps.pr_number.outputs.pr_number }} + path: linting-logs/lint_results.md diff --git a/.gitignore b/.gitignore index b8c9a300..5124c9ac 100644 --- a/.gitignore +++ b/.gitignore @@ -1,7 +1,8 @@ .nextflow* work/ +data/ results/ .DS_Store -test/* +testing/ +testing* *.pyc -.vscode/ \ No newline at end of file diff --git a/.gitpod.yml b/.gitpod.yml new file mode 100644 index 00000000..85d95ecc --- /dev/null +++ b/.gitpod.yml @@ -0,0 +1,14 @@ +image: nfcore/gitpod:latest + +vscode: + extensions: # based on nf-core.nf-core-extensionpack + - codezombiech.gitignore # Language support for .gitignore files + # - cssho.vscode-svgviewer # SVG viewer + - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed + - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files + - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar + - mechatroner.rainbow-csv # Highlight columns in csv files in different colors + # - nextflow.nextflow # Nextflow syntax highlighting + - oderwat.indent-rainbow # Highlight indentation level + - streetsidesoftware.code-spell-checker # Spelling checker for source code diff --git a/.nf-core.yml b/.nf-core.yml new file mode 100644 index 00000000..3805dc81 --- /dev/null +++ b/.nf-core.yml @@ -0,0 +1 @@ +repository_type: pipeline diff --git a/.prettierignore b/.prettierignore new file mode 100644 index 00000000..d0e7ae58 --- /dev/null +++ b/.prettierignore @@ -0,0 +1,9 @@ +email_template.html +.nextflow* +work/ +data/ +results/ +.DS_Store +testing/ +testing* +*.pyc diff --git a/.prettierrc.yml b/.prettierrc.yml new file mode 100644 index 00000000..c81f9a76 --- /dev/null +++ b/.prettierrc.yml @@ -0,0 +1 @@ +printWidth: 120 diff --git a/CHANGELOG.md b/CHANGELOG.md index 5a990f62..8399e7a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,81 +1,226 @@ -# nfcore/rnafusion: Changelog +# nf-core/rnafusion: Changelog -All notable changes to this project will be documented in this file. +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) -and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). 
+## [2.0.0] nfcore/rnafusion - 2022/04/14 -## [1.2.0] nfcore/rnafusion - 2020/07/15 +Update to DSL2 and newer software/reference versions ### Added -* Added social preview image [#107](https://github.com/nf-core/rnafusion/issues/107) +- Added `qualimap/rnaseq v2.2.2d` from nf-core modules +- Added UCSC `gtfToGenePred v377` +- Added `picard CollectRnaSeqMetrics v2.26.10` +- Added `picard MarkDuplicates v2.26.10` from nf-core modules +- Added `cat/fastq` from nf-core modules +- Added possibility for manually feeding the results of fusions from different tools to speed up reruns +- STAR-Fusion references can be downloaded or built, but downloaded references are NOT RECOMMENDED as they are not thoroughly tested (--starfusion_build parameter is true by default, use --starfusion_build false to use downloaded STAR-Fusion references). ### Changed -* Upgrade `fusion-report v2.1.2` to `fusion-report v2.1.3` -* Upgrade `fusion-report v2.1.1` to `fusion-report v2.1.2` -* Upgrade `fusion-report v2.1.0` to `fusion-report v2.1.1` -* Upgrade `Arriba v1.1.0` to `Arriba v1.2.0` -* Upgrade `fusion-report v2.0.2` to `fusion-report v2.1.0` +- Upgrade default ensembl version to `102` +- Upgrade to `nf-core/tools v2.3.2` +- Upgrade `Arriba v1.2.0` to `Arriba v2.2.1` +- Upgrade `FusionCatcher v1.20` to `FusionCatcher v1.33` +- Upgrade `STAR-fusion v1.8.1` to `STAR-fusion v1.10.1` +- Upgrade `STAR v2.7.1` to `STAR v2.7.9` +- Upgrade `fusion-report v2.1.3` to `fusion-report v2.1.5` +- Upgrade `kallisto v0.44.0` to `kallisto v0.46.2` +- Upgrade `fastqc v0.11.8` to `fastqc v0.11.9` +- Upgrade `samtools v1.9` to `samtools v1.15.1` +- Upgrade `arriba` references from `v1.2.0` to `v2.1.0` +- Upgrade `fusioncatcher` references from `v98` to `v102` +- Use `arriba` (detect only), `kallisto` and `STAR` from nf-core modules +- Instead of a separate script to build the references, added a `--build_references` argument in the main workflow +- `--fasta` argument is not required with `--build_references` and set by default to the ensembl references built in the detection workflow +- CI tests are run on stubs of the reference building subprocesses for ensembl and arriba + +Parameters for `STAR` for `arriba` changed from: + +```bash +--readFilesCommand zcat \\ +--outSAMtype BAM Unsorted \\ +--outStd BAM_Unsorted \\ +--outSAMunmapped Within \\ +--outBAMcompression 0 \\ +--outFilterMultimapNmax 1 \\ +--outFilterMismatchNmax 3 \\ +--chimSegmentMin 10 \\ +--chimOutType WithinBAM SoftClip \\ +--chimJunctionOverhangMin 10 \\ +--chimScoreMin 1 \\ +--chimScoreDropMax 30 \\ +--chimScoreJunctionNonGTAG 0 \\ +--chimScoreSeparation 1 \\ +--alignSJstitchMismatchNmax 5 -1 5 5 \\ +--chimSegmentReadGapMax 3 \\ +--sjdbOverhang ${params.read_length - 1} +``` + +to + +```bash +--readFilesCommand zcat \ +--outSAMtype BAM Unsorted \ +--outSAMunmapped Within \ +--outBAMcompression 0 \ +--outFilterMultimapNmax 50 \ +--peOverlapNbasesMin 10 \ +--alignSplicedMateMapLminOverLmate 0.5 \ +--alignSJstitchMismatchNmax 5 -1 5 5 \ +--chimSegmentMin 10 \ +--chimOutType WithinBAM HardClip \ +--chimJunctionOverhangMin 10 \ +--chimScoreDropMax 30 \ +--chimScoreJunctionNonGTAG 0 \ +--chimScoreSeparation 1 \ +--chimSegmentReadGapMax 3 \ +--chimMultimapNmax 50 +``` + +As recommended [here](https://arriba.readthedocs.io/en/latest/workflow/). 
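For orientation, in the DSL2 setup such tool arguments are typically supplied through `ext.args` in a configuration selector rather than hard-coded inside the shared module. Below is a minimal sketch of such an override, assuming the nf-core `ext.args` convention; the selector name `STAR_FOR_ARRIBA` is hypothetical and stands in for whatever process name `conf/modules.config` actually uses:

```groovy
// Minimal sketch of a conf/modules.config entry passing the
// arriba-recommended STAR flags via ext.args.
// NOTE: the selector name 'STAR_FOR_ARRIBA' is hypothetical;
// check conf/modules.config for the real process names.
process {
    withName: 'STAR_FOR_ARRIBA' {
        ext.args = [
            '--readFilesCommand zcat',
            '--outSAMtype BAM Unsorted',
            '--outSAMunmapped Within',
            '--chimSegmentMin 10',
            '--chimOutType WithinBAM HardClip'
        ].join(' ')
    }
}
```

Keeping the flags in a config selector rather than inside the module itself is what lets the shared nf-core `STAR` module be reused with tool-specific settings.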
+ +Parameters for `STAR` for `STAR-fusion` changed from: + +```bash +--twopassMode Basic \\ +--outReadsUnmapped None \\ +--chimSegmentMin 12 \\ +--chimJunctionOverhangMin 12 \\ +--alignSJDBoverhangMin 10 \\ +--alignMatesGapMax 100000 \\ +--alignIntronMax 100000 \\ +--chimSegmentReadGapMax 3 \\ +--alignSJstitchMismatchNmax 5 -1 5 5 \\ +--runThreadN ${task.cpus} \\ +--outSAMstrandField intronMotif ${avail_mem} \\ +--outSAMunmapped Within \\ +--outSAMtype BAM Unsorted \\ +--outSAMattrRGline ID:GRPundef \\ +--chimMultimapScoreRange 10 \\ +--chimMultimapNmax 10 \\ +--chimNonchimScoreDropMin 10 \\ +--peOverlapNbasesMin 12 \\ +--peOverlapMMp 0.1 \\ +--readFilesCommand zcat \\ +--sjdbOverhang ${params.read_length - 1} \\ +--chimOutJunctionFormat 1 +``` + +to + +```bash +--outReadsUnmapped None \ +--readFilesCommand zcat \ +--outSAMtype BAM SortedByCoordinate \ +--outSAMstrandField intronMotif \ +--outSAMunmapped Within \ +--chimSegmentMin 12 \ +--chimJunctionOverhangMin 8 \ +--chimOutJunctionFormat 1 \ +--alignSJDBoverhangMin 10 \ +--alignMatesGapMax 100000 \ +--alignIntronMax 100000 \ +--alignSJstitchMismatchNmax 5 -1 5 5 \ +--chimMultimapScoreRange 3 \ +--chimScoreJunctionNonGTAG -4 \ +--chimMultimapNmax 20 \ +--chimNonchimScoreDropMin 10 \ +--peOverlapNbasesMin 12 \ +--peOverlapMMp 0.1 \ +--alignInsertionFlush Right \ +--alignSplicedMateMapLminOverLmate 0 \ +--alignSplicedMateMapLmin 30 \ +--chimOutType Junctions +``` + +`Homo_sapiens.${params.genome}.${ensembl_version}.gtf.gz` is used for squid and arriba, while `Homo_sapiens.${params.genome}.${ensembl_version}.chr.gtf.gz` is used for STAR-fusion and for the quality control, as the quality control is based on the STAR-fusion alignment. ### Fixed -* Missing `strip-components` in `download-references.nf/star-fusion` [#148](https://github.com/nf-core/rnafusion/issues/148) -* Missing version prefix for cdna [#143](https://github.com/nf-core/rnafusion/issues/143) -* `samtools` missing header in empty file for FusionInspector [ref](https://github.com/STAR-Fusion/STAR-Fusion/issues/191) -* Removed `profile` from helper scripts [#139](https://github.com/nf-core/rnafusion/issues/139) -* Wrong url path for `Pfam-A.hmm.gz` [#140](https://github.com/nf-core/rnafusion/issues/140) +### Removed + +- Ericscript tool +- GRCh37 support. 
Subdirectory with params.genome is removed +- Running with conda + +## v1.3.0dev nfcore/rnafusion - 2020/07/15 + +- Using official STAR-Fusion container [#160](https://github.com/nf-core/rnafusion/issues/160) + +### Added + +- Added social preview image [#107](https://github.com/nf-core/rnafusion/issues/107) +- Added support for GRCh37 genome assembly [#77](https://github.com/nf-core/rnafusion/issues/77) + +### Changed + +- Upgrade `fusion-report v2.1.2` to `fusion-report v2.1.3` +- Upgrade `fusion-report v2.1.1` to `fusion-report v2.1.2` +- Upgrade `fusion-report v2.1.0` to `fusion-report v2.1.1` +- Upgrade `Arriba v1.1.0` to `Arriba v1.2.0` +- Upgrade `fusion-report v2.0.2` to `fusion-report v2.1.0` + +### Fixed + +- Missing `strip-components` in `download-references.nf/star-fusion` [#148](https://github.com/nf-core/rnafusion/issues/148) +- Missing version prefix for cdna [#143](https://github.com/nf-core/rnafusion/issues/143) +- `samtools` missing header in empty file for FusionInspector [ref](https://github.com/STAR-Fusion/STAR-Fusion/issues/191) +- Removed `profile` from helper scripts [#139](https://github.com/nf-core/rnafusion/issues/139) +- Wrong url path for `Pfam-A.hmm.gz` [#140](https://github.com/nf-core/rnafusion/issues/140) ### Removed -* Removed `scripts/download-singularity-img.sh` and `download-singularity-img.nf` as they are not necessary any more +- Removed `scripts/download-singularity-img.sh` and `download-singularity-img.nf` as they are not necessary any more + +--- ## [1.1.0] nfcore/rnafusion - 2020/02/10 -* Fusion gene detection tools: - * `Arriba v1.1.0` - * `Ericscript v0.5.5` - * `Fusioncatcher v1.20` - * `Pizzly v0.37.3` - * `Squid v1.5` - * `STAR-Fusion v1.6.0` -* Visualization tools: - * `Arriba v1.1.0` - * `FusionInspector v1.3.1` -* Other tools: - * `fusion-report v2.0.1` - * `FastQ v0.11.8` - * `MultiQC v1.7` - * `STAR aligner v2.7.0f` +- Fusion gene detection tools: + - `Arriba v1.1.0` + - `Ericscript v0.5.5` + - `Fusioncatcher v1.20` + - `Pizzly v0.37.3` + - `Squid v1.5` + - `STAR-Fusion v1.6.0` +- Visualization tools: + - `Arriba v1.1.0` + - `FusionInspector v1.3.1` +- Other tools: + - `fusion-report v2.0.1` + - `FastQ v0.11.8` + - `MultiQC v1.7` + - `STAR aligner v2.7.0f` ### Added -* Added `Arriba 1.1.0` [#63](https://github.com/nf-core/rnafusion/issues/63) -* Added Batch mode [#54](https://github.com/nf-core/rnafusion/issues/54) +- Added `Arriba 1.1.0` [#63](https://github.com/nf-core/rnafusion/issues/63) +- Added Batch mode [#54](https://github.com/nf-core/rnafusion/issues/54) ### Changed -* Updated examples and configurations -* Upgraded `fusion-report v1.0.0` to `fusion-report v2.0.1` -* Divided `running_tools` into fusion and visualization tools -* Updated `STAR` in `Squid`, `Fusion-Inspector` version to `2.7.0f` -* Upgraded `STAR-Fusion v1.5.0` to `STAR-Fusion v1.6.0` [#83](https://github.com/nf-core/rnafusion/issues/83) -* Parameter `igenomesIgnore` renamed to `igenome` [#81](https://github.com/nf-core/rnafusion/issues/81) -* Finished STAR-Fusion file renaming [#18](https://github.com/nf-core/rnafusion/issues/18) -* Updated logos -* Updated to nf-core `1.8` TEMPLATE +- Updated examples and configurations +- Upgraded `fusion-report v1.0.0` to `fusion-report v2.0.1` +- Divided `running_tools` into fusion and visualization tools +- Updated `STAR` in `Squid`, `Fusion-Inspector` version to `2.7.0f` +- Upgraded `STAR-Fusion v1.5.0` to `STAR-Fusion v1.6.0` [#83](https://github.com/nf-core/rnafusion/issues/83) +- Parameter `igenomesIgnore` renamed to 
`igenome` [#81](https://github.com/nf-core/rnafusion/issues/81) +- Finished STAR-Fusion file renaming [#18](https://github.com/nf-core/rnafusion/issues/18) +- Updated logos +- Updated to nf-core `1.8` TEMPLATE ### Fixed -* iGenomes optional, but not really [#91](https://github.com/nf-core/rnafusion/issues/91) -* Updated `fusioncatcher` to latest `1.20` version also solving [#95](https://github.com/nf-core/rnafusion/issues/95) +- iGenomes optional, but not really [#91](https://github.com/nf-core/rnafusion/issues/91) +- Updated `fusioncatcher` to latest `1.20` version also solving [#95](https://github.com/nf-core/rnafusion/issues/95) ### Removed -* Variables `pizzly_fasta` and `pizzly_gtf` have been removed and replaced with `transcript` and `gtf` -* `Jenkisfile`, test configuration, pylintrc configuration -* Removed `igenomes.config` because the pipeline only supports `Ensembl` version +- Variables `pizzly_fasta` and `pizzly_gtf` have been removed and replaced with `transcript` and `gtf` +- `Jenkinsfile`, test configuration, pylintrc configuration +- Removed `igenomes.config` because the pipeline only supports `Ensembl` version --- @@ -83,13 +228,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. ### Changed -* Bumped nf-core template to 1.6 [#69](https://github.com/nf-core/rnafusion/pull/69) +- Bumped nf-core template to 1.6 [#69](https://github.com/nf-core/rnafusion/pull/69) ### Fixed -* Fixed COSMIC parameters not wrapped in quotes [#75](https://github.com/nf-core/rnafusion/issues/75) -* Implemented output output for fusion tools [#72](https://github.com/nf-core/rnafusion/issues/72) -* Fixed reference download link for STAR-Fusion [#71](https://github.com/nf-core/rnafusion/issues/71) +- Fixed COSMIC parameters not wrapped in quotes [#75](https://github.com/nf-core/rnafusion/issues/75) +- Implemented output for fusion tools [#72](https://github.com/nf-core/rnafusion/issues/72) +- Fixed reference download link for STAR-Fusion [#71](https://github.com/nf-core/rnafusion/issues/71) --- @@ -97,27 +242,27 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. 
### Added -* Added support for extra parameters for tools STAR-Fusion, FusionCatcher and fusion-report -* Added example configuration for `singularity` and `docker` -* Added [fusion-report](https://github.com/matq007/fusion-report) into the stack [#62](https://github.com/nf-core/rnafusion/issues/62), [#55](https://github.com/nf-core/rnafusion/issues/55), [#53](https://github.com/nf-core/rnafusion/issues/53), [#51](https://github.com/nf-core/rnafusion/issues/51) -* Added nextflow helper script `download-singularity-img.nf` -* Added nextflow helper script `download-references.nf` -* Added `Jenkinsfile` for in-house testing +- Added support for extra parameters for tools STAR-Fusion, FusionCatcher and fusion-report +- Added example configuration for `singularity` and `docker` +- Added [fusion-report](https://github.com/matq007/fusion-report) into the stack [#62](https://github.com/nf-core/rnafusion/issues/62), [#55](https://github.com/nf-core/rnafusion/issues/55), [#53](https://github.com/nf-core/rnafusion/issues/53), [#51](https://github.com/nf-core/rnafusion/issues/51) +- Added nextflow helper script `download-singularity-img.nf` +- Added nextflow helper script `download-references.nf` +- Added `Jenkinsfile` for in-house testing ### Changed -* Updated installation of `FusionCatcher` (available now on bioconda) +- Updated installation of `FusionCatcher` (available now on bioconda) ### Fixed -* Fixed empty symlinks (`input.X`) in fusion-report [#68](https://github.com/nf-core/rnafusion/issues/68) -* Fixed FASTA issues [#60](https://github.com/nf-core/rnafusion/issues/60) -* Fixed centralized nf-core/config [#64](https://github.com/nf-core/rnafusion/issues/64) -* Fixed `scrape_software_versions.py` to parse tools versions correctly [#65](https://github.com/nf-core/rnafusion/issues/65) +- Fixed empty symlinks (`input.X`) in fusion-report [#68](https://github.com/nf-core/rnafusion/issues/68) +- Fixed FASTA issues [#60](https://github.com/nf-core/rnafusion/issues/60) +- Fixed centralized nf-core/config [#64](https://github.com/nf-core/rnafusion/issues/64) +- Fixed `scrape_software_versions.py` to parse tools versions correctly [#65](https://github.com/nf-core/rnafusion/issues/65) ### Removed -* Removed `Singularity` +- Removed `Singularity` --- @@ -126,30 +271,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. Version 1.0 marks the first production release of this pipeline under the nf-core flag. The pipeline includes additional help scripts to download references for fusion tools and Singularity images. -* Fusion gene detection tools: - * `STAR-Fusion v1.5.0` - * `Fusioncatcher v1.00` - * `Ericscript v0.5.5` - * `Pizzly v0.37.3` - * `Squid v1.5` -* Visualization tools: - * `FusionInspector v1.3.1` -* Other tools: - * `Summary report` - * `FastQ v0.11.8` - * `MultiQC v1.7` - * `FusionGDB updated 2019/01/23` +Initial release of nf-core/rnafusion, created with the [nf-core](https://nf-co.re/) template. ---- +### `Added` -## [0.1] SciLifeLab/NGI-RNAfusion (ARCHIVED) - 2018/10/05 +### `Fixed` -Initial release of NGI-RNAfusion, created with the [nf-core](http://nf-co.re/) template. -Source code can be found at [SciLifeLab/NGI-RNAfusion](https://github.com/SciLifeLab/NGI-RNAfusion). -The solution works with Docker and Singularity. 
+### `Dependencies` -* Tools: - * STAR-Fusion - * Fusioncatcher - * FusionInspector - * Custom tool for fusion comparison - generates intersection of detected fusion genes from all tools +### `Deprecated` diff --git a/CITATIONS.md b/CITATIONS.md new file mode 100644 index 00000000..629f86ad --- /dev/null +++ b/CITATIONS.md @@ -0,0 +1,71 @@ +# nf-core/rnafusion: Citations + +## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. + +## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) + +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools + +- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + +- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) + + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +- [Arriba](https://github.com/suhrig/arriba) + + > Uhrig S, Ellermann J, Walther T, Burkhardt P, Fröhlich M, Hutter B, Toprak UH, Neumann O, Stenzinger A, Scholl C, Fröhling S, Brors B. Accurate and efficient detection of gene fusions from RNA sequencing data. + > Genome Research. 2021 Mar 31;448-460. doi: 10.1101/gr.257246.119. Epub 2021 Jan 13. PubMed PMID: 33441414; PubMed Central PMCID: PMC7919457. + +- [FusionCatcher](https://github.com/ndaniel/fusioncatcher) + + > Nicorici D, Satalan M, Edgren H, Kangaspeska S, Murumagi A, Kallioniemi O, Virtanen S, Kilkku O. FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. BioRxiv, 2014 Nov. doi: 10.1101/011650. + +- [Fusion-report](https://github.com/matq007/fusion-report) + + > Proks M, Genomic Profiling of a Comprehensive Nation-wide Collection of Childhood Solid Tumors, Master Thesis, Supervisors: Grøntved L, Díaz de Ståhl T, Nistér M, Ewels P, Garcia MU, Juhos S, University of Southern Denmark, 2019, unpublished. + +- [Kallisto](https://pachterlab.github.io/kallisto/) + + > Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 2016 Apr. 34, 525–527. doi:10.1038/nbt.3519. PMID: 27043002. + +- [Pizzly](https://github.com/pmelsted/pizzly) + Melsted P, Hateley S, Joseph IC, Pimentel H, Bray N, Pachter L. Fusion detection and quantification by pseudoalignment. BioRxiv, 2017 Jul. doi: 10.1101/166322. + +- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/) + + > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002. + +- [Squid](https://github.com/Kingsford-Group/squid) + + > Ma C, Shao M, Kingsford C. SQUID: transcriptomic structural variation detection from RNA-seq. Genome Biol 2018 Apr. 19, 52. doi: 10.1186/s13059-018-1421-5. PubMed PMID: 29650026. 
PubMed Central PMCID: PMC5896115. + +- [STAR](https://pubmed.ncbi.nlm.nih.gov/23104886/) + + > Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PubMed PMID: 23104886; PubMed Central PMCID: PMC3530905. + +- [STAR-Fusion](https://github.com/STAR-Fusion/STAR-Fusion) + > Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biology 2019 Oct;20,213. doi: 10.1186/s13059-019-1842-9 + +## Software packaging/containerisation tools + +- [Anaconda](https://anaconda.com) + + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. + +- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) + + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + +- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + +- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index cf930c8a..f4fd052f 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -1,46 +1,111 @@ -# Contributor Covenant Code of Conduct +# Code of Conduct at nf-core (v1.0) ## Our Pledge -In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. +In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of: -## Our Standards +- Age +- Body size +- Familial status +- Gender identity and expression +- Geographical location +- Level of experience +- Nationality and national origins +- Native language +- Physical and neurological ability +- Race or ethnicity +- Religion +- Sexual identity and orientation +- Socioeconomic status -Examples of behavior that contributes to creating a positive environment include: +Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance. 
-* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members +## Preamble -Examples of unacceptable behavior by participants include: +> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and has been edited after input from members of the nf-core team and others. "We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will be amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply. -* The use of sexualized language or imagery and unwelcome sexual attention or advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a professional setting +An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva. + +nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals. + +We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc. + +Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities. + +We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC. + +Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re ## Our Responsibilities -Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. +The safety officer is responsible for clarifying the standards of acceptable behavior and is expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour. + +The safety officer in consultation with the nf-core core team has the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and will be subject to the same actions as others in violation of the CoC. 
+ +## When and where does this Code of Conduct apply? + +Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes but is not limited to the following listed alphabetically and therefore in no order of preference: + +- Communicating with an official project email address. +- Communicating with community members within the nf-core Slack channel. +- Participating in hackathons organised by nf-core (both online and in-person events). +- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence. +- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc. +- Representing nf-core on social media. This includes both official and personal accounts. + +## nf-core cares 😊 + +nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order): + +- Ask for consent before sharing another community member’s personal information (including photographs) on social media. +- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity. +- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !) +- Demonstrate empathy towards other community members. (We don’t all have the same amount of time to dedicate to nf-core. If tasks are pending, don’t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.) +- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let’s do this the best we can) +- Focus on what is best for the team and the community. (When in doubt, ask) +- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn. +- Introduce yourself to members of the community. (We’ve all been outsiders and we know that talking to strangers can be hard for some, but remember we’re interested in getting to know you and your visions for open science!) +- Show appreciation and **provide clear feedback**. (This is especially important because we don’t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**) +- Take breaks when you feel like you need them. +- Use welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.) + +## nf-core frowns on 😕 + +The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces. + +- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom. +- “Doxing”, i.e. posting (or threatening to post) another person’s personal identifying information online. 
+- Spamming or trolling of individuals on social media.
+- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention.
+- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience.
+
+### Online Trolling
+
+The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable; reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately.
+
+All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls.
+
+## Procedures for Reporting CoC violations

-Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible.

-## Scope
+You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s).

-This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety officer, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course.

-## Enforcement
+All reports will be handled with utmost discretion and confidentiality.

-Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+## Attribution and Acknowledgements

-Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
+- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4) +- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition) +- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/) +- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla) -## Attribution +## Changelog -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] +### v1.0 - March 12th, 2021 -[homepage]: http://contributor-covenant.org -[version]: http://contributor-covenant.org/version/1/4/ +- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC. diff --git a/Dockerfile b/Dockerfile deleted file mode 100644 index 5e5af29f..00000000 --- a/Dockerfile +++ /dev/null @@ -1,13 +0,0 @@ -FROM nfcore/base:1.9 -LABEL authors="Martin Proks" \ - description="Docker image containing all software requirements for the nf-core/rnafusion pipeline" - -# Install the conda environment -COPY environment.yml / -RUN conda env create -f /environment.yml && conda clean -a - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-rnafusion-1.2.0/bin:$PATH - -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-1.2.0 > nf-core-rnafusion-1.2.0.yml diff --git a/LICENSE b/LICENSE index ac8adf18..86e71fe1 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) Martin Proks +Copyright (c) Martin Proks, Annick Renevey Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md index d6a8f8e8..fa46e489 100644 --- a/README.md +++ b/README.md @@ -1,120 +1,145 @@ -# ![nf-core/rnafusion](docs/images/nf-core-rnafusion_logo.png) +# ![nf-core/rnafusion](docs/images/nf-core-rnafusion_logo_light.png#gh-light-mode-only) ![nf-core/rnafusion](docs/images/nf-core-rnafusion_logo_dark.png#gh-dark-mode-only) -**RNA sequencing analysis pipeline with curated list of tools for detecting and visualizing fusion genes.** +[![GitHub Actions CI Status](https://github.com/nf-core/rnafusion/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/rnafusion/actions?query=workflow%3A%22nf-core+CI%22) +[![GitHub Actions Linting Status](https://github.com/nf-core/rnafusion/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/rnafusion/actions?query=workflow%3A%22nf-core+linting%22) +[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?logo=Amazon%20AWS)](https://nf-co.re/rnafusion/results) +[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8)](https://doi.org/10.5281/zenodo.XXXXXXX) -[![GitHub Actions CI Status](https://github.com/nf-core/rnafusion/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/rnafusion/actions) -[![GitHub Actions Linting Status](https://github.com/nf-core/rnafusion/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/rnafusion/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) 
-[![DOI](https://zenodo.org/badge/151721952.svg)](https://zenodo.org/badge/latestdoi/151721952)
+[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.10.3-23aa62.svg)](https://www.nextflow.io/)
+[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?logo=anaconda)](https://docs.conda.io/en/latest/)
+[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?logo=docker)](https://www.docker.com/)
+[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg)](https://sylabs.io/docs/)
+[![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/rnafusion)

-[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
-[![Docker](https://img.shields.io/docker/automated/nfcore/rnafusion.svg)](https://hub.docker.com/r/nfcore/rnafusion)
+[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23rnafusion-4A154B?logo=slack)](https://nfcore.slack.com/channels/rnafusion)
+[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?logo=twitter)](https://twitter.com/nf_core)
+[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?logo=youtube)](https://www.youtube.com/c/nf-core)

## Introduction

-The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
-
-> The pipeline **requires** >=16 CPU cores and >=30GB RAM
-
-| Tool                                                                      | Single-end reads   | Version  |
-| ------------------------------------------------------------------------- | :----------------: | :------: |
-| [Arriba](https://github.com/suhrig/arriba)                                | :x:                | `1.2.0`  |
-| [EricScript](https://sites.google.com/site/bioericscript/getting-started) | :x:                | `0.5.5`  |
-| [FusionCatcher](https://github.com/ndaniel/fusioncatcher)                 | :white_check_mark: | `1.20`   |
-| [Fusion-Inspector](https://github.com/FusionInspector/FusionInspector)    | :x:                | `2.2.1`  |
-| [fusion-report](https://github.com/matq007/fusion-report)                 | -                  | `2.1.3`  |
-| [Pizzly](https://github.com/pmelsted/pizzly)                              | :x:                | `0.37.3` |
-| [Squid](https://github.com/Kingsford-Group/squid)                         | :x:                | `1.5`    |
-| [Star-Fusion](https://github.com/STAR-Fusion/STAR-Fusion)                 | :white_check_mark: | `1.8.1`  |
-
-For available parameters or help run:
-
-```bash
-nextflow run nf-core/rnafusion --help
-```
+**nf-core/rnafusion** is a bioinformatics best-practice analysis pipeline for RNA sequencing, with a curated list of tools for detecting and visualizing fusion genes.
+
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
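To make the DSL2 remark concrete, a module-style process carries its own container directive, roughly as in this minimal sketch (the process name, container tag and outputs are illustrative and are not copied from this pipeline's modules):

```nextflow
// Hypothetical DSL2 module: the process pins its own container,
// so each tool can be updated independently of the others.
process FASTQC {
    label 'process_low'                                // resource label resolved in conf/base.config
    container 'quay.io/biocontainers/fastqc:0.11.9--0' // one container per process

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.html"), emit: html

    script:
    """
    fastqc $reads
    """
}
```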
+
+> **IMPORTANT: conda is not supported currently.** Run with singularity or docker.
+
+> GRCh38 is the only supported reference
+
+| Tool                                                      | Single-end reads   | Version  |
+| --------------------------------------------------------- | :----------------: | :------: |
+| [Arriba](https://github.com/suhrig/arriba)                | :x:                | `2.2.1`  |
+| [FusionCatcher](https://github.com/ndaniel/fusioncatcher) | :white_check_mark: | `1.33`   |
+| [Fusion-report](https://github.com/matq007/fusion-report) | -                  | `2.1.5`  |
+| [Pizzly](https://github.com/pmelsted/pizzly)              | :x:                | `0.37.3` |
+| [Squid](https://github.com/Kingsford-Group/squid)         | :x:                | `1.5`    |
+| [STAR-Fusion](https://github.com/STAR-Fusion/STAR-Fusion) | :white_check_mark: | `1.10.1` |
+
+On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/rnafusion/results).
+
+In rnafusion the full-sized test includes reference building and fusion detection. The test dataset is taken from [here](https://github.com/nf-core/test-datasets/tree/rnafusion/testdata/human).
+
+## Pipeline summary
+
+### Build references
+
+`--build_references` triggers a parallel workflow to build all references (a sketch of a typical command is shown below)
+
+1. Download Ensembl FASTA and GTF files
+2. Create STAR index
+3. Download arriba references
+4. Download fusioncatcher references
+5. Download pizzly references (kallisto index)
+6. Download and build STAR-Fusion references
+7. Download fusion-report DBs
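Assembled from the flags above, a reference build could look like the following sketch (`--build_references`, `--all` and `--outdir` are taken from this README and `--genomes_base` from `conf/genomes.config`; note that downloading the fusion-report databases additionally requires COSMIC account credentials, passed via parameters described in the usage docs, so treat this as a placeholder rather than the definitive interface):

```console
nextflow run nf-core/rnafusion --build_references --all \
    --genomes_base <PATH/TO/REFERENCES> --outdir <OUTDIR> \
    -profile <docker/singularity>
```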
+
+### Main workflow
+
+1. Input samplesheet check
+2. Concatenate fastq files per sample
+3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+4. Arriba subworkflow
+   - [STAR](https://github.com/alexdobin/STAR) alignment
+   - [Samtools](https://github.com/samtools/samtools) sort
+   - [Samtools](https://github.com/samtools/samtools) index
+   - [Arriba](https://github.com/suhrig/arriba) fusion detection
+   - [Arriba](https://github.com/suhrig/arriba) visualisation
+5. Pizzly subworkflow
+   - [Kallisto](https://pachterlab.github.io/kallisto/) quantification
+   - [Pizzly](https://github.com/pmelsted/pizzly) fusion detection
+6. Squid subworkflow
+   - [STAR](https://github.com/alexdobin/STAR) alignment
+   - [Samtools view](http://www.htslib.org/): convert SAM output from STAR to BAM
+   - [Samtools sort](http://www.htslib.org/): sort BAM output from STAR
+   - [Squid](https://github.com/Kingsford-Group/squid) fusion detection
+   - [Squid](https://github.com/Kingsford-Group/squid) annotate
+7. STAR-Fusion subworkflow
+   - [STAR](https://github.com/alexdobin/STAR) alignment
+   - [STAR-Fusion](https://github.com/STAR-Fusion/STAR-Fusion) fusion detection
+8. Fusioncatcher subworkflow
+   - [FusionCatcher](https://github.com/ndaniel/fusioncatcher) fusion detection
+9. Fusion-report subworkflow
+   - Merge all fusions detected by the different tools
+   - [Fusion-report](https://github.com/matq007/fusion-report)
+10. FusionInspector subworkflow
+    - [FusionInspector](https://github.com/FusionInspector/FusionInspector)
+11. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+12. QC for mapped reads ([`QualiMap: BAM QC`](https://kokonech.github.io/qualimap/HG00096.chr20_bamqc/qualimapReport.html))
+13. Index mapped reads ([samtools index](http://www.htslib.org/))
+14. Collect metrics ([`picard CollectRnaSeqMetrics`](https://gatk.broadinstitute.org/hc/en-us/articles/360037057492-CollectRnaSeqMetrics-Picard-) and [`picard MarkDuplicates`](https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-))

## Quick Start

-i. Install [`nextflow`](https://nf-co.re/usage/installation)
+1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.10.3`)

-ii. Install either [`Docker`](https://docs.docker.com/engine/installation/) or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) for full pipeline reproducibility (please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))
+2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) (you can follow [this tutorial](https://singularity-tutorial.github.io/01-installation/)), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(you can use [`Conda`](https://conda.io/miniconda.html) both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_.

-iii. Download references for all tools
+3. Download the pipeline and test it on a minimal dataset with a single command:

-```bash
-nextflow run nf-core/rnafusion/download-references.nf -profile <PROFILE> \
-    --download_all \
-    --outdir <PATH> \
-    --cosmic_usr <COSMIC_USER> --cosmic_passwd <COSMIC_PASSWD>
-```
+   ```console
+   nextflow run nf-core/rnafusion -profile test,YOURPROFILE --outdir <OUTDIR>
+   ```

-> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+   Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.

-iv. Start running your own analysis!
+   > - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
+   > - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+   > - If you are using `singularity`, please use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
+   > - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.

-```bash
-nextflow run nf-core/rnafusion -profile <PROFILE> \
-    --reads '*_R{1,2}.fastq.gz' \
-    --genomes_base 'reference_path_from_above'
-    --arriba --star_fusion --fusioncatcher --ericscript --pizzly --squid \
-    --arriba_vis --fusion_inspector
+4. Start running your own analysis!
+
+```console
+nextflow run nf-core/rnafusion --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38 --all -profile <docker/singularity>
```

-See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
+> Note that paths need to be absolute and that runs with conda are not supported.
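For reference, the samplesheet supplied via `--input` uses the same four-column layout as the `assets/samplesheet.csv` file added later in this diff; the sample name and FASTQ paths below are illustrative placeholders, and `strandedness` takes one of `unstranded`, `forward` or `reverse`:

```csv
sample,fastq_1,fastq_2,strandedness
PATIENT_1,patient_1_R1.fastq.gz,patient_1_R2.fastq.gz,forward
```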
## Documentation

-The nf-core/rnafusion pipeline comes with documentation about the pipeline, found in the `docs/` directory:
+The nf-core/rnafusion pipeline comes with documentation about the pipeline [usage](https://nf-co.re/rnafusion/usage), [parameters](https://nf-co.re/rnafusion/parameters) and [output](https://nf-co.re/rnafusion/output).

-1. [Installation](https://nf-co.re/usage/installation)
-2. Pipeline configuration
-    * [Download references](docs/references.md)
-    * [Local installation](https://nf-co.re/usage/local_installation)
-    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
-3. [Running the pipeline](docs/usage.md)
-4. [Output and how to interpret the results](docs/output.md)
-5. [Troubleshooting](https://nf-co.re/usage/troubleshooting)
+## Credits

-Use predefined configuration for desired Institution cluster provided at [nfcore/config](https://github.com/nf-core/configs) repository.
+nf-core/rnafusion was written by Martin Proks ([@matq007](https://github.com/matq007)), Maxime Garcia ([@maxulysse](https://github.com/maxulysse)) and Annick Renevey ([@rannick](https://github.com/rannick))

-## Credits
+We thank the following people for their help in the development of this pipeline:

-This pipeline was originally written by Martin Proks ([@matq007](https://github.com/matq007)) in collaboration with Karolinska Institutet, SciLifeLab and University of Southern Denmark as a master thesis. This is a follow-up development started by Rickard Hammarén ([@Hammarn](https://github.com/Hammarn)).
-
-Special thanks goes to all supervisors:
-
-* [Assoc. Prof. Teresita Díaz de Ståhl, PhD](https://ki.se/en/onkpat/teresita-diaz-de-stahls-group)
-* [MD. Monica Nistér, PhD](https://ki.se/en/onkpat/research-team-monica-nister)
-* [Maxime U Garcia, PhD](https://github.com/MaxUlysse)
-* [Szilveszter Juhos](https://github.com/szilvajuhos)
-* [Phil Ewels, PhD](https://github.com/ewels)
-* [Assoc. Prof. Lars Grøntved, PhD](https://portal.findresearcher.sdu.dk/en/persons/larsgr)
-
-## Tool References
-
-* **STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq**
-Brian Haas, Alexander Dobin, Nicolas Stransky, Bo Li, Xiao Yang, Timothy Tickle, Asma Bankapur, Carrie Ganote, Thomas Doak, Natalie Pochet, Jing Sun, Catherine Wu, Thomas Gingeras, Aviv Regev
-bioRxiv 120295; doi: [https://doi.org/10.1101/120295](https://doi.org/10.1101/120295)
-* D. Nicorici, M. Satalan, H. Edgren, S. Kangaspeska, A. Murumagi, O. Kallioniemi, S. Virtanen, O. Kilkku, **FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data**, bioRxiv, Nov. 2014,
-[DOI:10.1101/011650](http://dx.doi.org/10.1101/011650)
-* Benelli M, Pescucci C, Marseglia G, Severgnini M, Torricelli F, Magi A. **Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript**. Bioinformatics. 2012; 28(24): 3232-3239.
-* **Fusion detection and quantification by pseudoalignment**
-Páll Melsted, Shannon Hateley, Isaac Charles Joseph, Harold Pimentel, Nicolas L Bray, Lior Pachter, bioRxiv 166322; doi: [https://doi.org/10.1101/166322](https://doi.org/10.1101/166322)
-* **SQUID: transcriptomic structural variation detection from RNA-seq** Cong Ma, Mingfu Shao and Carl Kingsford, Genome Biology, 2018, doi: [https://doi.org/10.1186/s13059-018-1421-5](https://doi.org/10.1186/s13059-018-1421-5)
-* **Fusion-Inspector** download: [https://github.com/FusionInspector](https://github.com/FusionInspector)
-* **fusion-report** download: [https://github.com/matq007/fusion-report](https://github.com/matq007/fusion-report); doi: [https://doi.org/10.5281/zenodo.3520171](https://doi.org/10.5281/zenodo.3520171)
-* **FastQC** download: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
-* **MultiQC** Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354) Download: [https://multiqc.info/](https://multiqc.info/)
+- [Phil Ewels](https://github.com/ewels)
+- [Rickard Hammarén](https://github.com/Hammarn)
+- [Alexander Peltzer](https://github.com/apeltzer)
+- [Praveen Raj](https://github.com/praveenraj2018)

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

-For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/rnafusion) (you can join with [this invite](https://nf-co.re/join/slack)).
+For further information or help, don't hesitate to get in touch on the [Slack `#rnafusion` channel](https://nfcore.slack.com/channels/rnafusion) (you can join with [this invite](https://nf-co.re/join/slack)).

-## Citation
+## Citations

-If you use nf-core/rnafusion for your analysis, please cite it using the following doi: [10.5281/zenodo.151721952](https://zenodo.org/badge/latestdoi/151721952)
+If you use nf-core/rnafusion for your analysis, please cite it using the following doi: [10.5281/zenodo.3946477](https://doi.org/10.5281/zenodo.3946477)
+
+An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows:

@@ -122,9 +147,4 @@ You can cite the `nf-core` publication as follows:

>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
-> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
-> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) - -[![Barntumörbanken](docs/images/BTB_logo.png)](https://ki.se/forskning/barntumorbanken-0) | [![SciLifeLab](docs/images/SciLifeLab_logo.png)](https://scilifelab.se) -:-:|:-: -[![National Genomics Infrastructure](docs/images/NGI_logo.png)](https://ngisweden.scilifelab.se/) | [![University of Southern Denmark](docs/images/SDU_logo.png)](https://www.sdu.dk/da) +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). diff --git a/assets/dummy_file_arriba.txt b/assets/dummy_file_arriba.txt new file mode 100644 index 00000000..e69de29b diff --git a/assets/dummy_file_fusioncatcher.txt b/assets/dummy_file_fusioncatcher.txt new file mode 100644 index 00000000..e69de29b diff --git a/assets/dummy_file_pizzly.txt b/assets/dummy_file_pizzly.txt new file mode 100644 index 00000000..e69de29b diff --git a/assets/dummy_file_squid.txt b/assets/dummy_file_squid.txt new file mode 100644 index 00000000..e69de29b diff --git a/assets/dummy_file_starfusion.txt b/assets/dummy_file_starfusion.txt new file mode 100644 index 00000000..e69de29b diff --git a/assets/email_template.html b/assets/email_template.html index 2e85801c..efcd0eaa 100644 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -1,6 +1,5 @@ - diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml index 14f8ada3..ce53881a 100644 --- a/assets/multiqc_config.yaml +++ b/assets/multiqc_config.yaml @@ -1,11 +1,11 @@ report_comment: > - This report has been generated by the nf-core/rnafusion - analysis pipeline. For information about how to interpret these results, please see the - documentation. + This report has been generated by the nf-core/rnafusion + analysis pipeline. For information about how to interpret these results, please see the + documentation. report_section_order: - software_versions: - order: -1000 - nf-core-rnafusion-summary: - order: -1001 + software_versions: + order: -1000 + nf-core-rnafusion-summary: + order: -1001 export_plots: true diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml new file mode 100644 index 00000000..0ffd3e5e --- /dev/null +++ b/assets/multiqc_config.yml @@ -0,0 +1,11 @@ +report_comment: > + This report has been generated by the nf-core/rnafusion + analysis pipeline. For information about how to interpret these results, please see the + documentation. 
+report_section_order:
+  software_versions:
+    order: -1000
+  "nf-core-rnafusion-summary":
+    order: -1001
+
+export_plots: true
diff --git a/assets/nf-core-rnafusion_logo.png b/assets/nf-core-rnafusion_logo.png
deleted file mode 100644
index ea7ca150..00000000
Binary files a/assets/nf-core-rnafusion_logo.png and /dev/null differ
diff --git a/assets/nf-core-rnafusion_logo_light.png b/assets/nf-core-rnafusion_logo_light.png
new file mode 100644
index 00000000..55f38541
Binary files /dev/null and b/assets/nf-core-rnafusion_logo_light.png differ
diff --git a/assets/nf-core-rnafusion_social_preview.png b/assets/nf-core-rnafusion_social_preview.png
deleted file mode 100644
index a24ca066..00000000
Binary files a/assets/nf-core-rnafusion_social_preview.png and /dev/null differ
diff --git a/assets/nf-core-rnafusion_social_preview.svg b/assets/nf-core-rnafusion_social_preview.svg
deleted file mode 100644
index eeefcb43..00000000
--- a/assets/nf-core-rnafusion_social_preview.svg
+++ /dev/null
@@ -1,446 +0,0 @@
-[446 lines of SVG markup removed: the deleted social preview image, titled "rnafusion" with the tagline "RNA-seq analysis pipeline for detection gene-fusions"]
diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
new file mode 100644
index 00000000..4f9acc40
--- /dev/null
+++ b/assets/samplesheet.csv
@@ -0,0 +1,2 @@
+sample,fastq_1,fastq_2,strandedness
+test_rnafusion,https://github.com/nf-core/test-datasets/raw/d6cd12c9a69c148ef986d156d110f741df482b04/testdata/human/reads_1.fq.gz,https://github.com/nf-core/test-datasets/raw/d6cd12c9a69c148ef986d156d110f741df482b04/testdata/human/reads_2.fq.gz,forward
diff --git a/assets/samplesheet_valid.csv b/assets/samplesheet_valid.csv
new file mode 100644
index 00000000..4ac8f7b1
--- /dev/null
+++ b/assets/samplesheet_valid.csv
@@ -0,0 +1,2 @@
+sample,fastq_1,fastq_2,strandedness
+test,https://github.com/nf-core/test-datasets/raw/rnafusion/testdata/human/reads_1.fq.gz,https://github.com/nf-core/test-datasets/raw/rnafusion/testdata/human/reads_2.fq.gz,forward
diff --git a/assets/schema_input.json b/assets/schema_input.json
new file mode 100644
index 00000000..aff11de1
--- /dev/null
+++ b/assets/schema_input.json
@@ -0,0 +1,36 @@
+{
+    "$schema": "http://json-schema.org/draft-07/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/rnafusion/master/assets/schema_input.json",
+    "title": "nf-core/rnafusion pipeline - params.input schema",
+    "description": "Schema for the file provided with params.input",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "sample": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample name must be provided and cannot contain spaces"
+            },
+            "fastq_1": {
+                "type": "string",
+                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
+                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
+            },
+            "fastq_2": {
+                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ]
+            }
+        },
+        "required": ["sample", "fastq_1"]
+    }
+}
diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt
index 3b09baa6..ff6631ad 100644
--- a/assets/sendmail_template.txt
+++ b/assets/sendmail_template.txt
@@ -12,18 +12,18 @@ $email_html
 Content-Type: image/png;name="nf-core-rnafusion_logo.png"
 Content-Transfer-Encoding: base64
 Content-ID: <nfcorepipelinelogo>
-Content-Disposition: inline; filename="nf-core-rnafusion_logo.png"
+Content-Disposition: inline; filename="nf-core-rnafusion_logo_light.png"

-<% out << new File("$baseDir/assets/nf-core-rnafusion_logo.png").
-    bytes.
-    encodeBase64().
-    toString().
-    tokenize( '\n' )*.
-    toList()*.
-    collate( 76 )*.
-    collect { it.join() }.
-    flatten().
-    join( '\n' ) %>
+<% out << new File("$projectDir/assets/nf-core-rnafusion_logo_light.png").
+    bytes.
+    encodeBase64().
+    toString().
+    tokenize( '\n' )*.
+    toList()*.
+    collate( 76 )*.
+    collect { it.join() }.
+    flatten().
+    join( '\n' ) %>

<%
if (mqcFile){
@@ -37,15 +37,15 @@ Content-ID: <mqcreport>
 Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

 ${mqcFileObj.
-    bytes.
-    encodeBase64().
-    toString().
-    tokenize( '\n' )*.
-    toList()*.
-    collate( 76 )*.
-    collect { it.join() }.
-    flatten().
-    join( '\n' )}
+    bytes.
+    encodeBase64().
+    toString().
+    tokenize( '\n' )*.
+    toList()*.
+    collate( 76 )*.
+    collect { it.join() }.
+    flatten().
+    join( '\n' )}
 """
}}
%>
diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py
new file mode 100755
index 00000000..e854d09b
--- /dev/null
+++ b/bin/check_samplesheet.py
@@ -0,0 +1,297 @@
+#!/usr/bin/env python
+
+
+import os
+import sys
+import errno
+import argparse
+import csv
+import logging
+from collections import Counter
+from pathlib import Path
+
+logger = logging.getLogger()
+
+
+def parse_args(args=None):
+    Description = "Reformat nf-core/rnafusion samplesheet file and check its contents."
+    Epilog = "Example usage: python check_samplesheet.py <FILE_IN> <FILE_OUT>"
+
+    parser = argparse.ArgumentParser(description=Description, epilog=Epilog)
+    parser.add_argument("FILE_IN", help="Input samplesheet file.")
+    parser.add_argument("FILE_OUT", help="Output file.")
+    return parser.parse_args(args)
+
+
+def make_dir(path):
+    if len(path) > 0:
+        try:
+            os.makedirs(path)
+        except OSError as exception:
+            if exception.errno != errno.EEXIST:
+                raise exception
+
+
+def print_error(error, context="Line", context_str=""):
+    """Print a samplesheet validation error and exit; used by check_samplesheet() below."""
+    error_str = "ERROR: Please check samplesheet -> {}".format(error)
+    if context != "" and context_str != "":
+        error_str = "ERROR: Please check samplesheet -> {}\n{}: '{}'".format(error, context, context_str.strip())
+    print(error_str)
+    sys.exit(1)
+
+
+class RowChecker:
+    """
+    Define a service that can validate and transform each given row.
+
+    Attributes:
+        modified (list): A list of dicts, where each dict corresponds to a previously
+            validated and transformed row. The order of rows is maintained.
+
+    """
+
+    VALID_FORMATS = (
+        ".fq.gz",
+        ".fastq.gz",
+    )
+
+    def __init__(
+        self,
+        sample_col="sample",
+        first_col="fastq_1",
+        second_col="fastq_2",
+        single_col="single_end",
+        **kwargs,
+    ):
+        """
+        Initialize the row checker with the expected column names.
+
+        Args:
+            sample_col (str): The name of the column that contains the sample name
+                (default "sample").
+            first_col (str): The name of the column that contains the first (or only)
+                FASTQ file path (default "fastq_1").
+            second_col (str): The name of the column that contains the second (if any)
+                FASTQ file path (default "fastq_2").
+            single_col (str): The name of the new column that will be inserted and
+                records whether the sample contains single- or paired-end sequencing
+                reads (default "single_end").
+
+        """
+        super().__init__(**kwargs)
+        self._sample_col = sample_col
+        self._first_col = first_col
+        self._second_col = second_col
+        self._single_col = single_col
+        self._seen = set()
+        self.modified = []
+
+    def validate_and_transform(self, row):
+        """
+        Perform all validations on the given row and insert the read pairing status.
+
+        Args:
+            row (dict): A mapping from column headers (keys) to elements of that row
+                (values).
+ + """ + self._validate_sample(row) + self._validate_first(row) + self._validate_second(row) + self._validate_pair(row) + self._seen.add((row[self._sample_col], row[self._first_col])) + self.modified.append(row) + + def _validate_sample(self, row): + """Assert that the sample name exists and convert spaces to underscores.""" + assert len(row[self._sample_col]) > 0, "Sample input is required." + # Sanitize samples slightly. + row[self._sample_col] = row[self._sample_col].replace(" ", "_") + + def _validate_first(self, row): + """Assert that the first FASTQ entry is non-empty and has the right format.""" + assert len(row[self._first_col]) > 0, "At least the first FASTQ file is required." + self._validate_fastq_format(row[self._first_col]) + + def _validate_second(self, row): + """Assert that the second FASTQ entry has the right format if it exists.""" + if len(row[self._second_col]) > 0: + self._validate_fastq_format(row[self._second_col]) + + def _validate_pair(self, row): + """Assert that read pairs have the same file extension. Report pair status.""" + if row[self._first_col] and row[self._second_col]: + row[self._single_col] = False + assert ( + Path(row[self._first_col]).suffixes[-2:] == Path(row[self._second_col]).suffixes[-2:] + ), "FASTQ pairs must have the same file extensions." + else: + row[self._single_col] = True + + def _validate_fastq_format(self, filename): + """Assert that a given filename has one of the expected FASTQ extensions.""" + assert any(filename.endswith(extension) for extension in self.VALID_FORMATS), ( + f"The FASTQ file has an unrecognized extension: {filename}\n" + f"It should be one of: {', '.join(self.VALID_FORMATS)}" + ) + + def validate_unique_samples(self): + """ + Assert that the combination of sample name and FASTQ filename is unique. + + In addition to the validation, also rename the sample if more than one sample, + FASTQ file combination exists. + + """ + assert len(self._seen) == len(self.modified), "The pair of sample name and FASTQ must be unique." + if len({pair[0] for pair in self._seen}) < len(self._seen): + counts = Counter(pair[0] for pair in self._seen) + seen = Counter() + for row in self.modified: + sample = row[self._sample_col] + seen[sample] += 1 + if counts[sample] > 1: + row[self._sample_col] = f"{sample}_T{seen[sample]}" + + +def read_head(handle, num_lines=10): + """Read the specified number of lines from the current position in the file.""" + lines = [] + for idx, line in enumerate(handle): + if idx == num_lines: + break + lines.append(line) + return "".join(lines) + + +def sniff_format(handle): + """ + Detect the tabular format. + + Args: + handle (text file): A handle to a `text file`_ object. The read position is + expected to be at the beginning (index 0). + + Returns: + csv.Dialect: The detected tabular format. + + .. 
_text file: + https://docs.python.org/3/glossary.html#term-text-file + + """ + peek = read_head(handle) + handle.seek(0) + sniffer = csv.Sniffer() + if not sniffer.has_header(peek): + logger.critical(f"The given sample sheet does not appear to contain a header.") + sys.exit(1) + dialect = sniffer.sniff(peek) + return dialect + +def check_samplesheet(file_in, file_out): + """ + This function checks that the samplesheet follows the following structure: + + sample,fastq_1,fastq_2,strandedness + SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz,forward + SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz,forward + + For an example see: + https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv + """ + + sample_mapping_dict = {} + with open(file_in, "r") as fin: + + ## Check header + MIN_COLS = 2 + HEADER = ["sample", "fastq_1", "fastq_2", "strandedness"] + header = [x.strip('"') for x in fin.readline().strip().split(",")] + if header[: len(HEADER)] != HEADER: + print("ERROR: Please check samplesheet header -> {} != {}".format(",".join(header), ",".join(HEADER))) + sys.exit(1) + + ## Check sample entries + for line in fin: + lspl = [x.strip().strip('"') for x in line.strip().split(",")] + + # Check valid number of columns per row + if len(lspl) < len(HEADER): + print_error( + "Invalid number of columns (minimum = {})!".format(len(HEADER)), + "Line", + line, + ) + num_cols = len([x for x in lspl if x]) + if num_cols < MIN_COLS: + print_error( + "Invalid number of populated columns (minimum = {})!".format(MIN_COLS), + "Line", + line, + ) + + ## Check sample name entries + sample, fastq_1, fastq_2 , strandedness = lspl[: len(HEADER)] + sample = sample.replace(" ", "_") + if not sample: + print_error("Sample entry has not been specified!", "Line", line) + + ## Check FastQ file extension + for fastq in [fastq_1, fastq_2]: + if fastq: + if fastq.find(" ") != -1: + print_error("FastQ file contains spaces!", "Line", line) + if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"): + print_error( + "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!", + "Line", + line, + ) + + ## Check strandedness + strandednesses = ["unstranded", "forward", "reverse"] + if strandedness: + if strandedness not in strandednesses: + print_error( + f"Strandedness must be one of '{', '.join(strandednesses)}'!", + "Line", + line, + ) + else: + print_error( + f"Strandedness has not been specified! 
Must be one of {', '.join(strandednesses)}.",
+                    "Line",
+                    line,
+                )
+
+            ## Auto-detect paired-end/single-end
+            sample_info = []  ## [single_end, fastq_1, fastq_2, strandedness]
+            if sample and fastq_1 and fastq_2:  ## Paired-end short reads
+                sample_info = ["0", fastq_1, fastq_2, strandedness]
+            elif sample and fastq_1 and not fastq_2:  ## Single-end short reads
+                sample_info = ["1", fastq_1, fastq_2, strandedness]
+            else:
+                print_error("Invalid combination of columns provided!", "Line", line)
+
+            ## Create sample mapping dictionary = { sample: [ single_end, fastq_1, fastq_2, strandedness ] }
+            if sample not in sample_mapping_dict:
+                sample_mapping_dict[sample] = [sample_info]
+            else:
+                if sample_info in sample_mapping_dict[sample]:
+                    print_error("Samplesheet contains duplicate rows!", "Line", line)
+                else:
+                    sample_mapping_dict[sample].append(sample_info)
+
+    ## Write validated samplesheet with appropriate columns
+    if len(sample_mapping_dict) > 0:
+        out_dir = os.path.dirname(file_out)
+        make_dir(out_dir)
+        with open(file_out, "w") as fout:
+            fout.write(",".join(["sample", "single_end", "fastq_1", "fastq_2", "strandedness"]) + "\n")
+            for sample in sorted(sample_mapping_dict.keys()):
+
+                ## Check that multiple runs of the same sample are of the same datatype
+                if not all(x[0] == sample_mapping_dict[sample][0][0] for x in sample_mapping_dict[sample]):
+                    print_error("Multiple runs of a sample must be of the same datatype!", "Sample: {}".format(sample))
+
+                ## Check that multiple runs of the same sample are of the same strandedness
+                if not all(
+                    x[-1] == sample_mapping_dict[sample][0][-1]
+                    for x in sample_mapping_dict[sample]
+                ):
+                    print_error(
+                        "Multiple runs of a sample must have the same strandedness!",
+                        "Sample",
+                        sample,
+                    )
+
+                for idx, val in enumerate(sample_mapping_dict[sample]):
+                    fout.write(",".join(["{}_T{}".format(sample, idx + 1)] + val) + "\n")
+    else:
+        print_error("No entries to process!", "Samplesheet: {}".format(file_in))
+
+
+def main(args=None):
+    args = parse_args(args)
+    check_samplesheet(args.FILE_IN, args.FILE_OUT)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py
deleted file mode 100755
index 57cc4263..00000000
--- a/bin/markdown_to_html.py
+++ /dev/null
@@ -1,100 +0,0 @@
-#!/usr/bin/env python
-from __future__ import print_function
-import argparse
-import markdown
-import os
-import sys
-
-def convert_markdown(in_fn):
-    input_md = open(in_fn, mode="r", encoding="utf-8").read()
-    html = markdown.markdown(
-        "[TOC]\n" + input_md,
-        extensions = [
-            'pymdownx.extra',
-            'pymdownx.b64',
-            'pymdownx.highlight',
-            'pymdownx.emoji',
-            'pymdownx.tilde',
-            'toc'
-        ],
-        extension_configs = {
-            'pymdownx.b64': {
-                'base_path': os.path.dirname(in_fn)
-            },
-            'pymdownx.highlight': {
-                'noclasses': True
-            },
-            'toc': {
-                'title': 'Table of Contents'
-            }
-        }
-    )
-    return html
-
-def wrap_html(contents):
-    header = """
-    [HTML header markup and inline CSS removed from this deleted file]
-    """
-    footer = """
-    [closing HTML markup removed]
-    """
-    return header + contents + footer
-
-
-def parse_args(args=None):
-    parser = argparse.ArgumentParser()
-    parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?',
-                        help='File to convert. Defaults to stdin.')
-    parser.add_argument('-o', '--out', type=argparse.FileType('w'),
-                        default=sys.stdout,
-                        help='Output file name. Defaults to stdout.')
-    return parser.parse_args(args)
-
-def main(args=None):
-    args = parse_args(args)
-    converted_md = convert_markdown(args.mdfile.name)
-    html = wrap_html(converted_md)
-    args.out.write(html)
-
-if __name__ == '__main__':
-    sys.exit(main())
diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
deleted file mode 100755
index 36024ff1..00000000
--- a/bin/scrape_software_versions.py
+++ /dev/null
@@ -1,68 +0,0 @@
-#!/usr/bin/env python
-from __future__ import print_function
-from collections import OrderedDict
-import re
-import os
-
-regexes = {
-    'nf-core/rnafusion': ['v_pipeline.txt', r"(\S+)"],
-    'Nextflow': ['v_nextflow.txt', r"(\S+)"],
-    'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"],
-    'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"],
-    'Arriba': ['v_arriba.txt', r"arriba=(\S+)"],
-    'EricScript': ['v_ericscript.txt', r"ericscript=(\S+)"],
-    'FusionCatcher': ['v_fusioncatcher.txt', r"fusioncatcher=(\S+)"],
-    'Fusion-Inspector': ['v_fusion_inspector.txt', r"fusion-inspector=(\S+)"],
-    'fusion-report': ['v_fusion_report.txt', r"fusion-report=(\S+)"],
-    'Pizzly': ['v_pizzly.txt', r"pizzly=(\S+)"],
-    'STAR-Fusion': ['v_star_fusion.txt', r"star-fusion=(\S+)"],
-    'Squid': ['v_squid.txt', r"squid=(\S+)"]
-}
-results = OrderedDict()
-results['nf-core/rnafusion'] = 'N/A'
-results['Nextflow'] = 'N/A'
-results['FastQC'] = 'N/A'
-results['MultiQC'] = 'N/A'
-results['Arriba'] = 'N/A'
-results['EricScript'] = 'N/A'
-results['FusionCatcher'] = 'N/A'
-results['Fusion-Inspector'] = 'N/A'
-results['fusion-report'] = 'N/A'
-results['Pizzly'] = 'N/A'
-results['STAR-Fusion'] = 'N/A'
-results['Squid'] = 'N/A'
-
-# Search each file using its regex
-for k, v in regexes.items():
-    try:
-        with open(v[0]) as x:
-            versions = x.read()
-            match = re.search(v[1], versions)
-            if match:
-                results[k] = "v{}".format(match.group(1))
-    except IOError:
-        results[k] = False
-
-# Remove software set to false in results
-for k in list(results):
-    if not results[k]:
-        del(results[k])
-
-# Dump to YAML
-print ('''
-id: 'software_versions'
-section_name: 'nf-core/rnafusion Software Versions'
-section_href: 'https://github.com/nf-core/rnafusion'
-plot_type: 'html'
-description: 'are collected at run time from the software output.'
-data: |
-    <dl class="dl-horizontal">
-''')
-for k,v in results.items():
-    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
-print ("    </dl>
") - -# Write out regexes as csv file: -with open('software_versions.csv', 'w') as f: - for k,v in results.items(): - f.write("{}\t{}\n".format(k,v)) diff --git a/build-ctat.nf b/build-ctat.nf deleted file mode 100644 index b8b46f78..00000000 --- a/build-ctat.nf +++ /dev/null @@ -1,160 +0,0 @@ -#!/usr/bin/env nextflow -/* -================================================================================ - nf-core/rnafusion -================================================================================ -nf-core/rnafusion: - RNA-seq analysis pipeline for detection gene-fusions --------------------------------------------------------------------------------- - @Homepage - https://nf-co.re/rnafusion --------------------------------------------------------------------------------- - @Documentation - https://nf-co.re/rnafusion/docs --------------------------------------------------------------------------------- - @Repository - https://github.com/nf-core/rnafusion --------------------------------------------------------------------------------- -*/ - -def helpMessage() { - log.info nfcoreHeader() - log.info""" - Usage: - - The typical command for downloading references is as follows: - - nextflow run nf-core/rnafusion/build-ctat.nf -profile [PROFILE] [OPTIONS] --outdir /path/to/output - - Mandatory arguments: - --fasta [file] Path to fasta reference - --gtf [file] Path to GTF annotation - --outdir [path] Output directory for downloading - -profile [str] Configuration profile [https://github.com/nf-core/configs] - """.stripIndent() -} - -/* - * SET UP CONFIGURATION VARIABLES - */ - -// Show help message -if (params.help) exit 0, helpMessage() - -params.fasta = params.genome ? params.genomes[params.genome].fasta ?: null : null -params.gtf = params.genome ? params.genomes[params.genome].gtf ?: null : null - -ch_fasta = Channel.value(file(params.fasta)).ifEmpty{exit 1, "Fasta file not found: ${params.fasta}"} -ch_gtf = Channel.value(file(params.gtf)).ifEmpty{exit 1, "GTF annotation file not found: ${params.gtf}"} - -if (!params.outdir) exit 1, "Output directory not specified!" 
- -// Header log info -log.info nfcoreHeader() -def summary = [:] -summary['Pipeline Name'] = 'nf-core/rnafusion/build-ctat.nf' -summary['Pipeline Version'] = workflow.manifest.version -if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -summary['Output dir'] = params.outdir -summary['User'] = workflow.userName -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "\033[2m----------------------------------------------------\033[0m" - -// Check the hostnames against configured profiles -checkHostname() - -/* -================================================================================ - DOWNLOAD -================================================================================ -*/ - -process star_fusion { - label 'process_high' - label 'process_long' - publishDir "${params.outdir}", mode: 'copy' - - input: - file(fasta) from ch_fasta - file(gtf) from ch_gtf - - output: - file '*' - - script: - """ - wget -N ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_releases/Pfam-A.hmm.gz - gunzip Pfam-A.hmm.gz && hmmpress Pfam-A.hmm - - wget https://github.com/FusionAnnotator/CTAT_HumanFusionLib/releases/download/v0.2.0/fusion_lib.Mar2019.dat.gz -O CTAT_HumanFusionLib.dat.gz - - # Dfam - wget https://www.dfam.org/releases/Dfam_3.1/infrastructure/dfamscan/homo_sapiens_dfam.hmm - wget https://www.dfam.org/releases/Dfam_3.1/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3f - wget https://www.dfam.org/releases/Dfam_3.1/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3i - wget https://www.dfam.org/releases/Dfam_3.1/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3m - wget https://www.dfam.org/releases/Dfam_3.1/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3p - - export TMPDIR=/tmp - prep_genome_lib.pl \\ - --genome_fa ${fasta} \\ - --gtf ${gtf} \\ - --annot_filter_rule /opt/conda/envs/nf-core-rnafusion-star-fusion_1.8.1/ctat-genome-lib-builder-2830cd708c5bb9353878ca98069427e83acdd625/AnnotFilterRuleLib/AnnotFilterRule.pm \\ - --fusion_annot_lib CTAT_HumanFusionLib.dat.gz \\ - --pfam_db Pfam-A.hmm \\ - --dfam_db homo_sapiens_dfam.hmm \\ - --CPU ${task.cpus} - """ -} - -/* - * Completion - */ -workflow.onComplete { - log.info "[nf-core/rnafusion/build-ctat.nf] Pipeline Complete" -} - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/rnafusion v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? 
'' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? '' : "\033[1;93m" - if (params.hostnames) { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" - } - } - } - } -} diff --git a/conf/base.config b/conf/base.config index 0081fde2..7d3fa627 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,73 +1,52 @@ /* - * ------------------------------------------------- - * nf-core/rnafusion Nextflow base config file - * ------------------------------------------------- - * A 'blank slate' config file, appropriate for general - * use on most high performace compute environments. - * Assumes that all software is installed and available - * on the PATH. Runs in `local` mode - all jobs will be - * run on the logged in environment. - */ - -params { - versions { - arriba = '1.2.0' - ericscript = '0.5.5' - fusioncatcher = '1.20' - pizzly = '0.37.3' - squid = '1.5-star2.7.1a' - star_fusion = '1.8.1' - } -} +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + nf-core/rnafusion Nextflow base config file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + A 'blank slate' config file, appropriate for general use on most high performance + compute environments. Assumes that all software is installed and available on + the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. +---------------------------------------------------------------------------------------- +*/ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } - time = { check_max( 6.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } - maxRetries = 1 - maxErrors = '-1' + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } - withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 14.GB * task.attempt, 'memory' ) } - time = { check_max( 6.h * task.attempt, 'time' ) } - } - withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 42.GB * task.attempt, 'memory' ) } - time = { check_max( 12.h * task.attempt, 'time' ) } - } - withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 84.GB * task.attempt, 'memory' ) } - time = { check_max( 24.h * task.attempt, 'time' ) } - } - withLabel:process_long { - time = { check_max( 48.h * task.attempt, 'time' ) } - } - withName:get_software_versions { - cache = false - } + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 
'retry' : 'finish' }
+    maxRetries    = 1
+    maxErrors     = '-1'

-    // Fusion tools
-    withName:"arriba|arriba_visualization" {
-        container = "nfcore/rnafusion:arriba_${params.versions.arriba}"
-    }
-    withName:ericscript {
-        container = "nfcore/rnafusion:ericscript_${params.versions.ericscript}"
-    }
-    withName:fusioncatcher {
-        container = "nfcore/rnafusion:fusioncatcher_${params.versions.fusioncatcher}"
-    }
-    withName:pizzly {
-        container = "nfcore/rnafusion:pizzly_${params.versions.pizzly}"
-    }
-    withName:squid {
-        container = "nfcore/rnafusion:squid_${params.versions.squid}"
-    }
-    withName:"star_fusion|fusion_inspector" {
-        container = "nfcore/rnafusion:star-fusion_${params.versions.star_fusion}"
-    }
-}
\ No newline at end of file
+    withLabel:process_low {
+        cpus   = { check_max( 2     * task.attempt, 'cpus'   ) }
+        memory = { check_max( 12.GB * task.attempt, 'memory' ) }
+        time   = { check_max( 4.h   * task.attempt, 'time'   ) }
+    }
+    withLabel:process_medium {
+        cpus   = { check_max( 6     * task.attempt, 'cpus'   ) }
+        memory = { check_max( 36.GB * task.attempt, 'memory' ) }
+        time   = { check_max( 8.h   * task.attempt, 'time'   ) }
+    }
+    withLabel:process_high {
+        cpus   = { check_max( 12    * task.attempt, 'cpus'   ) }
+        memory = { check_max( 72.GB * task.attempt, 'memory' ) }
+        time   = { check_max( 16.h  * task.attempt, 'time'   ) }
+    }
+    withLabel:process_long {
+        time   = { check_max( 20.h  * task.attempt, 'time'   ) }
+    }
+    withLabel:process_high_memory {
+        memory = { check_max( 200.GB * task.attempt, 'memory' ) }
+    }
+    withLabel:error_ignore {
+        errorStrategy = 'ignore'
+    }
+    withLabel:error_retry {
+        errorStrategy = 'retry'
+        maxRetries    = 2
+    }
+    withName:CUSTOM_DUMPSOFTWAREVERSIONS {
+        cache = false
+    }
+}
diff --git a/conf/genomes.config b/conf/genomes.config
index 985345cc..845445a2 100644
--- a/conf/genomes.config
+++ b/conf/genomes.config
@@ -2,7 +2,8 @@
  * -------------------------------------------------
  *  Nextflow config file for reference genome
  * -------------------------------------------------
- * Defines reference genomes, without using iGenome paths
+ * These references have to be built manually due
+ * to ERCC spike-ins.
  * Can be used by any config that customizes the base
  * path using $params.genomes_base / --genomes_base
  */
@@ -10,14 +11,11 @@
params {
    genomes {
        'GRCh38' {
-            fasta             = "${params.genomes_base}/Homo_sapiens.GRCh38_r${params.reference_release}.all.fa"
-            gtf               = "${params.genomes_base}/Homo_sapiens.GRCh38_r${params.reference_release}.gtf"
-            transcript        = "${params.genomes_base}/Homo_sapiens.GRCh38_r${params.reference_release}.cdna.all.fa.gz"
-            databases         = "${params.genomes_base}/databases"
-            arriba_ref        = "${params.genomes_base}/arriba"
-            ericscript_ref    = "${params.genomes_base}/ericscript/ericscript_db_homosapiens_ensembl84"
-            fusioncatcher_ref = "${params.genomes_base}/fusioncatcher/human_v98"
-            star_fusion_ref   = "${params.genomes_base}/star-fusion/ctat_genome_lib_build_dir"
+            fasta      = "${params.genomes_base}/ensembl/Homo_sapiens.${params.genome}.${params.ensembl_version}.all.fa"
+            gtf        = "${params.genomes_base}/ensembl/Homo_sapiens.${params.genome}.${params.ensembl_version}.gtf"
+            chrgtf     = "${params.genomes_base}/ensembl/Homo_sapiens.${params.genome}.${params.ensembl_version}.chr.gtf"
+            transcript = "${params.genomes_base}/ensembl/Homo_sapiens.${params.genome}.${params.ensembl_version}.cdna.all.fa.gz"
+            refflat    = "${params.genomes_base}/ensembl/Homo_sapiens.${params.genome}.${params.ensembl_version}.chr.gtf.refflat"
        }
    }
-}
\ No newline at end of file
+}
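Downstream of such a build, a run can point at the references from a small personal config (a hypothetical sketch: the parameter names `genomes_base` and `ensembl_version` are taken from the paths above, while the values are placeholders):

```groovy
// custom.config: pass it to a run with `nextflow run nf-core/rnafusion -c custom.config ...`
params {
    genomes_base    = '/data/references/rnafusion' // base path interpolated by conf/genomes.config
    ensembl_version = 105                          // illustrative Ensembl release number
}
```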
diff --git a/conf/igenomes.config b/conf/igenomes.config
new file mode 100644
index 00000000..7a1b3ac6
--- /dev/null
+++ b/conf/igenomes.config
@@ -0,0 +1,432 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for iGenomes paths
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines reference genomes using iGenome paths.
+ Can be used by any config that customises the base path using: + $params.igenomes_base / --igenomes_base +---------------------------------------------------------------------------------------- +*/ + +params { + // illumina iGenomes reference file paths + genomes { + 'GRCh37' { + fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed" + } + 'GRCh38' { + fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'GRCm38' { + fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.87e9" + blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed" + } + 'TAIR10' { + fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" + mito_name = "Mt" + } + 'EB2' { + fasta = 
"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" + } + 'UMD3.1' { + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" + mito_name = "MT" + } + 'WBcel235' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + mito_name = "MtDNA" + macs_gsize = "9e7" + } + 'CanFam3.1' { + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" + mito_name = "MT" + } + 'GRCz10' { + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bismark = 
"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'BDGP6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + mito_name = "M" + macs_gsize = "1.2e8" + } + 'EquCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" + mito_name = "MT" + } + 'EB1' { + fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" + } + 'Galgal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Gm01' { + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/version0.6.0/" + bowtie2 = 
"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" + } + 'Mmul_1' { + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" + mito_name = "MT" + } + 'IRGSP-1.0' { + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'CHIMP2.1.4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" + mito_name = "MT" + } + 'Rnor_5.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.bed" + 
mito_name = "MT" + } + 'Rnor_6.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'R64-1-1' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + mito_name = "MT" + macs_gsize = "1.2e7" + } + 'EF2' { + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.21e7" + } + 'Sbi1' { + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" + } + 'Sscrofa10.2' { + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bismark = 
"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" + mito_name = "MT" + } + 'AGPv3' { + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'hg38' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'hg19' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed" + } + 'mm10' { + fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.87e9" + blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed" + } + 'bosTau8' { + fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" + bwa = 
"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'ce10' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "9e7" + } + 'canFam3' { + fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" + mito_name = "chrM" + } + 'danRer10' { + fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.37e9" + } + 'dm6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.2e8" + } + 'equCab2' { + 
fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" + mito_name = "chrM" + } + 'galGal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" + mito_name = "chrM" + } + 'panTro4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" + mito_name = "chrM" + } + 'rn6' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'sacCer3' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" + 
mito_name = "chrM" + macs_gsize = "1.2e7" + } + 'susScr3' { + fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/version0.6.0/" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" + mito_name = "chrM" + } + } +} diff --git a/conf/modules.config b/conf/modules.config new file mode 100644 index 00000000..a5bbc2cb --- /dev/null +++ b/conf/modules.config @@ -0,0 +1,299 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + + publishDir = [ + path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + + withName: ARRIBA { + publishDir = [ + path: { "${params.outdir}/arriba" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + ext.prefix = { "${meta.id}.arriba" } + } + + withName: ARRIBA_DOWNLOAD { + publishDir = [ + path: { "${params.genomes_base}/arriba" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: ARRIBA_VISUALISATION { + publishDir = [ + path: { "${params.outdir}/arriba_visualisation" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + ext.args = "cytobands_hg38_GRCh38_v2.1.0.tsv" + ext.args2 = "protein_domains_hg38_GRCh38_v2.1.0.gff3" + } + + withName: CUSTOM_DUMPSOFTWAREVERSIONS { + publishDir = [ + path: { "${params.outdir}/pipeline_info" }, + mode: params.publish_dir_mode, + pattern: '*_versions.yml' + ] + } + + withName: ENSEMBL_DOWNLOAD { + publishDir = [ + path: { "${params.genomes_base}/ensembl" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: FASTQC { + ext.args = '--quiet' + } + + witName: FUSIONCATCHER { + cpus = { check_max( 24 * task.attempt, 'cpus' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } + time = { check_max( 48.h * task.attempt, 'time' ) } + } + + withName: FUSIONCATCHER_DOWNLOAD { + publishDir = [ + path: { "${params.genomes_base}/fusioncatcher" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
+ + withName: FUSIONINSPECTOR { + ext.when = { !params.skip_vis } + } + + withName: FUSIONREPORT { + ext.when = { !params.skip_vis } + publishDir = [ + path: { "${params.outdir}/fusionreport/${meta.id}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: FUSIONREPORT_DOWNLOAD { + publishDir = [ + path: { "${params.genomes_base}/fusion_report_db" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: GTF_TO_REFFLAT { + publishDir = [ + path: { "${params.genomes_base}/ensembl" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: KALLISTO_INDEX { + ext.args = '-k 31' + publishDir = [ + path: { "${params.genomes_base}/pizzly" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: PICARD_COLLECTRNASEQMETRICS { + ext.when = { !params.skip_qc && (params.starfusion || params.all) } + + } + + withName: PICARD_MARKDUPLICATES { + ext.when = { !params.skip_qc && (params.starfusion || params.all) } + } + + withName: PIZZLY { + ext.args = "-k 31 --align-score 2 --insert-size 400 --cache index.cache.txt" + publishDir = [ + path: { "${params.outdir}/pizzly" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: QUALIMAP_RNASEQ { + ext.when = { !params.skip_qc && (params.starfusion || params.all) } + } + + withName: SAMPLESHEET_CHECK { + publishDir = [ + path: { "${params.outdir}/pipeline_info" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_INDEX_FOR_ARRIBA { + publishDir = [ + path: { "${params.outdir}/samtools_index_for_arriba" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_INDEX_FOR_QC { + ext.when = { !params.skip_qc && (params.starfusion || params.all) } + publishDir = [ + path: { "${params.outdir}/samtools_index_for_qc" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_INDEX_FOR_SQUID { + publishDir = [ + path: { "${params.outdir}/samtools_index_for_squid" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_SORT_FOR_ARRIBA { + ext.prefix = { "${meta.id}_sorted" } + publishDir = [ + path: { "${params.outdir}/samtools_sort_for_arriba" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_SORT_FOR_SQUID { + ext.prefix = { "${meta.id}_chimeric_sorted" } + publishDir = [ + path: { "${params.outdir}/samtools_sort_for_squid" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: SAMTOOLS_VIEW_FOR_SQUID { + ext.args = { "-Sb -o ${meta.id}_chimeric.bam" } + } + + withName: STAR_FOR_ARRIBA { + publishDir = [ + path: { "${params.outdir}/star_for_arriba" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ?
null : filename }, + ] + ext.args = '--readFilesCommand zcat \ + --outSAMtype BAM Unsorted \ + --outSAMunmapped Within \ + --outBAMcompression 0 \ + --outFilterMultimapNmax 50 \ + --peOverlapNbasesMin 10 \ + --alignSplicedMateMapLminOverLmate 0.5 \ + --alignSJstitchMismatchNmax 5 -1 5 5 \ + --chimSegmentMin 10 \ + --chimOutType WithinBAM HardClip \ + --chimJunctionOverhangMin 10 \ + --chimScoreDropMax 30 \ + --chimScoreJunctionNonGTAG 0 \ + --chimScoreSeparation 1 \ + --chimSegmentReadGapMax 3 \ + --chimMultimapNmax 50' + } + + withName: STAR_FOR_SQUID { + publishDir = [ + path: { "${params.outdir}/star_for_squid" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + ext.args = '--twopassMode Basic \ + --chimOutType SeparateSAMold \ + --chimSegmentMin 20 \ + --chimJunctionOverhangMin 12 \ + --alignSJDBoverhangMin 10 \ + --outReadsUnmapped Fastx \ + --outSAMstrandField intronMotif \ + --outSAMtype BAM SortedByCoordinate \ + --readFilesCommand zcat' + } + + withName: STAR_FOR_STARFUSION { + publishDir = [ + path: { "${params.outdir}/star_for_starfusion" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + ext.args = '--twopassMode Basic \ + --outReadsUnmapped None \ + --readFilesCommand zcat \ + --outSAMtype BAM SortedByCoordinate \ + --outSAMstrandField intronMotif \ + --outSAMunmapped Within \ + --chimSegmentMin 12 \ + --chimJunctionOverhangMin 8 \ + --chimOutJunctionFormat 1 \ + --alignSJDBoverhangMin 10 \ + --alignMatesGapMax 100000 \ + --alignIntronMax 100000 \ + --alignSJstitchMismatchNmax 5 -1 5 5 \ + --chimMultimapScoreRange 3 \ + --chimScoreJunctionNonGTAG -4 \ + --chimMultimapNmax 20 \ + --chimNonchimScoreDropMin 10 \ + --peOverlapNbasesMin 12 \ + --peOverlapMMp 0.1 \ + --alignInsertionFlush Right \ + --alignSplicedMateMapLminOverLmate 0 \ + --alignSplicedMateMapLmin 30 \ + --chimOutType Junctions' + } + + withName: STAR_GENOMEGENERATE { + ext.args = "--sjdbOverhang ${params.read_length - 1}" + publishDir = [ + path: { "${params.genomes_base}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: STARFUSION_BUILD { + cpus = { check_max( 24 * task.attempt, 'cpus' ) } + memory = { check_max( 100.GB * task.attempt, 'memory' ) } + time = { check_max( 2.d * task.attempt, 'time' ) } + publishDir = [ + path: { "${params.genomes_base}/starfusion" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: STARFUSION_DOWNLOAD { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 24.GB * task.attempt, 'memory' ) } + time = { check_max( 6.h * task.attempt, 'time' ) } + publishDir = [ + path: { "${params.genomes_base}/starfusion" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } +} diff --git a/conf/test.config b/conf/test.config index 45cd99c8..35910107 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1 +1,25 @@ -params.help = true \ No newline at end of file +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/rnafusion -profile test,<docker/singularity> --outdir <OUTDIR> + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv' + +} diff --git a/conf/test_full.config b/conf/test_full.config new file mode 100644 index 00000000..5525cad2 --- /dev/null +++ b/conf/test_full.config @@ -0,0 +1,23 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running full-size tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a full size pipeline test. + + Use as follows: + nextflow run nf-core/rnafusion -profile test_full,<docker/singularity> --outdir <OUTDIR> + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' + + // Input data + genome = 'GRCh38' + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv' + genomes_base = "${params.outdir}/references" + all = true + +} diff --git a/conf/test_full_build.config b/conf/test_full_build.config new file mode 100644 index 00000000..6243a1e1 --- /dev/null +++ b/conf/test_full_build.config @@ -0,0 +1,25 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running full-size tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a full size pipeline test.
+ + Use as follows: + nextflow run nf-core/rnafusion -profile test_full_build,<docker/singularity> --outdir <OUTDIR> + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Full build test profile' + config_profile_description = 'Full build test dataset to check pipeline function' + + + // Input data + genome = 'GRCh38' + build_references = true + all = true + genomes_base = "${params.outdir}/references" + cosmic_username = "${{ secrets.cosmic_username }}" + cosmic_passwd = "${{ secrets.cosmic_passwd }}" +} diff --git a/containers/arriba/Dockerfile b/containers/arriba/Dockerfile index f6a94950..ea36d90f 100644 --- a/containers/arriba/Dockerfile +++ b/containers/arriba/Dockerfile @@ -11,4 +11,4 @@ RUN conda env create -f /environment.yml && conda clean -a ENV PATH /opt/conda/envs/nf-core-rnafusion-arriba_1.2.0/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-arriba_1.2.0 > nf-core-rnafusion-arriba_1.2.0.yml \ No newline at end of file +RUN conda env export --name nf-core-rnafusion-arriba_1.2.0 > nf-core-rnafusion-arriba_1.2.0.yml diff --git a/containers/arriba/environment.yml b/containers/arriba/environment.yml index 393b2e11..20a1024e 100644 --- a/containers/arriba/environment.yml +++ b/containers/arriba/environment.yml @@ -11,4 +11,4 @@ dependencies: - bioconda::star=2.7.1a - conda-forge::openssl=1.0 - conda-forge::r-circlize - - conda-forge::readline=6.2 \ No newline at end of file + - conda-forge::readline=6.2 diff --git a/containers/ericscript/Dockerfile b/containers/ericscript/Dockerfile index d418d1c7..70b87c48 100644 --- a/containers/ericscript/Dockerfile +++ b/containers/ericscript/Dockerfile @@ -14,4 +14,4 @@ ENV PATH /opt/conda/envs/nf-core-rnafusion-ericscript_0.5.5/bin:$PATH RUN echo 1 > /opt/conda/envs/nf-core-rnafusion-ericscript_0.5.5/share/ericscript-0.5.5-4/lib/data/_resources/.flag.dbexists # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-ericscript_0.5.5 > nf-core-rnafusion-ericscript_0.5.5.yml \ No newline at end of file +RUN conda env export --name nf-core-rnafusion-ericscript_0.5.5 > nf-core-rnafusion-ericscript_0.5.5.yml diff --git a/containers/ericscript/environment.yml b/containers/ericscript/environment.yml index 3edc48ce..1e76273c 100644 --- a/containers/ericscript/environment.yml +++ b/containers/ericscript/environment.yml @@ -5,4 +5,4 @@ channels: - defaults dependencies: - bioconda::ericscript=0.5.5 - - conda-forge::ncurses=6.1 \ No newline at end of file + - conda-forge::ncurses=6.1 diff --git a/containers/fusioncatcher/Dockerfile b/containers/fusioncatcher/Dockerfile index 203bb32a..77efaf57 100644 --- a/containers/fusioncatcher/Dockerfile +++ b/containers/fusioncatcher/Dockerfile @@ -1,14 +1,49 @@ -FROM nfcore/base:1.7 +FROM ubuntu:18.04 -LABEL authors="Martin Proks" \ - description="Docker image containing all requirements for nfcore/rnafusion pipeline" +LABEL Description="This image is used to run FusionCatcher" Version="1.33" -# Install the conda environment -COPY environment.yml / -RUN conda env create -f /environment.yml && conda clean -a +RUN apt-get -y clean \ + && apt-get -y update \ + && apt-get -y install \ + automake \ + build-essential \ + bzip2 \ + cmake \ + curl \ + g++ \ + gawk \ + gcc \ + gzip \ + libc6-dev \ + libncurses5-dev \ + libtbb2 \ + libtbb-dev \ + make \ + parallel \ + pigz \ + python \ + python-dev \ + python-biopython \ + python-numpy \ +
python-openpyxl \ + python-xlrd \ + tar \ + unzip \ + wget \ + zip \ + zlib1g \ + zlib1g-dev \ + zlibc \ + default-jdk \ + && apt-get -y clean -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-rnafusion-fusioncatcher_1.20/bin:$PATH +WORKDIR /opt -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-fusioncatcher_1.20 > nf-core-rnafusion-fusioncatcher_1.20.yml \ No newline at end of file +###################### +## INSTALLATION +###################### + +RUN wget --no-check-certificate http://sf.net/projects/fusioncatcher/files/bootstrap.py -O bootstrap.py \ + && python bootstrap.py -t -y -i /opt/fusioncatcher/v1.33/ + +ENV PATH /opt/fusioncatcher/v1.33/bin:$PATH diff --git a/containers/fusioncatcher/environment.yml b/containers/fusioncatcher/environment.yml deleted file mode 100644 index 1d6a105d..00000000 --- a/containers/fusioncatcher/environment.yml +++ /dev/null @@ -1,7 +0,0 @@ -name: nf-core-rnafusion-fusioncatcher_1.20 -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - bioconda::fusioncatcher=1.20 diff --git a/containers/pizzly/Dockerfile b/containers/pizzly/Dockerfile index 2817069f..0056bf9b 100644 --- a/containers/pizzly/Dockerfile +++ b/containers/pizzly/Dockerfile @@ -11,4 +11,4 @@ RUN conda env create -f /environment.yml && conda clean -a ENV PATH /opt/conda/envs/nf-core-rnafusion-pizzly_0.37.3/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-pizzly_0.37.3 > nf-core-rnafusion-pizzly_0.37.3.yml \ No newline at end of file +RUN conda env export --name nf-core-rnafusion-pizzly_0.37.3 > nf-core-rnafusion-pizzly_0.37.3.yml diff --git a/containers/squid/Dockerfile b/containers/squid/Dockerfile index 5dbcaf4c..20d390ee 100644 --- a/containers/squid/Dockerfile +++ b/containers/squid/Dockerfile @@ -16,4 +16,4 @@ RUN cd /opt/conda/envs/nf-core-rnafusion-squid_1.5-star2.7.1a/bin \ && ln -s /opt/conda/envs/nf-core-rnafusion-squid_1.5-star2.7.1a/bin/python3 /bin/python # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-squid_1.5-star2.7.1a > nf-core-rnafusion-squid_1.5-star2.7.1a.yml \ No newline at end of file +RUN conda env export --name nf-core-rnafusion-squid_1.5-star2.7.1a > nf-core-rnafusion-squid_1.5-star2.7.1a.yml diff --git a/containers/star-fusion/Dockerfile b/containers/star-fusion/Dockerfile deleted file mode 100644 index 79c6fdbc..00000000 --- a/containers/star-fusion/Dockerfile +++ /dev/null @@ -1,20 +0,0 @@ -FROM nfcore/base:1.9 - -LABEL authors="Martin Proks" \ - description="Docker image containing all requirements for nfcore/rnafusion pipeline" - -# Install the conda environment -COPY environment.yml / -RUN conda env create -f /environment.yml && conda clean -a - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-rnafusion-star-fusion_1.8.1/bin:$PATH - -# FusionInspector -ENV PATH /opt/conda/envs/nf-core-rnafusion-star-fusion_1.8.1/lib/STAR-Fusion/FusionInspector:$PATH - -# ctat-genome-lib-builder -ENV PATH /opt/conda/envs/nf-core-rnafusion-star-fusion_1.8.1/lib/STAR-Fusion/ctat-genome-lib-builder:$PATH - -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-rnafusion-star-fusion_1.8.1 > nf-core-rnafusion-star-fusion_1.8.1.yml \ No newline at end of file diff --git 
a/containers/star-fusion/environment.yml b/containers/star-fusion/environment.yml deleted file mode 100644 index 4ca4be91..00000000 --- a/containers/star-fusion/environment.yml +++ /dev/null @@ -1,12 +0,0 @@ -name: nf-core-rnafusion-star-fusion_1.8.1 -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - bioconda::dfam=2.0 - - bioconda::hmmer=3.2.1 - - bioconda::star-fusion=1.8.1 - - bioconda::trinity=2.6.6 - - bioconda::samtools=1.9 - - conda-forge::perl-carp-assert \ No newline at end of file diff --git a/docs/README.md b/docs/README.md index 4c584b9a..0c1d417d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,12 +1,10 @@ # nf-core/rnafusion: Documentation -The nf-core/rnafusion documentation is split into the following files: +The nf-core/rnafusion documentation is split into the following pages: -1. [Installation](https://nf-co.re/usage/installation) -2. Pipeline configuration - * [Download references](references.md) - * [Local installation](https://nf-co.re/usage/local_installation) - * [Adding your own system config](https://nf-co.re/usage/adding_own_config) -3. [Running the pipeline](usage.md) -4. [Output and how to interpret the results](output.md) -5. [Troubleshooting](https://nf-co.re/usage/troubleshooting) +- [Usage](usage.md) + - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. +- [Output](output.md) + - An overview of the different results produced by the pipeline and how to interpret them. + +You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png new file mode 100755 index 00000000..361d0e47 Binary files /dev/null and b/docs/images/mqc_fastqc_adapter.png differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png new file mode 100755 index 00000000..cb39ebb8 Binary files /dev/null and b/docs/images/mqc_fastqc_counts.png differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png new file mode 100755 index 00000000..a4b89bf5 Binary files /dev/null and b/docs/images/mqc_fastqc_quality.png differ diff --git a/docs/images/nf-core-rnafusion_logo.png b/docs/images/nf-core-rnafusion_logo.png deleted file mode 100644 index eab61c0e..00000000 Binary files a/docs/images/nf-core-rnafusion_logo.png and /dev/null differ diff --git a/docs/images/nf-core-rnafusion_logo_dark.png b/docs/images/nf-core-rnafusion_logo_dark.png new file mode 100644 index 00000000..598d1e22 Binary files /dev/null and b/docs/images/nf-core-rnafusion_logo_dark.png differ diff --git a/docs/images/nf-core-rnafusion_logo_light.png b/docs/images/nf-core-rnafusion_logo_light.png new file mode 100644 index 00000000..5609a199 Binary files /dev/null and b/docs/images/nf-core-rnafusion_logo_light.png differ diff --git a/docs/output.md b/docs/output.md index 436ef45d..b9d64d87 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,258 +1,405 @@ - # nf-core/rnafusion: Output -This document describes the output produced by the pipeline. 
- - -## Pipeline overview - -The pipeline is built using [Nextflow](https://www.nextflow.io/) -and processes data using the following steps: - -- [Arriba](#arriba) -- [EricScript](#ericscript) -- [FastQC](#fastqc) -- [Fusioncatcher](#fusioncatcher) -- [Fusion Inspector](#fusion-inspector) -- [fusion-report](#fusion-report) - - [Tool detection](#tool-detection) - - [Found in database](#found-in-database) - - [Tool detection distribution](#tool-detection-distribution) -- [MultiQC](#multiqc) -- [Pizzly](#pizzly) -- [Squid](#squid) -- [Star-Fusion](#star-fusion) - -## Arriba - -**Output directory: `results/tools/Arriba`** - -- `fusions.tsv` - - contains fusions which pass all of Arriba's filters. It should be highly enriched for true predictions. The predictions are listed from highest to lowest confidence. -- `fusions.discarded.tsv` - - contains all events that Arriba classified as an artifact or that are also observed in healthy tissue. This file may be useful, if one suspects that an event should be present, but was erroneously discarded by Arriba. -- `.pdf` - - contains fusion visualization when opted for `--arriba_vis` - -## EricScript - -**Output directory: `results/tools/Ericscript/tmp`** - -- `fusions.results.filtered.tsv` - - contains all the predicted gene fusions - -|  Column | Description | -| ------- | ----------- | -| GeneName1 | official gene name of 5' gene. | -| GeneName2 | official gene name of 3' gene. | -| chr1 | chromosome of 5' gene. | -| Breakpoint1 | predicted breakpoint on 5' gene. | -| strand1 | strand (-/+) of 5' gene. | -| chr2 | chromosome of 3' gene. | -| Breakpoint2 | predicted breakpoint on 3' gene. | -| strand2 | strand (-/+) of 3' gene. | -| EnsemblGene1 | Ensembl gene ID of 5' gene. | -| EnsemblGene2 | Ensembl gene ID of 3' gene. | -| crossingreads | the number of paired end discordant reads. | -| spanningreads | the number of paired end reads spanning the junction. | -| mean.insertsize | mean of insert sizes of crossing + spanning reads. | -| homology | if filled, all the homologies between the fusion junction and Ensembl genes. | -| fusiontype | intra-chromosomal, inter-chromosomal, read-through or CIS. | -| InfoGene1 | gene information about 5' gene. | -| InfoGene2 | gene information about 3' gene. | -| JunctionSequence | predicted junction fusion sequence. | -| GeneExpr1 | Read count based estimation of the expression level of 5' gene. | -| GeneExpr2 | Read count based estimation of the expression level of 3' gene. | -| GeneExpr_fused | Read count based estimation of the expression level of the predicted chimeric transcript. | -| ES | Edge score. | -| GJS | Genuine Junction score. | -| US | Uniformity score. | -| EricScore | EricScore score (adaboost classifier). | - -For more info check the [documentation](https://sites.google.com/site/bioericscript/getting-started). - -## FastQC - -[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C). You get information about adapter contamination and other overrepresented sequences. - -For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). - -> **NB:** The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. 
To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory. - -**Output directory: `results/fastqc`** - -- `sample_fastqc.html` - - FastQC report, containing quality metrics for your untrimmed raw fastq files -- `zips/sample_fastqc.zip` - - zip file containing the FastQC report, tab-delimited data file and plot images - -## Fusioncatcher - -**Output directory: `results/tools/Fusioncatcher`** +## Introduction + +This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. + +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. + +## Pipeline overview + +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: + +- [Download and build references](#references) - Build references needed to run the rest of the pipeline +- [STAR](#star) - Alignment for arriba, squid and STAR-fusion +- [Cat](#cat) - Concatenated fastq files per sample ID +- [Arriba](#arriba) - Arriba fusion detection +- [Pizzly](#pizzly) - Pizzly fusion detection +- [Squid](#squid) - Squid fusion detection +- [STAR-fusion](#starfusion) - STAR-fusion fusion detection +- [FusionCatcher](#fusioncatcher) - FusionCatcher fusion detection +- [Samtools](#samtools) - SAM/BAM file manipulation +- [Arriba visualisation](#arriba-visualisation) - Arriba visualisation report +- [Fusion-report](#fusion-report) - Summary of the findings of each tool and comparison to the COSMIC, Mitelman and FusionGDB databases +- [FusionInspector](#fusioninspector) - IGV-based visualisation tool for fusions filtered by fusion-report +- [Qualimap](#qualimap) - Quality control of alignment +- [Picard](#picard) - Collect metrics +- [FastQC](#fastqc) - Raw read quality control +- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline +- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution + +### Download and build references + +<details markdown="1">
+<summary>Output files</summary> + +- `genomes_base/` + - `arriba` + - `blacklist_hg38_GRCh38_v2.1.0.tsv.gz` + - `protein_domains_hg38_GRCh38_v2.1.0.gff3` + - `cytobands_hg38_GRCh38_v2.1.0.tsv` + - `ensembl` + - `Homo_sapiens.GRCh38.{ensembl_version}.all.fa` + - `Homo_sapiens.GRCh38.{ensembl_version}.cdna.all.fa.gz` + - `Homo_sapiens.GRCh38.{ensembl_version}.gtf` + - `Homo_sapiens.GRCh38.{ensembl_version}.chr.gtf` + - `Homo_sapiens.GRCh38.{ensembl_version}.chr.gtf.refflat` + - `fusioncatcher` + - `human_v<version>` - dir with all references for fusioncatcher + - `fusion_report_db` + - `cosmic.db` + - `fusiongdb.db` + - `fusiongdb2.db` + - `mitelman.db` + - `pizzly` + - `kallisto` - file containing the kallisto index + - `star` - dir with STAR index + - `starfusion` + - files and dirs used to build the index + - `ctat_genome_lib_build_dir` - dir containing the index + +(Only files or folders used by the pipeline are mentioned explicitly.) + +</details>
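The reference tree above is produced by a dedicated run of the pipeline itself. A sketch of such a call, using only parameters that appear elsewhere in this changeset (`--build_references`, `--all`, `--genomes_base`, `--cosmic_username`/`--cosmic_passwd`); the profile, paths and credentials are placeholders:

```bash
# Hypothetical reference-building run; COSMIC credentials are needed for the
# fusion-report databases listed above.
nextflow run nf-core/rnafusion -profile docker \
    --build_references --all \
    --cosmic_username <username> --cosmic_passwd <password> \
    --genomes_base /path/to/references \
    --outdir /path/to/results
```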
+
+### STAR
+
+STAR is used to align reads to the genome reference.
+
+STAR is run three times:
+
+For arriba, with the parameters:
+
+```bash
+--readFilesCommand zcat \
+--outSAMtype BAM Unsorted \
+--outSAMunmapped Within \
+--outBAMcompression 0 \
+--outFilterMultimapNmax 50 \
+--peOverlapNbasesMin 10 \
+--alignSplicedMateMapLminOverLmate 0.5 \
+--alignSJstitchMismatchNmax 5 -1 5 5 \
+--chimSegmentMin 10 \
+--chimOutType WithinBAM HardClip \
+--chimJunctionOverhangMin 10 \
+--chimScoreDropMax 30 \
+--chimScoreJunctionNonGTAG 0 \
+--chimScoreSeparation 1 \
+--chimSegmentReadGapMax 3 \
+--chimMultimapNmax 50
+```
+
+For squid, with the parameters:
+
+```bash
+--twopassMode Basic \
+--chimOutType SeparateSAMold \
+--chimSegmentMin 20 \
+--chimJunctionOverhangMin 12 \
+--alignSJDBoverhangMin 10 \
+--outReadsUnmapped Fastx \
+--outSAMstrandField intronMotif \
+--outSAMtype BAM SortedByCoordinate \
+--readFilesCommand zcat
+```
+
+For STAR-fusion, with the parameters:
+
+```bash
+--twopassMode Basic \
+--outReadsUnmapped None \
+--readFilesCommand zcat \
+--outSAMstrandField intronMotif \
+--outSAMunmapped Within \
+--chimSegmentMin 12 \
+--chimJunctionOverhangMin 8 \
+--chimOutJunctionFormat 1 \
+--alignSJDBoverhangMin 10 \
+--alignMatesGapMax 100000 \
+--alignIntronMax 100000 \
+--alignSJstitchMismatchNmax 5 -1 5 5 \
+--chimMultimapScoreRange 3 \
+--chimScoreJunctionNonGTAG -4 \
+--chimMultimapNmax 20 \
+--chimNonchimScoreDropMin 10 \
+--peOverlapNbasesMin 12 \
+--peOverlapMMp 0.1 \
+--alignInsertionFlush Right \
+--alignSplicedMateMapLminOverLmate 0 \
+--alignSplicedMateMapLmin 30 \
+--chimOutType Junctions
+```
+
+> STAR_FOR_STARFUSION uses `${params.ensembl_ref}/Homo_sapiens.GRCh38.${params.ensembl_version}.chr.gtf` whereas STAR_FOR_ARRIBA and STAR_FOR_SQUID use `${params.ensembl_ref}/Homo_sapiens.GRCh38.${params.ensembl_version}.gtf`
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `star_for_<tool>`
+  - **Common**
+    - `<sample>.Log.final.out`
+    - `<sample>.Log.progress.out`
+    - `<sample>.SJ.out.tab`
+  - **For arriba:**
+    - `<sample>.Aligned.out.bam`
+  - **For squid:**
+    - `<sample>.Aligned.sortedByCoord.out.bam`
+    - `<sample>.Chimeric.out.sam`
+    - `<sample>.unmapped_1.fastq.gz`
+    - `<sample>.unmapped_2.fastq.gz`
+  - **For starfusion:**
+    - `<sample>.Aligned.sortedByCoord.out.bam`
+    - `<sample>.Chimeric.out.junction`
+
+</details>
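+
+For orientation, the arriba-style alignment corresponds roughly to the following standalone STAR call. This is a sketch only: paths, thread count and file names are placeholders, and only a subset of the arguments listed above is repeated here:
+
+```bash
+# Chimeric-aware alignment in the style of the arriba run
+STAR \
+    --runThreadN 8 \
+    --genomeDir <PATH>/star \
+    --readFilesIn <sample>_1.merged.fastq.gz <sample>_2.merged.fastq.gz \
+    --readFilesCommand zcat \
+    --outSAMtype BAM Unsorted \
+    --outSAMunmapped Within \
+    --chimSegmentMin 10 \
+    --chimOutType WithinBAM HardClip
+```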
-- `final-list_candidate-fusion-genes.txt`
-  - contains all the predicted gene fusions
+### Cat
-
-|  Column | Description |
-| ------- | ----------- |
-| **Gene\_1\_symbol(5end\_fusion\_partner)** | Gene symbol of the 5' end fusion partner |
-| **Gene\_2\_symbol\_2(3end\_fusion\_partner)** | Gene symbol of the 3' end fusion partner |
-| **Gene\_1\_id(5end\_fusion\_partner)** | Ensembl gene id of the 5' end fusion partner |
-| **Gene\_2\_id(3end\_fusion\_partner)** | Ensembl gene id of the 3' end fusion partner |
-| **Exon\_1\_id(5end\_fusion\_partner)** | Ensembl exon id of the 5' end fusion exon-exon junction |
-| **Exon\_2\_id(3end\_fusion\_partner)** | Ensembl exon id of the 3' end fusion exon-exon junction |
-| **Fusion\_point\_for\_gene\_1(5end\_fusion\_partner)** | Chromosomal position of the 5' end of fusion junction (chromosome:position:strand); 1-based coordinate |
-| **Fusion\_point\_for\_gene\_2(3end\_fusion\_partner)** | Chromosomal position of the 3' end of fusion junction (chromosome:position:strand); 1-based coordinate |
-| **Spanning\_pairs** | Count of pairs of reads supporting the fusion (**including** also the multimapping reads) |
-| **Spanning\_unique\_reads** | Count of unique reads (i.e. unique mapping positions) mapping on the fusion junction. Shortly, here are counted all the reads which map on fusion junction minus the PCR duplicated reads. |
-| **Longest\_anchor\_found** | Longest anchor (hangover) found among the unique reads mapping on the fusion junction |
-| **Fusion\_finding\_method** | Aligning method used for mapping the reads and finding the fusion genes. Here are two methods used which are: (i) **BOWTIE** = only Bowtie aligner is used for mapping the reads on the genome and exon-exon fusion junctions, (ii) **BOWTIE+BLAT** = Bowtie aligner is used for mapping reads on the genome and BLAT is used for mapping reads for finding the fusion junction, (iii) **BOWTIE+STAR** = Bowtie aligner is used for mapping reads on the genome and STAR is used for mapping reads for finding the fusion junction, (iv) **BOWTIE+BOWTIE2** = Bowtie aligner is used for mapping reads on the genome and Bowtie2 is used for mapping reads for finding the fusion junction. |
-| **Fusion\_sequence** | The inferred fusion junction (the asterisk sign marks the junction point) |
-| **Fusion\_description** | Type of the fusion gene (see the Table 2) |
-| **Counts\_of\_common\_mapping\_reads** | Count of reads mapping simultaneously on both genes which form the fusion gene. This is an indication how similar are the DNA/RNA sequences of the genes forming the fusion gene (i.e. what is their homology because highly homologous genes tend to appear show as candidate fusion genes). In case of completely different sequences of the genes involved in forming a fusion gene then here it is expected to have the value zero. |
-| **Predicted\_effect** | Predicted effect of the candidate fusion gene using the annotation from Ensembl database. This is shown in format **effect\_gene\_1**/**effect\_gene\_2**, where the possible values for effect\_gene\_1 or effect\_gene\_2 are: **intergenic**, **intronic**, **exonic(no-known-CDS)**, **UTR**, **CDS(not-reliable-start-or-end)**, **CDS(truncated)**, or **CDS(complete)**. In case that the fusion junction for both genes is within their CDS (coding sequence) then only the values **in-frame** or **out-of-frame** will be shown. |
-| **Predicted\_fused\_transcripts** | All possible known fused transcripts in format ENSEMBL-TRANSCRIPT-1:POSITION-1/ENSEMBLE-TRANSCRIPT-B:POSITION-2, where are fused the sequence 1:POSITION-1 of transcript ENSEMBL-TRANSCRIPT-1 with sequence POSITION-2:END of transcript ENSEMBL-TRANSCRIPT-2 |
-| **Predicted\_fused\_proteins** | Predicted amino acid sequences of all possible fused proteins (separated by ";"). |
+Cat is used to concatenate fastq files belonging to the same sample.
-
-For more info check the [documentation](https://github.com/ndaniel/fusioncatcher/blob/master/doc/manual.md#62---output-data-output-data).
+
+<details markdown="1">
+<summary>Output files</summary>
-
-## Fusion Inspector
+
+- `cat`
+  - `<sample>_1.merged.fastq.gz`
+  - `<sample>_2.merged.fastq.gz`
-
-**Output directory: `results/tools/FusionInspector`**
+
+</details>
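+
+Conceptually, this step is equivalent to the following shell commands (file names are illustrative):
+
+```bash
+# Concatenate the per-lane reads of one sample, keeping mates separate;
+# gzip streams can be concatenated directly
+cat sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz > sample_1.merged.fastq.gz
+cat sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz > sample_2.merged.fastq.gz
+```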
-- `finspector.fa`
-  - the candidate fusion-gene contigs (if you copy things elsewhere, make sure to also copy the index file: `finspector.fa.fai`)
-- `finspector.bed`
-  - the reference gene structure annotations for fusion partners
-- `finspector.junction_reads.bam`
-  - alignments of the breakpoint-junction supporting reads.
-- `finspector.spanning_reads.bam`
-  - alignments of the breakpoint-spanning paired-end reads.
-
-To visualize fusion genes in [IGV tool](https://software.broadinstitute.org/software/igv/igvtools) first create a genome `Menu->Genomes->Create .genome File`, choose name and description, then choose the following files:
+### Arriba
-
-- `finspector.fa`
-  - make sure the index file finspector.fa.fai is in the same folder
-- `finspector.gtf`
-  - use this for 'Genes'
-- `cytoBand.txt`
-  - use this for 'optional Cytoband'
+
+Arriba is used to i) detect fusions and ii) output a PDF visualisation report for the fusions found:
-
-Add the bam files by choosing `File->Load from File` and make sure to select your generated mini genome in the upper-left corner.
-For more info and help check [wiki page](https://github.com/FusionInspector/FusionInspector/wiki).
+
+#### Detection
-
-## fusion-report
-
-**Output directory: `results/Report-`**
-
-- `fusions.json`
-  - contains all main information about found fusions (fusion name, score, explanation of the score calculation, cherry picked output from fusion tools)
-- `index.html`
-  - main dashboard containing the list of all detected fusions
-- `*.html`
-  - each fusion gets a custom page with fetched data from the local database
-- `fusions_list_filtered.txt`
-  - filtered list of found fusions (uses tool cutoff as filter, by default: 2, can be adjusted by adding `-t ` when running the tool)
-- `fusions_list.txt`
-  - unfiltered list of found fusions
-
-### Tool detection
+
+<details markdown="1">
+<summary>Output files</summary>
-
-Graphs displaying ratio of fusion genes caught by different tools. The last part *all tools* is an intersection of all tools.
+
+- `arriba`
+  - `<sample>.arriba.fusions.tsv` - contains the identified fusions
+  - `<sample>.arriba.fusions.discarded.tsv`
-
-![Tool detection](images/summary_graph_1.png)
+
+</details>
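+
+For a quick look at the fusion partners, the first two columns of the TSV can be extracted (file name is illustrative; column layout as defined by Arriba):
+
+```bash
+# gene1 and gene2 are the first two columns of Arriba's fusions TSV
+cut -f1,2 sample.arriba.fusions.tsv | head
+```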
-### Found in database
+#### Visualisation
-
-Displays how many fusions were found in a downloaded databases of the summary report.
+
+<details markdown="1">
+<summary>Output files</summary>
-
-![Known/unknown fusions](images/summary_graph_2.png)
+
+- `arriba_visualisation`
+  - `<sample>.pdf`
-
-### Tool detection distribution
+
+</details>
-For each fusion a sum of detected tools is calculated. This counts are then visualized in the graph below.
+### Pizzly
-
-![Known/unknown fusions](images/summary_graph_3.png)
+
+The first step of the pizzly workflow is to run `kallisto quant`:
-
-## MultiQC
+
+#### Kallisto
-
-[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in within the report data directory.
+
+<details markdown="1">
+<summary>Output files</summary>
-
-The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.
+
+- `kallisto`
+  - `<sample>.kallisto_quant.fusions.txt`
-
-**Output directory: `results/multiqc`**
+
+</details>
-- `Project_multiqc_report.html`
-  - MultiQC report - a standalone HTML file that can be viewed in your web browser
-- `Project_multiqc_data/`
-  - Directory containing parsed statistics from the different tools used in the pipeline
+Pizzly refines the kallisto output.
-
-For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info)
-
-## Pizzly
-
-**Output directory: `results/tools/Pizzly`**
-
-- `pizzly_fusions.json`
-  - contains all the predicted gene fusions
-
-|  Column | Description |
-| ------- | ----------- |
-| geneA | `id`: reference id and `name`: gene name |
-| geneB | Describes reference id and gene name |
-| paircount | Number of paired count |
-| splitcount | Number of split count |
-| transcripts | List of all transcripts `fasta_record`, `transcriptA`, `transcriptB`, `support`, `reads` |
-| readpairs | List of read pairs containing (`type`, `read1`, `read2`) |
+#### Pizzly
-
-For more info check the [documentation](https://github.com/pmelsted/pizzly#output).
+
+Pizzly uses the following arguments:
-
-## Squid
+
+```bash
+-k 31 \
+--align-score 2 \
+--insert-size 400 \
+--cache index.cache.txt
+```
-
-**Output directory: `results/tools/Squid`**
+
+<details markdown="1">
+<summary>Output files</summary>
-- `fusions_annotated.txt`
-  - contains all the predicted gene fusions
+
+- `pizzly`
+  - `<sample>.pizzly.txt` - contains the identified fusions
+  - `<sample>.pizzly.unfiltered.json`
-
-|  Column | Description |
-| ------- | ----------- |
-| chr1 | chromosome name of the first breakpoint. |
-| start1 | starting position of the segment of the first breakpoint, or the predicted breakpoint position if strand1 is "-" |
-| end1 | ending position of the segment of the first breakpoint, or the predicted breakpoint position if strand1 is "+" |
-| chr2 | chromosome name of the second breakpoint |
-| start2 | starting position of the segment of the second breakpoint, or the predicted breakpoint position if strand2 is "-" |
-| end2 | ending position of the segment of the second breakpoint, or the predicted breakpoint position if strand2 is "+" |
-| name | TSV is not named yet, this column shows with dot. |
-| score | number of reads supporting this TSV (without weighted by Discordant edge ratio multiplier) |
-| strand1 | strand of the first segment in TSV. |
-| strand2 | strand of the second segment in TSV. |
-| num_concordantfrag_bp1 | number of concordant paired-end reads covering the first breakpoint. For a concordant paired-end read, it includes two ends and a inserted region in between, if any of the 3 regions covers the breakpoint, the read is counted in this number |
-| num_concordantfrag_bp2 | number of concordant paired-end reads covering the second breakpoint. The count is defined in the same way as num_concordantfrag_bp1 |
+
+</details>
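+
+The unfiltered JSON can be mined with `jq`, for example to list gene pairs with their read support. This is a sketch that assumes pizzly's top-level `genes` array and the `geneA`/`geneB`, `paircount` and `splitcount` fields; the file name is illustrative:
+
+```bash
+# Print "geneA--geneB  paircount  splitcount" for each fusion candidate
+jq -r '.genes[] | "\(.geneA.name)--\(.geneB.name)\t\(.paircount)\t\(.splitcount)"' sample.pizzly.unfiltered.json
+```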
-For more info check the [documentation](https://github.com/Kingsford-Group/squid#output-specification).
+### Squid
-
-## Star-Fusion
+
+Squid is run in two steps: i) fusion detection and ii) fusion annotation. The output of both steps is written to a common `squid` directory.
-
-**Output directory: `results/tools/StarFusion`**
+
+<details markdown="1">
+<summary>Output files</summary>
-- `star-fusion.fusion_predictions.tsv`
-  - contains all the predicted gene fusions
+
+- `squid`
+  - `<sample>.squid.fusions_sv.txt` - contains the identified fusions
+  - `<sample>.squid.fusions.annotated.txt` - contains the identified fusions, annotated
-
-|  Column | Description |
-| ------- | ----------- |
-| JunctionReadCount | Indicates the number of RNA-Seq fragments containing a read that aligns as a split read at the site of the putative fusion junction. |
-| SpanningFragCount | Indicates the number of RNA-Seq fragments that encompass the fusion junction such that one read of the pair aligns to a different gene than the other paired-end read of that fragment. |
-| SpliceType | Indicates whether the proposed breakpoint occurs at reference exon junctions as provided by the reference transcript structure annotations (ex. gencode). |
-| LeftGene |
-| LeftBreakpoint |
-| RightGene |
-| RightBreakpoint |
-| LargeAnchorSupport | column indicates whether there are split reads that provide 'long' (set to length of 25 bases) alignments on both sides of the putative breakpoint. |
-| FFPM | fusion fragments per million total reads; **Default:** *0.1 (meaning at least 1 fusion-supporting rna-seq fragment per 10M total reads)*; **TL;DR:** can be adjusted by changing `--min_FFPM` |
-| LeftBreakDinuc | |
-| LeftBreakEntropy | Represents Shannon entropy |
-| RightBreakDinuc |
-| RightBreakEntropy | Represents Shannon entropy |
-| annots | Annotation generated by [FusionAnnotar](https://github.com/FusionAnnotator/FusionAnnotator/wiki) |
+
+</details>
-For more info check the [documentation](https://github.com/STAR-Fusion/STAR-Fusion/wiki#Outputs).
+
+### STAR-fusion
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `starfusion`
+  - `<sample>.starfusion.fusion_predictions.tsv` - contains the identified fusions
+  - `<sample>.starfusion.abridged.tsv`
+  - `<sample>.starfusion.abridged.coding_effect.tsv` - contains the identified fusions
+
+</details>
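+
+To count the predicted fusions (assuming a single header line in the TSV; file name is illustrative):
+
+```bash
+# Skip the header row and count the remaining prediction lines
+tail -n +2 sample.starfusion.fusion_predictions.tsv | wc -l
+```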
+
+### FusionCatcher
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fusioncatcher`
+  - `<sample>.fusioncatcher.fusion-genes.txt`
+  - `<sample>.fusioncatcher.summary.txt`
+  - `<sample>.fusioncatcher.log`
+
+</details>
+
+### Samtools
+
+#### Samtools view
+
+Samtools view is used to convert the chimeric SAM output from STAR_FOR_SQUID to BAM.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `samtools_view_for_squid`
+  - `<sample>_chimeric.bam` - chimeric reads in BAM format
+
+</details>
+#### Samtools sort
+
+Samtools sort is used to sort BAM files from STAR_FOR_ARRIBA (for arriba visualisation) and the chimeric BAM from STAR_FOR_SQUID.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `samtools_sort_for_<tool>`
+  - `<sample>(_chimeric)_sorted.bam` - sorted BAM file
+
+</details>
+#### Samtools index
+
+Samtools index is used to index BAM files from STAR_FOR_ARRIBA (for arriba visualisation) and STAR_FOR_STARFUSION (for QC).
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `samtools_for_<tool>`
+  - `<sample>.(Aligned.sortedByCoord).out.bam.bai`
+
+</details>
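+
+Taken together, the three samtools steps above are roughly equivalent to the following commands (file names are illustrative):
+
+```bash
+# Convert the chimeric SAM from STAR_FOR_SQUID to BAM, then sort and index it
+samtools view -b sample.Chimeric.out.sam > sample_chimeric.bam
+samtools sort sample_chimeric.bam -o sample_chimeric_sorted.bam
+samtools index sample_chimeric_sorted.bam
+```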
+
+### Fusion-report
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fusionreport`
+  - `<sample>`
+    - `<sample>.fusionreport.tsv`
+    - `<sample>.fusionreport_filtered.tsv`
+    - `index.html` - general report for all filtered fusions
+    - `<fusion>.html` - specific report for each filtered fusion
+
+</details>
+
+### FusionInspector
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fusioninspector`
+  - `<sample>.fusion_inspector_web.html` - visualisation report described in detail [here](https://github.com/FusionInspector/FusionInspector/wiki/FusionInspector-Visualizations)
+  - `FusionInspector.log`
+  - `<sample>.FusionInspector.fusions.abridged.tsv`
+
+</details>
+
+### Qualimap
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `qualimap`
+  - `qualimapReport.html` - HTML report
+  - `rnaseq_qc_results.txt` - TXT results
+  - `css` - dir for html style
+  - `images_qualimapReport` - dir for html images
+  - `raw_data_qualimapReport` - dir for html raw data
+
+</details>
+
+### Picard
+
+Picard CollectRnaSeqMetrics and Picard MarkDuplicates share the same output directory.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `picard`
+  - `<sample>.MarkDuplicates.metrics.txt` - metrics from MarkDuplicates
+  - `<sample>_rna_metrics.txt` - metrics from CollectRnaSeqMetrics
+  - `<sample>.bam` - BAM file with marked duplicates
+
+</details>
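+
+The Picard metrics files are plain text; for example, the metrics table can be pulled out of the MarkDuplicates report like this (file name is illustrative):
+
+```bash
+# Print the metrics header and value rows that follow the '## METRICS CLASS' marker
+grep -A 2 '^## METRICS CLASS' sample.MarkDuplicates.metrics.txt
+```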
+
+### FastQC
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastqc/`
+  - `*_fastqc.html`: FastQC report containing quality metrics.
+  - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
+
+</details>
+
+[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
+
+![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
+
+![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
+
+![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
+
+> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
+
+### MultiQC
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `multiqc/`
+  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
+  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
+  - `multiqc_plots/`: directory containing static images from the report in various formats.
+
+</details>
+
+[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
+
+Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
+
+### Pipeline information
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `pipeline_info/`
+  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
+  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
+  - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
+
+</details>
+ +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/references.md b/docs/references.md deleted file mode 100644 index d59d2dc5..00000000 --- a/docs/references.md +++ /dev/null @@ -1,41 +0,0 @@ -# nfcore/rnafusion: Download references for tools - -Downloading references manually is a tedious long process. To make the pipeline easier to work with, we provide a script to download all necessary references for fusion detection tools. - -> **TL;DR:** Make sure to download the correct references for your need! - -```bash -nextflow run nf-core/rnafusion/download-references.nf --help -``` - -## Download all references - -```bash -# Replace and with yout credentials from COSMIC -nextflow run nf-core/rnafusion/download-references.nf \ - --download_all \ - --outdir \ - --cosmic_usr --cosmic_passwd -``` - -## Download specific references - -```bash -# Example of downloading specific tools -nextflow run nf-core/rnafusion/download-references.nf \ ---arriba \ ---outdir -``` - -## Tool reference requirements - -| Tool | FASTA | GTF | STAR-index | Other | -| ---------------- | :----------------: | :----------------: | :----------------: | :----------------: | -| Arriba | :white_check_mark: | :white_check_mark: | :white_check_mark: | `custom_reference` | -| EricScript | :x: | :x: | :x: | `custom_reference` | -| FusionCatcher | :x: | :x: | :x: | `custom_reference` | -| Fusion-Inspector | :white_check_mark: | :white_check_mark: | :white_check_mark: | `ctat_genome_lib` | -| fusion-report | :x: | :x: | :x: | `databases` | -| Pizzly | :x: | :white_check_mark: | :white_check_mark: | `cDNA` | -| Squid | :x: | :white_check_mark: | :white_check_mark: | - | -| Star-Fusion | :white_check_mark: | :white_check_mark: | :white_check_mark: | `ctat_genome_lib` | diff --git a/docs/usage.md b/docs/usage.md index 99ae7c4e..71f5cd94 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,498 +1,363 @@ - -# nf-core/rnafusion: Usage - - -## Table of contents - -- [Introduction](#introduction) -- [Running the pipeline](#running-the-pipeline) - - [Running the pipeline using Docker](#running-the-pipeline-using-docker) - - [Running the pipeline using Singularity](#running-the-pipeline-using-singularity) - - [Running specific tools](#running-specific-tools) - - [Updating the pipeline](#updating-the-pipeline) - - [Reproducibility](#reproducibility) -- [Main arguments](#main-arguments) - - [`-profile`](#-profile) - - [`--reads`](#--reads) - - [`--single_end`](#--single_end) -- [Tool flags](#tool-flags) - - [`--arriba`](#--arriba) - - [`--ericscript`](#--ericscript) - - [`--fusioncatcher`](#--fusioncatcher) - - [`--fusion_report`](#--fusion_report) - - [`--pizzly`](#--pizzly) - - [`--squid`](#--squid) - - [`--star_fusion`](#--star_fusion) -- [Visualization flags](#visualization-flags) - - [`--arriba_vis`](#--arriba_vis) - - [`--fusion_inspector`](#--fusion_inspector) -- [Reference genomes](#reference-genomes) - - [`--arriba_ref`](#--arriba_ref) - - [`--databases`](#--databases) - - [`--ericscript_ref`](#--ericscript_ref) - - [`--fasta`](#--fasta) - - [`--fusioncatcher_ref`](#--fusioncatcher_ref) - - [`--genome`](#--genome) - - [`--gtf`](#--gtf) - - [`--reference_release`](#--reference_release) - - 
[`--star_index`](#--star_index)
-  - [`--star_fusion_ref`](#--star_fusion_ref)
-  - [`--transcript`](#--transcript)
-- [Job resources](#job-resources)
-  - [Automatic resubmission](#automatic-resubmission)
-  - [Custom resource requests](#custom-resource-requests)
-- [AWS Batch specific parameters](#aws-batch-specific-parameters)
-  - [`--awsqueue`](#--awsqueue)
-  - [`--awsregion`](#--awsregion)
-  - [`--awscli`](#--awscli)
-- [Other command line parameters](#other-command-line-parameters)
-  - [`--debug`](#--debug)
-  - [`--read_length`](#--read_length)
-  - [`--outdir`](#--outdir)
-  - [`--email`](#--email)
-  - [`--email_on_fail`](#--email_on_fail)
-  - [`--max_multiqc_email_size`](#--max_multiqc_email_size)
-  - [`-name`](#-name)
-  - [`-resume`](#-resume)
-  - [`-c`](#-c)
-  - [`--custom_config_version`](#--custom_config_version)
-  - [`--custom_config_base`](#--custom_config_base)
-  - [`--max_memory`](#--max_memory)
-  - [`--max_time`](#--max_time)
-  - [`--max_cpus`](#--max_cpus)
-  - [`--plaintext_email`](#--plaintext_email)
-  - [`--monochrome_logs`](#--monochrome_logs)
-  - [`--multiqc_config`](#--multiqc_config)
+# nf-core/rnafusion: Usage
-
-## Introduction
+
+## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/rnafusion/usage](https://nf-co.re/rnafusion/usage)
-
-Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you put the process running in the background through `screen` / `tmux` or similar tool. Alternatively you can run nextflow within a cluster job submitted your job scheduler.
+
+> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._
-
-It is recommended to limit the Nextflow Java virtual machines memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~./bash_profile`):
+
+## Introduction
-
-```bash
-NXF_OPTS='-Xms1g -Xmx4g'
-```
+The pipeline is divided into three parts:
-
-## Running the pipeline
+
+1. Downloading and building the references, using `--build_references`: done only once, and again after each pipeline update.
+2. Detecting fusions using (any combination of) the following tools:
+   - arriba
+   - fusioncatcher
+   - pizzly
+   - squid
+   - starfusion
+3. QC and visualisation tools:
+   - FastQC
+   - MultiQC
+   - arriba visualisation (for fusions detected by arriba only)
+   - fusion-report
+   - FusionInspector
-
-The typical command for running the pipeline is as follows.
+
+### Prerequisite: download and build references
-
-### Running the pipeline using Docker
+
+The rnafusion pipeline needs references for the fusion detection tools, so downloading these is a **requirement**.
+It is possible to download and build each reference manually (for example for non-human samples, which are not currently supported) and feed the references to rnafusion with the `--<tool>_ref` arguments, but it is advised to download references with rnafusion:
+
```bash
nextflow run nf-core/rnafusion \
- -profile docker \
- --reads '*_R{1,2}.fastq.gz' \
- --arriba \
- --star_fusion \
- --fusioncatcher \
- --ericscript \
- --pizzly \
- --squid \
- --arriba_vis \
- --fusion_inspector
+--build_references --all \
+--genomes_base <PATH>
```
-
-### Running the pipeline using Singularity
+
+References for the different tools can also be downloaded separately:
+
```bash
-nextflow run nf-core/rnafusion/download-singularity-img.nf --download_all --outdir /path
+nextflow run nf-core/rnafusion \
+--build_references --<tool> \
+--genomes_base <PATH>
```
-
-If the nextflow download script crashes (network issue), please use the bash script instead.
+
+This `<PATH>` is where the references will be saved.
-
-```bash
-cd utils && sh download-singularity-img.sh /path/to/images
-```
-
-The command bellow will launch the pipeline using `singularity`.
+
+Optional: by default, STAR-Fusion references are built. You can also download them from CTAT. This allows more flexibility for different organisms, but be aware that **this is not fully tested and therefore not recommended**:
+
```bash
nextflow run nf-core/rnafusion \
- -profile singularity \
- --reads '*_R{1,2}.fastq.gz' \
- --arriba \
- --star_fusion \
- --fusioncatcher \
- --ericscript \
- --pizzly \
- --squid \
- --arriba_vis \
- --fusion_inspector
+--build_references --starfusion/--all \
+--starfusion_build false \
+--genomes_base <PATH>
```
-
-### Running specific tools
+
+Then use the same `--starfusion_build` setting when running the detection.
+
+### Running all detection tools
+
```bash
nextflow run nf-core/rnafusion \
- -profile singularity -c 'example/custom-singularity.config' \
- --reads '*_R{1,2}.fastq.gz' \
- --arriba \
- --squid
+--input '[path to samplesheet file]' --all \
+--outdir <OUTDIR>
```
-
-Note that the pipeline will create the following files in your working directory:
+
+Visualisation tools will be run on all fusions detected.
-
-```bash
-work            # Directory containing the nextflow working files
-results         # Finished results (configurable, see below)
-.nextflow_log   # Log file from Nextflow
-# Other nextflow hidden files, eg. history of pipeline runs and old logs.
-```
-
-### Updating the pipeline
-
-When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
+
+### Running a specific detection tool
+
```bash
-nextflow pull nf-core/rnafusion
+nextflow run nf-core/rnafusion \
+--input '[path to samplesheet file]' --<tool> \
+--outdir <OUTDIR>
```
-
-### Reproducibility
-
-It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
-
-First, go to the [nf-core/rnafusion releases page](https://github.com/nf-core/rnafusion/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`.
-
-This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
-
-## Main arguments
+Visualisation tools will be run on all fusions detected.
-
-### `-profile`
+
+#### Optional manual feed-in of fusion files
-
-Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
+
+It is possible to give the output of each tool manually, using the argument `--<tool>_fusions PATH/TO/FUSION/FILE`: this feature needs more testing, so don't hesitate to open an issue if you encounter problems.
-
-Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below.
+
+## Samplesheet input
-
-> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
+
+You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 4 columns, and a header row as shown in the examples below.
-
-The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
-
-Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
-They are loaded in sequence, so later profiles can overwrite earlier profiles.
+
+```console
+--input '[path to samplesheet file]'
+```
-
-If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended.
+
+### Multiple runs of the same sample
-
-- `docker`
-  - A generic configuration profile to be used with [Docker](http://docker.com/)
-  - Pulls software from DockerHub: [`nfcore/rnafusion`](http://hub.docker.com/r/nfcore/rnafusion/)
-- `singularity`
-  - A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/)
-  - Pulls software from DockerHub: [`nfcore/rnafusion`](http://hub.docker.com/r/nfcore/rnafusion/)
-- `test`
-  - A profile with a complete configuration for automated testing
-  - Includes links to test data so needs no other parameters
+
+The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
-
-### `--reads`
+
+```console
+sample,fastq_1,fastq_2,strandedness
+CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
+CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,forward
+CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,forward
+```
-
-Use this to specify the location of your input FastQ files. For example:
-
-```bash
---reads 'path/to/data/sample_*_{1,2}.fastq.gz'
-```
-
-Please note the following requirements:
-
-1. The path must be enclosed in quotes
-2. The path must have at least one `*` wildcard character
-3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.
-
-If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`
+
+### Full samplesheet
-
-### `--single_end`
+
+The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 4 columns to match those defined in the table below.
-
-By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`.
For example: +The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: -```bash ---single_end --reads '*.fastq' +```console +sample,fastq_1,fastq_2 +CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz ``` -## Tool flags - -### `--arriba` - -If enabled, executes `Arriba` tool. - -- `--arriba_opt` - - Specify additional parameters. For more info, please refer to the [documentation](http://arriba.readthedocs.io/en/latest/quickstart/) of the tool. - -### `--ericscript` - -If enabled, executes `Ericscript` tool. - -- `--ericscript_opt` - - Specify additional parameters. For more info, please refer to the [documentation](https://sites.google.com/site/bioericscript/home) of the tool. - -### `--fusioncatcher` - -If enabled, executes `Fusioncatcher` tool. - -- `--fusioncatcher_opt` - - Specify additional parameters. For more info, please refer to the [documentation](https://github.com/ndaniel/fusioncatcher/blob/master/doc/manual.md) of the tool. - -### `--fusion_report` - -If enabled, download databases for `fusion-report`. - -- `fusion_report_opt` - - Specify additional parameters. For more info, please refer to the [documentation](https://matq007.github.io/fusion-report/#/) of the tool. - -### `--pizzly` - -If enabled, executes `Pizzly` tool. - -- `--pizzly_k` - - Number of k-mers. Deafult 31. - -### `--squid` - -If enabled, executes `Squid` tool. - -### `--star_fusion` - -If enabled, executes `STAR-Fusion` tool. - -- `--star_fusion_opt` - - Parameter for specifying additional parameters. For more info, please refer to the [documentation](https://github.com/STAR-Fusion/STAR-Fusion/wiki) of the tool. - -## Visualization flags +### Full samplesheet -### `--arriba_vis` +The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 4 columns to match those defined in the table below. -If enabled, executes build in `Arriba` visualization tool. +A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice. -### `--fusion_inspector` - -If enabled, executes `Fusion-Inspector` tool. - -## Reference genomes - -### `--arriba_ref` - -```bash ---arriba_ref '[path to Arriba reference]' -``` - -### `--databases` - -Required databases in order to run `fusion-report`. - -```bash ---databases '[path to fusion-report databases]' -``` - -### `--ericscript_ref` - -Required reference in order to run `EricScript`. 
-
-```bash
---ericscript_ref '[path to EricScript reference]'
-```
-
-### `--fasta`
-
-If you prefer, you can specify the full path to your reference genome when you run the pipeline:
-
-```bash
---fasta '[path to Fasta reference]'
+```console
+sample,fastq_1,fastq_2,strandedness
+CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
+CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,forward
+CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz,forward
+TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,,forward
+TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,,forward
+TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,,forward
+TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,,forward
+```
-
-### `--fusioncatcher_ref`
-
-Required reference in order to run `Fusioncatcher`.
+
+| Column         | Description                                                                                                                                                                            |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sample`       | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
+| `fastq_1`      | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                             |
+| `fastq_2`      | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                             |
+| `strandedness` | Strandedness: forward or reverse.                                                                                                                                                      |
-
-```bash
---fusioncatcher_ref '[path to Fusioncatcher reference]'
-```
-
-### `--genome`
-
-This pipeline uses only `Homo Sapiens` version `GRCh38`. Also make sure to specify `--genomes_base`.
+
+An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
-
-```bash
---genome 'GRCh38' --genome_base '/path/to/references'
-```
+
+## Running the pipeline
-
-### `--gtf`
+
+The typical command for running the pipeline is as follows.
-
-Required annotation file.
+
+```console
+nextflow run nf-core/rnafusion --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38 -profile docker
+```
-
-```bash
---gtf '[path to GTF annotation]'
+
+This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
-
-### `--reference_release`
+
+Note that the pipeline will create the following files in your working directory:
-
-Ensembl version.
+
+```console
+work                # Directory containing the nextflow working files
+<OUTDIR>            # Finished results in specified location (defined with --outdir)
+.nextflow_log       # Log file from Nextflow
+# Other nextflow hidden files, eg. history of pipeline runs and old logs.
+```
-
-```bash
-# ftp://ftp.ensembl.org/pub/release-97/fasta/homo_sapiens/
---reference_release '97'
-```
+
+### Updating the pipeline
-
-### `--star_index`
+
+When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
-```bash
---star_index '[path to STAR index]'
-```
+
+```console
+nextflow pull nf-core/rnafusion
+```
-
-### `--star_fusion_ref`
+
+### Reproducibility
-
-Required reference in order to run `STAR-Fusion`.
+
+It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
-
-```bash
---star_fusion_ref '[path to STAR-Fusion reference]'
-```
+
+First, go to the [nf-core/rnafusion releases page](https://github.com/nf-core/rnafusion/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`.
-
-### `--transcript`
+
+This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
-
-Required transcript file.
+
+## Core Nextflow arguments
-
-```bash
---transcript '[path to transcript reference]'
-```
+
+> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
-
-## Job resources
+
+### `-profile`
-
-### Automatic resubmission
+
+Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
-
-Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.
+
+Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. When using Biocontainers, most of these software packaging methods pull Docker containers from quay.io e.g. [FastQC](https://quay.io/repository/biocontainers/fastqc) except for Singularity which directly downloads Singularity images via https hosted by the [Galaxy project](https://depot.galaxyproject.org/singularity/) and Conda which downloads and installs software locally from [Bioconda](https://bioconda.github.io/).
-
-### Custom resource requests
+
+> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
-
-Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples.
+
+The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
-If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. -If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack). +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -## AWS Batch specific parameters +- `docker` + - A generic configuration profile to be used with [Docker](https://docker.com/) +- `singularity` + - A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) +- `podman` + - A generic configuration profile to be used with [Podman](https://podman.io/) +- `shifter` + - A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) +- `charliecloud` + - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) +- `conda` + - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. +- `test` + - A profile with a complete configuration for automated testing + - Includes links to test data so needs no other parameters -Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use [`-profile awsbatch`](https://github.com/nf-core/configs/blob/master/conf/awsbatch.config) and then specify all of the following parameters. +### `-resume` -### `--awsqueue` +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. For input to be considered the same, not only the names must be identical but the files' contents as well. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html). -The JobQueue that you intend to use on AWS Batch. +You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. -### `--awsregion` +### `-c` -The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs. +Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -### `--awscli` +## Custom configuration -The [AWS CLI](https://www.nextflow.io/docs/latest/awscloud.html#aws-cli-installation) path in your custom AMI. Default: `/home/ec2-user/miniconda/bin/aws`. 
+### Resource requests
-The AWS region to run your job in. Default is set to `eu-west-1` but can be adjusted to your needs.
-Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to a S3 storage bucket of your choice - you'll get an error message notifying you if you didn't.
+
+Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
-
-## Other command line parameters
+
+For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue:
-
-### `--debug`
+
+```console
+[62/149eb0] NOTE: Process `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1)
+Error executing process > 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)'
-
-To run only a specific tool (testing freshly implemented tool) just add `--debug` parameter. This parameter only works on **fusion tools only**!
+
+Caused by:
+    Process `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137)
-
-### `--read_length`
+
+Command executed:
+    STAR \
+        --genomeDir star \
+        --readFilesIn WT_REP1_trimmed.fq.gz \
+        --runThreadN 2 \
+        --outFileNamePrefix WT_REP1. \
+        <TRUNCATED>
-
-Length is used to build a STAR index. Default is 100bp (Illumina).
+
+Command exit status:
+    137
+
+Command output:
+    (empty)
-
-### `--outdir`
+
+Command error:
+    .command.sh: line 9:  30 Killed    STAR --genomeDir star --readFilesIn WT_REP1_trimmed.fq.gz --runThreadN 2 --outFileNamePrefix WT_REP1.
-
-The output directory where the results will be saved.
+Work dir:
+    /home/pipelinetest/work/9d/172ca5881234073e8d76f2a19c88fb
-
-### `--email`
+
+Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
+```
-
-Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.
+
+To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN).
+We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so, based on the search results, the file we want is `modules/nf-core/software/star/align/main.nf`.
+If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9).
+The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. +The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. +Providing you haven't set any other standard nf-core parameters to **cap** the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. +The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. + +```nextflow +process { + withName: 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN' { + memory = 100.GB + } +} +``` -### `--email_on_fail` +> **NB:** We specify the full process name i.e. `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN` in the config file because this takes priority over the short name (`STAR_ALIGN`) and allows existing configuration using the full process name to be correctly overridden. -This works exactly as with `--email`, except emails are only sent if the workflow is not successful. +If you get a warning suggesting that the process selector isn't recognised check that the process name has been specified correctly. -### `--max_multiqc_email_size` +### Tool-specific options -Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB). +For the ultimate flexibility, we have implemented and are using Nextflow DSL2 modules in a way where it is possible for both developers and users to change tool-specific command-line arguments (e.g. providing an additional command-line argument to the `STAR_ALIGN` process) as well as publishing options (e.g. saving files produced by the `STAR_ALIGN` process that aren't saved by default by the pipeline). In the majority of instances, as a user you won't have to change the default options set by the pipeline developer(s), however, there may be edge cases where creating a simple custom config file can improve the behaviour of the pipeline if for example it is failing due to a weird error that requires setting a tool-specific parameter to deal with smaller / larger genomes. -### `-name` +The command-line arguments passed to STAR in the `STAR_ALIGN` module are a combination of: -Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. +- Mandatory arguments or those that need to be evaluated within the scope of the module, as supplied in the [`script`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L49-L55) section of the module file. -Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. -This is used in the MultiQC report (if not default) and in the summary HTML / e-mail (always). 
+- An [`options.args`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L56) string of non-mandatory parameters that is set to be empty by default in the module but can be overwritten when including the module in the sub-workflow / workflow context via the `addParams` Nextflow option. -**NB:** Single hyphen (core Nextflow option) +The nf-core/rnaseq pipeline has a sub-workflow (see [terminology](https://github.com/nf-core/modules#terminology)) specifically to align reads with STAR and to sort, index and generate some basic stats on the resulting BAM files using SAMtools. At the top of this file we import the `STAR_ALIGN` module via the Nextflow [`include`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L10) keyword and by default the options passed to the module via the `addParams` option are set as an empty Groovy map [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L5); this in turn means `options.args` will be set to empty by default in the module file too. This is an intentional design choice and allows us to implement well-written sub-workflows composed of a chain of tools that by default run with the bare minimum parameter set for any given tool in order to make it much easier to share across pipelines and to provide the flexibility for users and developers to customise any non-mandatory arguments. -### `-resume` +When including the sub-workflow above in the main pipeline workflow we use the same `include` statement, however, we now have the ability to overwrite options for each of the tools in the sub-workflow including the [`align_options`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L225) variable that will be used specifically to overwrite the optional arguments passed to the `STAR_ALIGN` module. In this case, the options to be provided to `STAR_ALIGN` have been assigned sensible defaults by the developer(s) in the pipeline's [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L70-L74) and can be accessed and customised in the [workflow context](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L201-L204) too before eventually passing them to the sub-workflow as a Groovy map called `star_align_options`. These options will then be propagated from `workflow -> sub-workflow -> module`. -Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. +As mentioned at the beginning of this section it may also be necessary for users to overwrite the options passed to modules to be able to customise specific aspects of the way in which a particular tool is executed by the pipeline. Given that all of the default module options are stored in the pipeline's `modules.config` as a [`params` variable](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L24-L25) it is also possible to overwrite any of these options via a custom config file. -Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. 
-You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
+Say for example we want to append an additional, non-mandatory parameter (i.e. `--outFilterMismatchNmax 16`) to the arguments passed to the `STAR_ALIGN` module. Firstly, we need to copy across the default `args` specified in the [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L71) and create a custom config file that is a composite of the default `args` as well as the additional options you would like to provide. This is very important because Nextflow will overwrite the default value of `args` that you provide via the custom config.
-
-**NB:** Single hyphen (core Nextflow option)
+
+As you will see in the example below, we have:
-
-### `-c`
+
+- appended `--outFilterMismatchNmax 16` to the default `args` used by the module.
-Specify the path to a specific config file (this is a core NextFlow command).
+- changed the default `publishDir` value to where the files will eventually be published in the main results directory.
-
-**NB:** Single hyphen (core Nextflow option)
+- appended `'bam':''` to the default value of `publish_files` so that the BAM files generated by the process will also be saved in the top-level results directory for the module. Note: `'out':'log'` means any file/directory ending in `out` will now be saved in a separate directory called `my_star_directory/log/`.
-
-Note - you can use this to override pipeline defaults.
+
+```nextflow
+params {
+    modules {
+        'star_align' {
+            args          = "--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outFilterMismatchNmax 16"
+            publishDir    = "my_star_directory"
+            publish_files = ['out':'log', 'tab':'log', 'bam':'']
+        }
+    }
+}
+```
-
-### `--custom_config_version`
+
+### Updating containers
-
-Provide git commit id for custom Institutional configs hosted at `nf-core/configs`.
This was implemented for reproducibility purposes. Default: `master`. + - For Docker: -```bash -## Download and use config file with following git commid id ---custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96 -``` + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` -### `--custom_config_base` + - For Singularity: -If you're running offline, nextflow will not be able to fetch the institutional config files -from the internet. If you don't need them, then this is not a problem. If you do need them, -you should download the files from the repo and tell nextflow where to find them with the -`custom_config_base` option. For example: + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` -```bash -## Download and unzip the config files -cd /path/to/my/configs -wget https://github.com/nf-core/configs/archive/master.zip -unzip master.zip - -## Run the pipeline -cd /path/to/my/data -nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/ -``` + - For Conda: -> Note that the nf-core/tools helper package has a `download` command to download all required pipeline -> files + singularity containers + institutional configs in one go for you, to make this process easier. + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` -### `--max_memory` +> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline, then you must make sure to keep the `work/` directory, otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. -Use to set a top-limit for the default memory requirement for each process. -Should be a string in the format integer-unit. eg. `--max_memory '8.GB'` +### nf-core/configs -### `--max_time` +In most cases, you will only need to create a custom config as a one-off, but if you and others within your organisation are likely to be running nf-core pipelines regularly with the same settings, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository, adding your config file and an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. -Use to set a top-limit for the default time requirement for each process. -Should be a string in the format integer-unit. eg. `--max_time '2.h'` +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. -### `--max_cpus` +If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). --> -Use to set a top-limit for the default CPU requirement for each process. -Should be a string in the format integer-unit. eg.
`--max_cpus 1` +## Running in the background -### `--plaintext_email` +Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. -Set to receive plain-text e-mails instead of HTML formatted. +The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file. -### `--monochrome_logs` +Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time. +Some HPC setups also allow you to run Nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs). -Set to disable colourful command line output and live life in monochrome. +## Nextflow memory requirements -### `--multiqc_config` +In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. +We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`): -Specify a path to a custom MultiQC configuration file. +```console +NXF_OPTS='-Xms1g -Xmx4g' +``` diff --git a/download-references.nf b/download-references.nf deleted file mode 100644 index c9545338..00000000 --- a/download-references.nf +++ /dev/null @@ -1,246 +0,0 @@ -#!/usr/bin/env nextflow -/* -================================================================================ - nf-core/rnafusion -================================================================================ -nf-core/rnafusion: - RNA-seq analysis pipeline for detection gene-fusions --------------------------------------------------------------------------------- - @Homepage - https://nf-co.re/rnafusion --------------------------------------------------------------------------------- - @Documentation - https://nf-co.re/rnafusion/docs --------------------------------------------------------------------------------- - @Repository - https://github.com/nf-core/rnafusion --------------------------------------------------------------------------------- -*/ - -def helpMessage() { - log.info nfcoreHeader() - log.info""" - Usage: - - The typical command for downloading references is as follows: - - nextflow run nf-core/rnafusion/download-references.nf -profile [PROFILE] [OPTIONS] --outdir /path/to/output - - Mandatory arguments: - --outdir [path] Output directory for downloading - - Options: - --download_all [bool] Download all references - --reference_release [int] Release number of Ensembl reference for FASTA and GTF - Default: 97 -> ftp://ftp.ensembl.org/pub/release-97 - --base [bool] Download FASTA, GTF, cDNA - --arriba [bool] Download Arriba references - --star_fusion [bool] Build STAR-Fusion references from FASTA ANF GTF - --fusioncatcher [bool] Download Fusioncatcher references - --ericscript [bool] Download Ericscript references - --fusion_report [bool] Download databases for fusion-report - --cosmic_usr [str] [Required with fusion_report] COSMIC username - --cosmic_passwd [str] [Required with fusion_report] COSMIC password - """.stripIndent() -} - -/* - * SET UP CONFIGURATION VARIABLES - */ - -// Show help emssage -if (params.help) exit 0, helpMessage() -if (!params.outdir) exit 1, "Output directory not specified!"
- -running_tools = [] -if (params.base || params.download_all) running_tools.add("Reference v${params.reference_release}") -if (params.arriba || params.download_all) running_tools.add("Arriba") -if (params.star_fusion || params.download_all) running_tools.add("STAR-Fusion") -if (params.fusioncatcher || params.download_all) running_tools.add("Fusioncatcher") -if (params.ericscript || params.download_all) running_tools.add("Ericscript") -if (params.fusion_report || params.download_all) { - running_tools.add('fusion-report') - if (!params.cosmic_usr || !params.cosmic_passwd) exit 1, "Database credentials are required parameter!" -} - -// Header log info -log.info nfcoreHeader() -def summary = [:] -summary['Pipeline Name'] = 'nf-core/rnafusion/download-references.nf' -summary['Pipeline Version'] = workflow.manifest.version -summary['References'] = running_tools.size() == 0 ? 'None' : running_tools.join(", ") -if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -summary['Output dir'] = params.outdir -summary['User'] = workflow.userName -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "\033[2m----------------------------------------------------\033[0m" - -// Check the hostnames against configured profiles -checkHostname() - -/* -================================================================================ - DOWNLOAD -================================================================================ -*/ - -process download_base { - publishDir "${params.outdir}/", mode: 'copy' - - when: - params.base || params.download_all - - output: - file "Homo_sapiens.GRCh38_r${params.reference_release}.all.fa" into fasta - file "Homo_sapiens.GRCh38_r${params.reference_release}.gtf" into gtf - file "Homo_sapiens.GRCh38_r${params.reference_release}.cdna.all.fa.gz" into transcript - - script: - """ - wget ftp://ftp.ensembl.org/pub/release-${params.reference_release}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.{1..22}.fa.gz - wget ftp://ftp.ensembl.org/pub/release-${params.reference_release}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.{MT,X,Y}.fa.gz - gunzip -c Homo_sapiens.GRCh38.dna.chromosome.* > Homo_sapiens.GRCh38_r${params.reference_release}.all.fa - wget ftp://ftp.ensembl.org/pub/release-${params.reference_release}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${params.reference_release}.chr.gtf.gz -O Homo_sapiens.GRCh38_r${params.reference_release}.gtf.gz - gunzip Homo_sapiens.GRCh38_r${params.reference_release}.gtf.gz - wget ftp://ftp.ensembl.org/pub/release-${params.reference_release}/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz -O Homo_sapiens.GRCh38_r${params.reference_release}.cdna.all.fa.gz - """ -} - -process download_arriba { - publishDir "${params.outdir}/arriba", mode: 'copy' - - when: - params.arriba || params.download_all - - output: - file '*' - - script: - """ - wget -N https://github.com/suhrig/arriba/releases/download/v1.2.0/arriba_v1.2.0.tar.gz -O arriba_v1.2.0.tar.gz - tar -xvzf arriba_v1.2.0.tar.gz && mv arriba_v1.2.0/database/* . 
&& gunzip *.gz && rm -rf arriba_* - """ -} - -process download_star_fusion { - publishDir "${params.outdir}/star-fusion", mode: 'copy' - - when: - params.star_fusion || params.download_all - - output: - file '*' - - script: - """ - aws s3 --no-sign-request --region ${params.awsregion} cp s3://ngi-igenomes/Homo_sapiens/Ensembl/GRCh38/Genome/CTAT/ctat_star_fusion_1_8_1.tar.gz . - tar -xf ctat_star_fusion_1_8_1.tar.gz --strip-components=5 - rm ctat_star_fusion_1_8_1.tar.gz - """ -} - -process download_fusioncatcher { - publishDir "${params.outdir}/fusioncatcher", mode: 'copy' - - when: - params.fusioncatcher || params.download_all - - output: - file '*' - - script: - """ - wget -N http://sourceforge.net/projects/fusioncatcher/files/data/human_v98.tar.gz.aa - wget -N http://sourceforge.net/projects/fusioncatcher/files/data/human_v98.tar.gz.ab - wget -N http://sourceforge.net/projects/fusioncatcher/files/data/human_v98.tar.gz.ac - wget -N http://sourceforge.net/projects/fusioncatcher/files/data/human_v98.tar.gz.ad - cat human_v98.tar.gz.* | tar xz - rm human_v98.tar* - """ -} - -process download_ericscript { - publishDir "${params.outdir}/ericscript", mode: 'copy' - - when: - params.ericscript || params.download_all - - output: - file '*' - - script: - """ - wget -N https://raw.githubusercontent.com/circulosmeos/gdown.pl/dfd6dc910a38a42d550397bb5c2335be2c4bcf54/gdown.pl - chmod +x gdown.pl - ./gdown.pl "https://drive.google.com/uc?export=download&confirm=qgOc&id=0B9s__vuJPvIiUGt1SnFMZFg4TlE" ericscript_db_homosapiens_ensembl84.tar.bz2 - tar jxf ericscript_db_homosapiens_ensembl84.tar.bz2 - rm gdown.pl ericscript_db_homosapiens_ensembl84.tar.bz2 - """ -} - -process download_databases { - publishDir "${params.outdir}/databases", mode: 'copy' - - when: - params.fusion_report || params.download_all - - output: - file '*' - - script: - """ - fusion_report download --cosmic_usr "${params.cosmic_usr}" --cosmic_passwd "${params.cosmic_passwd}" . - """ -} - -/* - * Completion - */ -workflow.onComplete { - log.info "[nf-core/rnafusion/download-references.nf] Pipeline Complete" -} - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/rnafusion v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? '' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" - if (params.hostnames) { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" - } - } - } - } -} diff --git a/environment.yml b/environment.yml deleted file mode 100644 index 645b707a..00000000 --- a/environment.yml +++ /dev/null @@ -1,24 +0,0 @@ -# You can use this file to create a conda environment for this pipeline: -# conda env create -f environment.yml -name: nf-core-rnafusion-1.2.0 -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - conda-forge::python=3.7.3 - - conda-forge::markdown=3.1.1 - - conda-forge::pymdown-extensions=6.0 - - conda-forge::pygments=2.5.2 - # Necessary tools - - bioconda::fastqc=0.11.8 - - bioconda::multiqc=1.7 - # Custom packages - - bioconda::star=2.7.1a # has to be for star index - - conda-forge::r-data.table=1.12.8 - - conda-forge::r-gplots=3.0.1.2 - - bioconda::bioconductor-edger=3.28.0 - - bioconda::fusion-report=2.1.3 - # Star-Fusion - - conda-forge::awscli=1.18.39 - - conda-forge::tar=1.32 \ No newline at end of file diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy new file mode 100755 index 00000000..b3d092f8 --- /dev/null +++ b/lib/NfcoreSchema.groovy @@ -0,0 +1,529 @@ +// +// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. 
+// + +import org.everit.json.schema.Schema +import org.everit.json.schema.loader.SchemaLoader +import org.everit.json.schema.ValidationException +import org.json.JSONObject +import org.json.JSONTokener +import org.json.JSONArray +import groovy.json.JsonSlurper +import groovy.json.JsonBuilder + +class NfcoreSchema { + + // + // Resolve Schema path relative to main workflow directory + // + public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') { + return "${workflow.projectDir}/${schema_filename}" + } + + // + // Function to loop over all parameters defined in schema and check + // whether the given parameters adhere to the specifications + // + /* groovylint-disable-next-line UnusedPrivateMethodParameter */ + public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { + def has_error = false + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// + // Check for nextflow core params and unexpected params + def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text + def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') + def nf_params = [ + // Options for base `nextflow` command + 'bg', + 'c', + 'C', + 'config', + 'd', + 'D', + 'dockerize', + 'h', + 'log', + 'q', + 'quiet', + 'syslog', + 'v', + 'version', + + // Options for `nextflow run` command + 'ansi', + 'ansi-log', + 'bg', + 'bucket-dir', + 'c', + 'cache', + 'config', + 'dsl2', + 'dump-channels', + 'dump-hashes', + 'E', + 'entry', + 'latest', + 'lib', + 'main-script', + 'N', + 'name', + 'offline', + 'params-file', + 'pi', + 'plugins', + 'poll-interval', + 'pool-size', + 'profile', + 'ps', + 'qs', + 'queue-size', + 'r', + 'resume', + 'revision', + 'stdin', + 'stub', + 'stub-run', + 'test', + 'w', + 'with-charliecloud', + 'with-conda', + 'with-dag', + 'with-docker', + 'with-mpi', + 'with-notification', + 'with-podman', + 'with-report', + 'with-singularity', + 'with-timeline', + 'with-tower', + 'with-trace', + 'with-weblog', + 'without-docker', + 'without-podman', + 'work-dir' + ] + def unexpectedParams = [] + + // Collect expected parameters from the schema + def expectedParams = [] + def enums = [:] + for (group in schemaParams) { + for (p in group.value['properties']) { + expectedParams.push(p.key) + if (group.value['properties'][p.key].containsKey('enum')) { + enums[p.key] = group.value['properties'][p.key]['enum'] + } + } + } + + for (specifiedParam in params.keySet()) { + // nextflow params + if (nf_params.contains(specifiedParam)) { + log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'" + has_error = true + } + // unexpected params + def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' + def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } + def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() + def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { + // Temporarily remove camelCase/camel-case params #1035 + def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} + if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ + unexpectedParams.push(specifiedParam) + } + } + } + + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// + // Validate parameters against the schema + InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() + JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) + + // Remove anything that's in params.schema_ignore_params + raw_schema = removeIgnoredParams(raw_schema, params) + + Schema schema = SchemaLoader.load(raw_schema) + + // Clean the parameters + def cleanedParams = cleanParameters(params) + + // Convert to JSONObject + def jsonParams = new JsonBuilder(cleanedParams) + JSONObject params_json = new JSONObject(jsonParams.toString()) + + // Validate + try { + schema.validate(params_json) + } catch (ValidationException e) { + println '' + log.error 'ERROR: Validation of pipeline parameters failed!' + JSONObject exceptionJSON = e.toJSON() + printExceptions(exceptionJSON, params_json, log, enums) + println '' + has_error = true + } + + // Check for unexpected parameters + if (unexpectedParams.size() > 0) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + println '' + def warn_msg = 'Found unexpected parameters:' + for (unexpectedParam in unexpectedParams) { + warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}" + } + log.warn warn_msg + log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}" + println '' + } + + if (has_error) { + System.exit(1) + } + } + + // + // Beautify parameters for --help + // + public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + Integer num_hidden = 0 + String output = '' + output += 'Typical pipeline command:\n\n' + output += " ${colors.cyan}${command}${colors.reset}\n\n" + Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + Integer max_chars = paramsMaxChars(params_map) + 1 + Integer desc_indent = max_chars + 14 + Integer dec_linewidth = 160 - desc_indent + for (group in params_map.keySet()) { + Integer num_params = 0 + String group_output = colors.underlined + colors.bold + group + colors.reset + '\n' + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (group_params.get(param).hidden && !params.show_hidden_params) { + num_hidden += 1 + continue; + } + def type = '[' + group_params.get(param).type + ']' + def description = group_params.get(param).description + def defaultValue = 
group_params.get(param).default != null ? " [default: " + group_params.get(param).default.toString() + "]" : '' + def description_default = description + colors.dim + defaultValue + colors.reset + // Wrap long description texts + // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap + if (description_default.length() > dec_linewidth){ + List olines = [] + String oline = "" // " " * indent + description_default.split(" ").each() { wrd -> + if ((oline.size() + wrd.size()) <= dec_linewidth) { + oline += wrd + " " + } else { + olines += oline + oline = wrd + " " + } + } + olines += oline + description_default = olines.join("\n" + " " * desc_indent) + } + group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n' + num_params += 1 + } + group_output += '\n' + if (num_params > 0){ + output += group_output + } + } + if (num_hidden > 0){ + output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset + } + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Groovy Map summarising parameters/workflow options used by the pipeline + // + public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') { + // Get a selection of core Nextflow workflow options + def Map workflow_summary = [:] + if (workflow.revision) { + workflow_summary['revision'] = workflow.revision + } + workflow_summary['runName'] = workflow.runName + if (workflow.containerEngine) { + workflow_summary['containerEngine'] = workflow.containerEngine + } + if (workflow.container) { + workflow_summary['container'] = workflow.container + } + workflow_summary['launchDir'] = workflow.launchDir + workflow_summary['workDir'] = workflow.workDir + workflow_summary['projectDir'] = workflow.projectDir + workflow_summary['userName'] = workflow.userName + workflow_summary['profile'] = workflow.profile + workflow_summary['configFiles'] = workflow.configFiles.join(', ') + + // Get pipeline parameters defined in JSON Schema + def Map params_summary = [:] + def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + for (group in params_map.keySet()) { + def sub_params = new LinkedHashMap() + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (params.containsKey(param)) { + def params_value = params.get(param) + def schema_value = group_params.get(param).default + def param_type = group_params.get(param).type + if (schema_value != null) { + if (param_type == 'string') { + if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { + def sub_string = schema_value.replace('\$projectDir', '') + sub_string = sub_string.replace('\${projectDir}', '') + if (params_value.contains(sub_string)) { + schema_value = params_value + } + } + if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) { + def sub_string = schema_value.replace('\$params.outdir', '') + sub_string = sub_string.replace('\${params.outdir}', '') + if ("${params.outdir}${sub_string}" == params_value) { + schema_value = params_value + } + } + } + } + + // We have a default in the schema, and this isn't it + if (schema_value != null && params_value != schema_value) { + sub_params.put(param, params_value) + } + // No default in the schema, and this isn't empty + else if (schema_value == null && params_value != 
"" && params_value != null && params_value != false) { + sub_params.put(param, params_value) + } + } + } + params_summary.put(group, sub_params) + } + return [ 'Core Nextflow options' : workflow_summary ] << params_summary + } + + // + // Beautify parameters for summary and return as string + // + public static String paramsSummaryLog(workflow, params) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + String output = '' + def params_map = paramsSummaryMap(workflow, params) + def max_chars = paramsMaxChars(params_map) + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + if (group_params) { + output += colors.bold + group + colors.reset + '\n' + for (param in group_params.keySet()) { + output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' + } + output += '\n' + } + } + output += "!! Only displaying parameters that differ from the pipeline defaults !!\n" + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Loop over nested exceptions and print the causingException + // + private static void printExceptions(ex_json, params_json, log, enums, limit=5) { + def causingExceptions = ex_json['causingExceptions'] + if (causingExceptions.length() == 0) { + def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ + // Missing required param + if (m.matches()) { + log.error "* Missing required parameter: --${m[0][1]}" + } + // Other base-level error + else if (ex_json['pointerToViolation'] == '#') { + log.error "* ${ex_json['message']}" + } + // Error with specific param + else { + def param = ex_json['pointerToViolation'] - ~/^#\// + def param_val = params_json[param].toString() + if (enums.containsKey(param)) { + def error_msg = "* --${param}: '${param_val}' is not a valid choice (Available choices" + if (enums[param].size() > limit) { + log.error "${error_msg} (${limit} of ${enums[param].size()}): ${enums[param][0..limit-1].join(', ')}, ... 
)" + } else { + log.error "${error_msg}: ${enums[param].join(', ')})" + } + } else { + log.error "* --${param}: ${ex_json['message']} (${param_val})" + } + } + } + for (ex in causingExceptions) { + printExceptions(ex, params_json, log, enums) + } + } + + // + // Remove an element from a JSONArray + // + private static JSONArray removeElement(json_array, element) { + def list = [] + int len = json_array.length() + for (int i=0;i + if(raw_schema.keySet().contains('definitions')){ + raw_schema.definitions.each { definition -> + for (key in definition.keySet()){ + if (definition[key].get("properties").keySet().contains(ignore_param)){ + // Remove the param to ignore + definition[key].get("properties").remove(ignore_param) + // If the param was required, change this + if (definition[key].has("required")) { + def cleaned_required = removeElement(definition[key].required, ignore_param) + definition[key].put("required", cleaned_required) + } + } + } + } + } + if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) { + raw_schema.get("properties").remove(ignore_param) + } + if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) { + def cleaned_required = removeElement(raw_schema.required, ignore_param) + raw_schema.put("required", cleaned_required) + } + } + return raw_schema + } + + // + // Clean and check parameters relative to Nextflow native classes + // + private static Map cleanParameters(params) { + def new_params = params.getClass().newInstance(params) + for (p in params) { + // remove anything evaluating to false + if (!p['value']) { + new_params.remove(p.key) + } + // Cast MemoryUnit to String + if (p['value'].getClass() == nextflow.util.MemoryUnit) { + new_params.replace(p.key, p['value'].toString()) + } + // Cast Duration to String + if (p['value'].getClass() == nextflow.util.Duration) { + new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) + } + // Cast LinkedHashMap to String + if (p['value'].getClass() == LinkedHashMap) { + new_params.replace(p.key, p['value'].toString()) + } + } + return new_params + } + + // + // This function tries to read a JSON params file + // + private static LinkedHashMap paramsLoad(String json_schema) { + def params_map = new LinkedHashMap() + try { + params_map = paramsRead(json_schema) + } catch (Exception e) { + println "Could not read parameters settings from JSON. $e" + params_map = new LinkedHashMap() + } + return params_map + } + + // + // Method to actually read in JSON file using Groovy. + // Group (as Key), values are all parameters + // - Parameter1 as Key, Description as Value + // - Parameter2 as Key, Description as Value + // .... 
+ // Group + // - + private static LinkedHashMap paramsRead(String json_schema) throws Exception { + def json = new File(json_schema).text + def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') + def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') + /* Tree looks like this in nf-core schema + * definitions <- this is what the first get('definitions') gets us + group 1 + title + description + properties + parameter 1 + type + description + parameter 2 + type + description + group 2 + title + description + properties + parameter 1 + type + description + * properties <- parameters can also be ungrouped, outside of definitions + parameter 1 + type + description + */ + + // Grouped params + def params_map = new LinkedHashMap() + schema_definitions.each { key, val -> + def Map group = schema_definitions."$key".properties // Gets the property object of the group + def title = schema_definitions."$key".title + def sub_params = new LinkedHashMap() + group.each { innerkey, value -> + sub_params.put(innerkey, value) + } + params_map.put(title, sub_params) + } + + // Ungrouped params + def ungrouped_params = new LinkedHashMap() + schema_properties.each { innerkey, value -> + ungrouped_params.put(innerkey, value) + } + params_map.put("Other parameters", ungrouped_params) + + return params_map + } + + // + // Get maximum number of characters across all parameter names + // + private static Integer paramsMaxChars(params_map) { + Integer max_chars = 0 + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (param.size() > max_chars) { + max_chars = param.size() + } + } + } + return max_chars + } +} diff --git a/lib/NfcoreTemplate.groovy b/lib/NfcoreTemplate.groovy new file mode 100755 index 00000000..2fc0a9b9 --- /dev/null +++ b/lib/NfcoreTemplate.groovy @@ -0,0 +1,258 @@ +// +// This file holds several functions used within the nf-core pipeline template. +// + +import org.yaml.snakeyaml.Yaml + +class NfcoreTemplate { + + // + // Check AWS Batch related parameters have been specified correctly + // + public static void awsBatch(workflow, params) { + if (workflow.profile.contains('awsbatch')) { + // Check params.awsqueue and params.awsregion have been set if running on AWSBatch + assert (params.awsqueue && params.awsregion) : "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + // Check outdir paths to be S3 buckets if running on AWSBatch + assert params.outdir.startsWith('s3:') : "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + } + } + + // + // Warn if a -profile or Nextflow config has not been provided to run the pipeline + // + public static void checkConfigProvided(workflow, log) { + if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { + log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + + " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + + "Please refer to the quick start section and usage docs for the pipeline.\n " + } + } + + // + // Construct and send completion email + // + public static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) { + + // Set up the e-mail variables + def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + } + + def summary = [:] + for (group in summary_params.keySet()) { + summary << summary_params[group] + } + + def misc_fields = [:] + misc_fields['Date Started'] = workflow.start + misc_fields['Date Completed'] = workflow.complete + misc_fields['Pipeline script file path'] = workflow.scriptFile + misc_fields['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build + misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary << misc_fields + + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { + if (mqc_report.size() > 1) { + log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + } + mqc_report = mqc_report[0] + } + } + } catch (all) { + if (multiqc_report) { + log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + } + } + + // Check if we are only sending emails on failure + def email_address = params.email + if (!params.email && params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$projectDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$projectDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def sf = new File("$projectDir/assets/sendmail_template.txt") + def sendmail_template = 
engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + Map colors = logColours(params.monochrome_logs) + if (email_address) { + try { + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" + } catch (all) { + // Catch failures and try with plaintext + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File(output_d, "pipeline_report.html") + output_hf.withWriter { w -> w << email_html } + def output_tf = new File(output_d, "pipeline_report.txt") + output_tf.withWriter { w -> w << email_txt } + } + + // + // Print pipeline summary on completion + // + public static void summary(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (workflow.success) { + if (workflow.stats.ignoredCount == 0) { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" + } else { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + } + } else { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + } + + // + // ANSII Colours used for terminal logging + // + public static Map logColours(Boolean monochrome_logs) { + Map colorcodes = [:] + + // Reset / Meta + colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" + colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" + colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" + colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" + + // Regular Colors + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + + // Bold + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? 
'' : "\033[1;37m" + + // Underline + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + + // High Intensity + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + + // Bold High Intensity + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + + return colorcodes + } + + // + // Does what is says on the tin + // + public static String dashedLine(monochrome_logs) { + Map colors = logColours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + // + // nf-core logo + // + public static String logo(workflow, monochrome_logs) { + Map colors = logColours(monochrome_logs) + String.format( + """\n + ${dashedLine(monochrome_logs)} + ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} + ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} + ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} + ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} + ${colors.green}`._,._,\'${colors.reset} + ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} + ${dashedLine(monochrome_logs)} + """.stripIndent() + ) + } +} diff --git a/lib/Utils.groovy b/lib/Utils.groovy new file mode 100755 index 00000000..28567bd7 --- /dev/null +++ b/lib/Utils.groovy @@ -0,0 +1,40 @@ +// +// This file holds several Groovy functions that could be useful for any Nextflow pipeline +// + +import org.yaml.snakeyaml.Yaml + +class Utils { + + // + // When running with -profile conda, warn if channels have not been set-up appropriately + // + public static void checkCondaChannels(log) { + Yaml parser = new Yaml() + def channels = [] + try { + def config = parser.load("conda config --show channels".execute().text) + channels = config.channels + } catch(NullPointerException | IOException e) { + log.warn "Could not verify conda channel configuration." 
+ return + } + + // Check that all channels are present + def required_channels = ['conda-forge', 'bioconda', 'defaults'] + def conda_check_failed = !required_channels.every { ch -> ch in channels } + + // Check that they are in the right order + conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda')) + conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults')) + + if (conda_check_failed) { + log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + + " There is a problem with your Conda configuration!\n\n" + + " You will need to set-up the conda-forge and bioconda channels correctly.\n" + + " Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n" + + " NB: The order of the channels matters!\n" + + "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + } + } +} diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy new file mode 100755 index 00000000..3266d8d4 --- /dev/null +++ b/lib/WorkflowMain.groovy @@ -0,0 +1,93 @@ +// +// This file holds several functions specific to the main.nf workflow in the nf-core/rnafusion pipeline +// + +class WorkflowMain { + + // + // Citation string for pipeline + // + public static String citation(workflow) { + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + + "* The pipeline\n" + + " https://doi.org/10.5281/zenodo.151721952\n\n" + + "* The nf-core framework\n" + + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + + "* Software dependencies\n" + + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + } + + // + // Print help to screen if required + // + public static String help(workflow, params, log) { + def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh38 -profile docker" + def help_string = '' + help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs) + help_string += NfcoreSchema.paramsHelp(workflow, params, command) + help_string += '\n' + citation(workflow) + '\n' + help_string += NfcoreTemplate.dashedLine(params.monochrome_logs) + return help_string + } + + // + // Print parameter summary log to screen + // + public static String paramsSummaryLog(workflow, params, log) { + def summary_log = '' + summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs) + summary_log += NfcoreSchema.paramsSummaryLog(workflow, params) + summary_log += '\n' + citation(workflow) + '\n' + summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs) + return summary_log + } + + // + // Validate parameters and print summary to screen + // + public static void initialise(workflow, params, log) { + // Print help to screen if required + if (params.help) { + log.info help(workflow, params, log) + System.exit(0) + } + + // Validate workflow parameters via the JSON schema + if (params.validate_params) { + NfcoreSchema.validateParameters(workflow, params, log) + } + + // Print parameter summary log to screen + log.info paramsSummaryLog(workflow, params, log) + + // Check that a -profile or Nextflow config has been provided to run the pipeline + NfcoreTemplate.checkConfigProvided(workflow, log) + + // Check that conda channels are set-up correctly + if (params.enable_conda) { + Utils.checkCondaChannels(log) + } + + // Check AWS batch settings + NfcoreTemplate.awsBatch(workflow, params) + + // Check input has been provided + if (!params.input) { + log.error "Please provide an input samplesheet to the 
pipeline e.g. '--input samplesheet.csv'" + System.exit(1) + } + } + + // + // Get attribute from genome config file e.g. fasta + // + public static String getGenomeAttribute(params, attribute) { + def val = '' + if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) { + if (params.genomes[ params.genome ].containsKey(attribute)) { + val = params.genomes[ params.genome ][ attribute ] + } + } + return val + } +} diff --git a/lib/WorkflowRnafusion.groovy b/lib/WorkflowRnafusion.groovy new file mode 100755 index 00000000..74705064 --- /dev/null +++ b/lib/WorkflowRnafusion.groovy @@ -0,0 +1,59 @@ +// +// This file holds several functions specific to the workflow/rnafusion.nf in the nf-core/rnafusion pipeline +// + +class WorkflowRnafusion { + + // + // Check and validate parameters + // + public static void initialise(params, log) { + genomeExistsError(params, log) + + if (!params.fasta) { + log.error "Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file." + System.exit(1) + } + } + + // + // Get workflow summary for MultiQC + // + public static String paramsSummaryMultiqc(workflow, summary) { + String summary_section = '' + for (group in summary.keySet()) { + def group_params = summary.get(group) // This gets the parameters of that particular group + if (group_params) { + summary_section += "

<p style=\"font-size:110%\"><b>$group</b></p>\n" + summary_section += "    <dl class=\"dl-horizontal\">\n" + for (param in group_params.keySet()) { + summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>\n" + } + summary_section += "    </dl>
\n" + } + } + + String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" + return yaml_file_text + } + + // + // Exit pipeline if incorrect --genome key provided + // + private static void genomeExistsError(params, log) { + if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + log.error "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + + " Genome '${params.genome}' not found in any config files provided to the pipeline.\n" + + " Currently, the available genome keys are:\n" + + " ${params.genomes.keySet().join(", ")}\n" + + "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + System.exit(1) + } + } +} diff --git a/lib/nfcore_external_java_deps.jar b/lib/nfcore_external_java_deps.jar new file mode 100644 index 00000000..805c8bb5 Binary files /dev/null and b/lib/nfcore_external_java_deps.jar differ diff --git a/main.nf b/main.nf index b4f9151c..2408022e 100644 --- a/main.nf +++ b/main.nf @@ -1,1007 +1,86 @@ #!/usr/bin/env nextflow /* -================================================================================ - nf-core/rnafusion -================================================================================ -nf-core/rnafusion: - RNA-seq analysis pipeline for detection gene-fusions --------------------------------------------------------------------------------- - @Homepage - https://nf-co.re/rnafusion --------------------------------------------------------------------------------- - @Documentation - https://nf-co.re/rnafusion/docs --------------------------------------------------------------------------------- - @Repository - https://github.com/nf-core/rnafusion --------------------------------------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + nf-core/rnafusion +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Github : https://github.com/nf-core/rnafusion + Website: https://nf-co.re/rnafusion + Slack : https://nfcore.slack.com/channels/rnafusion +---------------------------------------------------------------------------------------- */ -def helpMessage() { - log.info nfcoreHeader() - log.info""" - - Usage: - - The typical command for running the pipeline is as follows: - - nextflow run nf-core/rnafusion --reads '*_R{1,2}.fastq.gz' -profile docker - - Mandatory arguments: - --reads [file] Path to input data (must be surrounded with quotes) - -profile [str] Configuration profile to use. Can use multiple (comma separated) - Available: docker, singularity, test, awsbatch, and more - --reference_path [str] Path to reference folder (includes fasta, gtf, fusion tool ref ...) 
- - Tool flags: - --arriba [bool] Run Arriba - --arriba_opt [str] Specify extra parameters for Arriba - --ericscript [bool] Run Ericscript - --fusioncatcher [bool] Run FusionCatcher - --fusioncatcher_opt [srt] Specify extra parameters for FusionCatcher - --fusion_report_opt [str] Specify extra parameters for fusion-report - --pizzly [bool] Run Pizzly - --pizzly_k [int] Number of k-mers. Deafult 31 - --squid [bool] Run Squid - --star_fusion [bool] Run STAR-Fusion - --star_fusion_opt [str] Specify extra parameters for STAR-Fusion - - Visualization flags: - --arriba_vis [bool] Generate a PDF visualization per detected fusion - --fusion_inspector [bool] Run Fusion-Inspector - --fusion_inspector_opt [str] Specify extra parameters for Fusion-Inspector - - References If not specified in the configuration file or you wish to overwrite any of the references. - --arriba_ref [file] Path to Arriba reference - --databases [path] Database path for fusion-report - --ericscript_ref [file] Path to Ericscript reference - --fasta [file] Path to fasta reference - --fusioncatcher_ref [file] Path to Fusioncatcher reference - --gtf [file] Path to GTF annotation - --star_index [file] Path to STAR-Index reference - --star_fusion_ref [file] Path to STAR-Fusion reference - --transcript [file] Path to transcript - - Options: - --read_length [int] Length of the reads. Default: 100 - --single_end [bool] Specifies that the input is single-end reads - - Other Options: - --debug [bool] Flag to run only specific fusion tool/s and not the whole pipeline. Only works on tool flags. - --outdir [file] The output directory where the results will be saved - --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful - --max_multiqc_email_size [str] Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) - -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic - - AWSBatch options: - --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion [str] The AWS Region for your AWS Batch job to run on - --awscli [str] Path to the AWS CLI tool - """.stripIndent() -} +nextflow.enable.dsl = 2 /* -================================================================================ - SET UP CONFIGURATION VARIABLES -================================================================================ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + GENOME PARAMETER VALUES +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -// Show help message -if (params.help) exit 0, helpMessage() - -running_tools = [] -visualization_tools = [] -reference = [ - arriba: false, - arriba_vis: false, - ericscript: false, - fusion_inspector: false, - fusioncatcher: false, - star_fusion: false -] - -// Check if genome exists in the config file -if (params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the genomes file. 
Currently the available genomes are ${params.genomes.keySet().join(", ")}" -} - -if (!Channel.fromPath(params.genomes_base, checkIfExists: true)) {exit 1, "Directory ${params.genomes_base} doesn't exist."} - -params.arriba_ref = params.genome ? params.genomes[params.genome].arriba_ref ?: null : null -params.databases = params.genome ? params.genomes[params.genome].databases ?: null : null -params.ericscript_ref = params.genome ? params.genomes[params.genome].ericscript_ref ?: null : null -params.fasta = params.genome ? params.genomes[params.genome].fasta ?: null : null -params.fusioncatcher_ref = params.genome ? params.genomes[params.genome].fusioncatcher_ref ?: null : null -params.gtf = params.genome ? params.genomes[params.genome].gtf ?: null : null -params.star_fusion_ref = params.genome ? params.genomes[params.genome].star_fusion_ref ?: null : null -params.transcript = params.genome ? params.genomes[params.genome].transcript ?: null : null - -ch_fasta = Channel.value(file(params.fasta)).ifEmpty{exit 1, "Fasta file not found: ${params.fasta}"} -ch_gtf = Channel.value(file(params.gtf)).ifEmpty{exit 1, "GTF annotation file not found: ${params.gtf}"} -ch_transcript = Channel.value(file(params.transcript)).ifEmpty{exit 1, "Transcript file not found: ${params.transcript}"} - -if (!params.star_index && (!params.fasta && !params.gtf)) exit 1, "Either specify STAR-INDEX or Fasta and GTF!" - -if (!params.databases) exit 1, "Database path for fusion-report has to be specified!" - -if (params.arriba) { - running_tools.add("Arriba") - reference.arriba = Channel.value(file(params.arriba_ref)).ifEmpty{exit 1, "Arriba reference directory not found!"} -} - -if (params.arriba_vis) { - visualization_tools.add("Arriba") - reference.arriba_vis = Channel.value(file(params.arriba_ref)).ifEmpty{exit 1, "Arriba visualization reference directory not found!"} -} - -if (params.ericscript) { - running_tools.add("EricScript") - reference.ericscript = Channel.value(file(params.ericscript_ref)).ifEmpty{exit 1, "EricsSript reference not found!"} -} - -if (params.fusioncatcher) { - running_tools.add("Fusioncatcher") - reference.fusioncatcher = Channel.value(file(params.fusioncatcher_ref)).ifEmpty{exit 1, "Fusioncatcher data directory not found!"} -} - -if (params.fusion_inspector) { - visualization_tools.add("Fusion-Inspector") - reference.fusion_inspector = Channel.value(file(params.star_fusion_ref)).ifEmpty{exit 1, "Fusion-Inspector reference not found" } -} - -if (params.pizzly) running_tools.add("Pizzly") - -if (params.star_fusion) { - running_tools.add("STAR-Fusion") - reference.star_fusion = Channel.value(file(params.star_fusion_ref)).ifEmpty{exit 1, "Star-Fusion reference directory not found!"} -} - -if (params.squid) running_tools.add("Squid") - -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} - -if (workflow.profile.contains('awsbatch')) { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" - // Check outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" - // Prevent trace files to be stored on S3 since S3 does not support rolling files. 
- if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." -} - -// Stage config files -ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) -ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() -ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) - -/* - * Create a channel for input read files - */ -if(params.readPaths) { - if(params.single_end) { - Channel.from(params.readPaths) - .map { row -> [ row[0], [file(row[1][0])]] } - .ifEmpty{exit 1, "params.readPaths was empty - no input files supplied" } - .into{read_files_arriba; read_files_ericscript; ch_read_files_fastqc; read_files_fusion_inspector; read_files_fusioncatcher; read_files_multiqc; read_files_pizzly; read_files_squid; read_files_star_fusion; read_files_summary} - } else { - Channel.from(params.readPaths) - .map { row -> [ row[0], [file(row[1][0]), file(row[1][1])]] } - .ifEmpty{exit 1, "params.readPaths was empty - no input files supplied" } - .into{read_files_arriba; read_files_ericscript; ch_read_files_fastqc; read_files_fusion_inspector; read_files_fusioncatcher; read_files_multiqc; read_files_pizzly; read_files_squid; read_files_star_fusion; read_files_summary} - } -} else { - Channel.fromFilePairs( params.reads, size: params.single_end ? 1 : 2 ) - .ifEmpty{exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --single_end on the command line." } - .into{read_files_arriba; read_files_ericscript; ch_read_files_fastqc; read_files_fusion_inspector; read_files_fusioncatcher; read_files_multiqc; read_files_pizzly; read_files_squid; read_files_star_fusion; read_files_summary} -} +params.fasta = WorkflowMain.getGenomeAttribute(params, 'fasta') +params.gtf = WorkflowMain.getGenomeAttribute(params, 'gtf') +params.chrgtf = WorkflowMain.getGenomeAttribute(params, 'chrgtf') +params.transcript = WorkflowMain.getGenomeAttribute(params, 'transcript') +params.refflat = WorkflowMain.getGenomeAttribute(params, 'refflat') /* -================================================================================ - PRINTING SUMMARY -================================================================================ +======================================================================================== + PARAMETER VALUES +======================================================================================== */ -// Header log info -log.info nfcoreHeader() -def summary = [:] -if(workflow.revision) summary['Pipeline Release'] = workflow.revision -summary['Run Name'] = custom_runName ?: workflow.runName -summary['Reads'] = params.reads -summary['Fasta Ref'] = params.fasta -summary['GTF Ref'] = params.gtf -summary['STAR Index'] = params.star_index ? params.star_index : 'Not specified, building' -summary['Fusion tools'] = running_tools.size() == 0 ? 'None' : running_tools.join(", ") -summary['Visualization tools'] = visualization_tools.size() == 0 ? 'None': visualization_tools.join(", ") -summary['Data Type'] = params.single_end ? 
'Single-End' : 'Paired-End' -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Output dir'] = params.outdir -summary['Launch dir'] = workflow.launchDir -summary['Working dir'] = workflow.workDir -summary['Script dir'] = workflow.projectDir -summary['User'] = workflow.userName -if(workflow.profile == 'awsbatch') { - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue -} -summary['Config Profile'] = workflow.profile -if (params.config_profile_description) summary['Config Description'] = params.config_profile_description -if (params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact -if (params.config_profile_url) summary['Config URL'] = params.config_profile_url -if (params.email || params.email_on_fail) { - summary['E-mail Address'] = params.email - summary['E-mail on failure'] = params.email_on_fail - summary['MultiQC maxsize'] = params.max_multiqc_email_size -} -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "-\033[2m--------------------------------------------------\033[0m-" - -// Check the hostnames against configured profiles -checkHostname() - -Channel.from(summary.collect{ [it.key, it.value] }) - .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>
" } - .reduce { a, b -> return [a, b].join("\n ") } - .map { x -> """ - id: 'nf-core-rnafusion-summary' - description: " - this information is collected when the pipeline is started." - section_name: 'nf-core/rnafusion Workflow Summary' - section_href: 'https://github.com/nf-core/rnafusion' - plot_type: 'html' - data: | -
<dl class=\"dl-horizontal\"> -$x - </dl>
- """.stripIndent() } - .set { ch_workflow_summary } /* -================================================================================ - PREPROCESSING -================================================================================ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + VALIDATE & PRINT PARAMETER SUMMARY +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -/* - * Build STAR index - */ - -process build_star_index { - tag "${fasta}-${gtf}" - label 'process_medium' - - publishDir params.outdir, mode: 'copy' - - input: - file(fasta) from ch_fasta - file(gtf) from ch_gtf - - output: - file("star-index") into star_index - - when: !(params.star_index) - - script: - def avail_mem = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : '' - """ - mkdir star-index - STAR \\ - --runMode genomeGenerate \\ - --runThreadN ${task.cpus} \\ - --sjdbGTFfile ${gtf} \\ - --sjdbOverhang ${params.read_length - 1} \\ - --genomeDir star-index/ \\ - --genomeFastaFiles ${fasta} \\ - ${avail_mem} - """ -} - -ch_star_index = params.star_index ? Channel.value(file(params.star_index)).ifEmpty{exit 1, "STAR index not found: ${params.star_index}" } : star_index - -ch_star_index = ch_star_index.dump(tag:'ch_star_index') +WorkflowMain.initialise(workflow, params, log) /* -================================================================================ - FUSION PIPELINE -================================================================================ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + NAMED WORKFLOW FOR PIPELINE +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -/* - * Arriba - */ -process arriba { - tag "${sample}" - label 'process_medium' - - publishDir "${params.outdir}/tools/Arriba/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_arriba - file(reference) from reference.arriba - file(star_index) from ch_star_index - file(fasta) from ch_fasta - file(gtf) from ch_gtf - - output: - set val(sample), file("${sample}_arriba.tsv") optional true into arriba_fusions_summary, arriba_tsv - set val(sample), file("${sample}_arriba.bam") optional true into arriba_bam - file("*.{tsv,txt}") into arriba_output +include { BUILD_REFERENCES } from './workflows/build_references' +include { RNAFUSION } from './workflows/rnafusion' - when: params.arriba && (!params.single_end || params.debug) - script: - def extra_params = params.arriba_opt ? 
params.arriba_opt : '' - """ - STAR \\ - --genomeDir ${star_index} \\ - --runThreadN ${task.cpus} \\ - --readFilesIn ${reads} \\ - --outStd BAM_Unsorted \\ - --outSAMtype BAM Unsorted \\ - --outSAMunmapped Within \\ - --outBAMcompression 0 \\ - --outFilterMultimapNmax 1 \\ - --outFilterMismatchNmax 3 \\ - --chimSegmentMin 10 \\ - --chimOutType WithinBAM SoftClip \\ - --chimJunctionOverhangMin 10 \\ - --chimScoreMin 1 \\ - --chimScoreDropMax 30 \\ - --chimScoreJunctionNonGTAG 0 \\ - --chimScoreSeparation 1 \\ - --alignSJstitchMismatchNmax 5 -1 5 5 \\ - --chimSegmentReadGapMax 3 \\ - --readFilesCommand zcat \\ - --sjdbOverhang ${params.read_length - 1} | - - tee Aligned.out.bam | +// +// WORKFLOW: Run main nf-core/rnafusion analysis pipeline +// +workflow NFCORE_RNAFUSION { - arriba \\ - -x /dev/stdin \\ - -a ${fasta} \\ - -g ${gtf} \\ - -b ${reference}/blacklist_hg38_GRCh38_2018-11-04.tsv \\ - -o ${sample}_arriba.tsv -O ${sample}_discarded_arriba.tsv \\ - -T -P ${extra_params} + if (params.build_references) { - mv Aligned.out.bam ${sample}_arriba.bam - """ -} - -arriba_fusions_summary = arriba_fusions_summary.dump(tag:'arriba_fusions_summary') -arriba_visualization = arriba_bam.join(arriba_tsv) - -/* - * STAR-Fusion - */ -process star_fusion { - tag "${sample}" - label 'process_high' - - publishDir "${params.outdir}/tools/Star-Fusion/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_star_fusion - file(reference) from reference.star_fusion - file(star_index) from ch_star_index - - output: - set val(sample), file("${sample}_star-fusion.tsv") optional true into star_fusion_fusions - file("*.{tsv,txt}") into star_fusion_output - - when: params.star_fusion || (params.star_fusion && params.debug) - - script: - def avail_mem = task.memory ? "--limitBAMsortRAM ${task.memory.toBytes() - 100000000}" : '' - option = params.single_end ? "--left_fq ${reads[0]}" : "--left_fq ${reads[0]} --right_fq ${reads[1]}" - def extra_params = params.star_fusion_opt ? params.star_fusion_opt : '' - """ - STAR \\ - --genomeDir ${star_index} \\ - --readFilesIn ${reads} \\ - --twopassMode Basic \\ - --outReadsUnmapped None \\ - --chimSegmentMin 12 \\ - --chimJunctionOverhangMin 12 \\ - --alignSJDBoverhangMin 10 \\ - --alignMatesGapMax 100000 \\ - --alignIntronMax 100000 \\ - --chimSegmentReadGapMax 3 \\ - --alignSJstitchMismatchNmax 5 -1 5 5 \\ - --runThreadN ${task.cpus} \\ - --outSAMstrandField intronMotif ${avail_mem} \\ - --outSAMunmapped Within \\ - --outSAMtype BAM Unsorted \\ - --outSAMattrRGline ID:GRPundef \\ - --chimMultimapScoreRange 10 \\ - --chimMultimapNmax 10 \\ - --chimNonchimScoreDropMin 10 \\ - --peOverlapNbasesMin 12 \\ - --peOverlapMMp 0.1 \\ - --readFilesCommand zcat \\ - --sjdbOverhang ${params.read_length - 1} \\ - --chimOutJunctionFormat 1 - - STAR-Fusion \\ - --genome_lib_dir ${reference} \\ - -J Chimeric.out.junction \\ - ${option} \\ - --CPU ${task.cpus} \\ - --examine_coding_effect \\ - --output_dir . 
${extra_params} - - mv star-fusion.fusion_predictions.tsv ${sample}_star-fusion.tsv - mv star-fusion.fusion_predictions.abridged.tsv ${sample}_abridged.tsv - mv star-fusion.fusion_predictions.abridged.coding_effect.tsv ${sample}_abridged.coding_effect.tsv - """ -} - -star_fusion_fusions = star_fusion_fusions.dump(tag:'star_fusion_fusions') - -/* - * Fusioncatcher - */ -process fusioncatcher { - tag "${sample}" - label 'process_high' - - publishDir "${params.outdir}/tools/Fusioncatcher/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_fusioncatcher - file(data_dir) from reference.fusioncatcher - - output: - set val(sample), file("${sample}_fusioncatcher.txt") optional true into fusioncatcher_fusions - file("*.{txt,zip,log}") into fusioncatcher_output - - when: params.fusioncatcher || (params.fusioncatcher && params.debug) - - script: - option = params.single_end ? reads[0] : "${reads[0]},${reads[1]}" - def extra_params = params.fusioncatcher_opt ? params.fusioncatcher_opt : '' - """ - fusioncatcher.py \\ - -d ${data_dir} \\ - -i ${option} \\ - --threads ${task.cpus} \\ - -o . \\ - --skip-blat ${extra_params} - - mv final-list_candidate-fusion-genes.txt ${sample}_fusioncatcher.txt - """ -} - -fusioncatcher_fusions = fusioncatcher_fusions.dump(tag:'fusioncatcher_fusions') - -/* - * Ericscript - */ -process ericscript { - tag "${sample}" - label 'process_high' - - publishDir "${params.outdir}/tools/EricScript/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_ericscript - file(reference) from reference.ericscript - - output: - set val(sample), file("${sample}_ericscript.tsv") optional true into ericscript_fusions - set val(sample), file("${sample}_ericscript_total.tsv") optional true into ericscript_output - - when: params.ericscript && (!params.single_end || params.debug) - - script: - """ - ericscript.pl \\ - -db ${reference} \\ - -name fusions \\ - -p ${task.cpus} \\ - -o ./tmp \\ - ${reads} - - if [[ -f "tmp/fusions.results.filtered.tsv" ]]; then - mv tmp/fusions.results.filtered.tsv ${sample}_ericscript.tsv - fi - - if [[ -f "tmp/fusions.results.total.tsv" ]]; then - mv tmp/fusions.results.total.tsv ${sample}_ericscript_total.tsv - fi - """ -} - -ericscript_fusions = ericscript_fusions.dump(tag:'ericscript_fusions') - -/* - * Pizzly - */ -process pizzly { - tag "${sample}" - label 'process_medium' + BUILD_REFERENCES () - publishDir "${params.outdir}/tools/Pizzly/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_pizzly - file(gtf) from ch_gtf - file(transcript) from ch_transcript - - output: - set val(sample), file("${sample}_pizzly.txt") optional true into pizzly_fusions - file("*.{json,txt}") into pizzly_output - - when: params.pizzly && (!params.single_end || params.debug) - - script: - """ - kallisto index -i index.idx -k ${params.pizzly_k} ${transcript} - kallisto quant -t ${task.cpus} -i index.idx --fusion -o output ${reads[0]} ${reads[1]} - pizzly -k ${params.pizzly_k} \\ - --gtf ${gtf} \\ - --cache index.cache.txt \\ - --align-score 2 \\ - --insert-size 400 \\ - --fasta ${transcript} \\ - --output pizzly_fusions output/fusion.txt - pizzly_flatten_json.py pizzly_fusions.json pizzly_fusions.txt - - mv index.cache.txt ${sample}_pizzly_cache.txt - mv pizzly_fusions.json ${sample}_pizzly.txt - mv pizzly_fusions.txt ${sample}_pizzly.txt - mv pizzly_fusions.unfiltered.json ${sample}_unfiltered_pizzly.json - """ -} - -pizzly_fusions = pizzly_fusions.dump(tag:'pizzly_fusions') - -/* - * Squid - 
*/ -process squid { - tag "${sample}" - label 'process_high' - - publishDir "${params.outdir}/tools/Squid/${sample}", mode: 'copy' - - input: - set val(sample), file(reads) from read_files_squid - file(star_index) from ch_star_index - file(gtf) from ch_gtf - - output: - set val(sample), file("${sample}_fusions_annotated.txt") optional true into squid_fusions - file("*.txt") into squid_output + } else { - when: params.squid && (!params.single_end || params.debug) + RNAFUSION() - script: - def avail_mem = task.memory ? "--limitBAMsortRAM ${task.memory.toBytes() - 100000000}" : '' - """ - STAR \\ - --genomeDir ${star_index} \\ - --sjdbGTFfile ${gtf} \\ - --runThreadN ${task.cpus} \\ - --readFilesIn ${reads} \\ - --twopassMode Basic \\ - --chimOutType SeparateSAMold \\ - --chimSegmentMin 20 \\ - --chimJunctionOverhangMin 12 \\ - --alignSJDBoverhangMin 10 \\ - --outReadsUnmapped Fastx \\ - --outSAMstrandField intronMotif \\ - --outSAMtype BAM SortedByCoordinate \\ - ${avail_mem} \\ - --readFilesCommand zcat - mv Aligned.sortedByCoord.out.bam ${sample}Aligned.sortedByCoord.out.bam - samtools view -bS Chimeric.out.sam > ${sample}Chimeric.out.bam - squid -b ${sample}Aligned.sortedByCoord.out.bam -c ${sample}Chimeric.out.bam -o fusions - AnnotateSQUIDOutput.py ${gtf} fusions_sv.txt fusions_annotated.txt + } - mv fusions_annotated.txt ${sample}_fusions_annotated.txt - """ } -squid_fusions = squid_fusions.dump(tag:'squid_fusions') - -read_files_summary = read_files_summary.dump(tag:'read_files_summary') - -files_and_reports_summary = read_files_summary - .join(arriba_fusions_summary, remainder: true) - .join(ericscript_fusions, remainder: true) - .join(fusioncatcher_fusions, remainder: true) - .join(pizzly_fusions, remainder: true) - .join(squid_fusions, remainder: true) - .join(star_fusion_fusions, remainder: true) - -files_and_reports_summary = files_and_reports_summary.dump(tag:'files_and_reports_summary') - /* -================================================================================ - SUMMARIZING RESULTS -================================================================================ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RUN ALL WORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -process summary { - tag "${sample}" - - publishDir "${params.outdir}/Reports/${sample}", mode: 'copy' - - input: - set val(sample), file(reads), file(arriba), file(ericscript), file(fusioncatcher), file(pizzly), file(squid), file(starfusion) from files_and_reports_summary - - output: - set val(sample), file("${sample}_fusion_list.tsv") into fusion_inspector_input_list - file("${sample}_fusion_genes_mqc.json") into summary_fusions_mq - file("*") into report - - when: !params.debug && (running_tools.size() > 0) - - - script: - def extra_params = params.fusion_report_opt ? params.fusion_report_opt : '' - def tools = !arriba.empty() ? "--arriba ${arriba} " : '' - tools += !ericscript.empty() ? "--ericscript ${ericscript} " : '' - tools += !fusioncatcher.empty() ? "--fusioncatcher ${fusioncatcher} " : '' - tools += !pizzly.empty() ? "--pizzly ${pizzly} " : '' - tools += !squid.empty() ? "--squid ${squid} " : '' - tools += !starfusion.empty() ? "--starfusion ${starfusion} " : '' - """ - fusion_report run ${sample} . 
${params.databases} \\ - ${tools} ${extra_params} - mv fusion_list.tsv ${sample}_fusion_list.tsv - mv fusion_genes_mqc.json ${sample}_fusion_genes_mqc.json - """ -} - -/************************************************************* - * Visualization - ************************************************************/ - -/* - * Arriba Visualization - */ -process arriba_visualization { - tag "${sample}" - label 'process_medium' - - publishDir "${params.outdir}/tools/Arriba/${sample}", mode: 'copy' - - input: - file(reference) from reference.arriba_vis - set sample, file(bam), file(fusions) from arriba_visualization - file(gtf) from ch_gtf - - output: - file("${sample}.pdf") optional true into arriba_visualization_output - - when: params.arriba_vis && (!params.single_end || params.debug) - - script: - def suff_mem = ("${(task.memory.toBytes() - 6000000000) / task.cpus}" > 2000000000) ? 'true' : 'false' - def avail_mem = (task.memory && suff_mem) ? "-m" + "${(task.memory.toBytes() - 6000000000) / task.cpus}" : '' - """ - samtools sort -@ ${task.cpus} ${avail_mem} -O bam ${bam} > Aligned.sortedByCoord.out.bam - samtools index Aligned.sortedByCoord.out.bam - draw_fusions.R \\ - --fusions=${fusions} \\ - --alignments=Aligned.sortedByCoord.out.bam \\ - --output=${sample}.pdf \\ - --annotation=${gtf} \\ - --cytobands=${reference}/cytobands_hg38_GRCh38_2018-02-23.tsv \\ - --proteinDomains=${reference}/protein_domains_hg38_GRCh38_2019-07-05.gff3 - """ -} - - -fusion_inspector_input = fusion_inspector_input_list.join(read_files_fusion_inspector) - -fusion_inspector_input = fusion_inspector_input.dump(tag:'fusion_inspector_input') - -/* - * Fusion Inspector - */ -process fusion_inspector { - tag "${sample}" - label 'process_high' - - publishDir "${params.outdir}/tools/FusionInspector/${sample}", mode: 'copy' - - input: - set val(sample), file(fi_input_list), file(reads) from fusion_inspector_input - file(reference) from reference.fusion_inspector - - output: - file("*.{fa,gtf,bed,bam,bai,txt,html}") into fusion_inspector_output - - when: params.fusion_inspector && (!params.single_end || params.debug) - - script: - def extra_params = params.fusion_inspector_opt ? params.fusion_inspector_opt : '' - """ - FusionInspector \\ - --fusions ${fi_input_list} \\ - --genome_lib ${reference} \\ - --left_fq ${reads[0]} \\ - --right_fq ${reads[1]} \\ - --CPU ${task.cpus} \\ - -O . 
\\ - --out_prefix finspector \\ - --vis ${extra_params} - """ -} - -/************************************************************* - * Quality check & software verions - ************************************************************/ - -/* - * Parse software version numbers - */ -process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: 'copy', - saveAs: { filename -> - if (filename.indexOf(".csv") > 0) filename - else null - } - - output: - file 'software_versions_mqc.yaml' into ch_software_versions_yaml - file "software_versions.csv" - - script: - """ - echo ${workflow.manifest.version} > v_pipeline.txt - echo ${workflow.nextflow.version} > v_nextflow.txt - fastqc --version > v_fastqc.txt - multiqc --version > v_multiqc.txt - cat ${baseDir}/containers/arriba/environment.yml > v_arriba.txt - cat ${baseDir}/containers/fusioncatcher/environment.yml > v_fusioncatcher.txt - cat ${baseDir}/containers/star-fusion/environment.yml > v_fusion_inspector.txt - cat ${baseDir}/containers/star-fusion/environment.yml > v_star_fusion.txt - cat ${baseDir}/containers/ericscript/environment.yml > v_ericscript.txt - cat ${baseDir}/containers/pizzly/environment.yml > v_pizzly.txt - cat ${baseDir}/containers/squid/environment.yml > v_squid.txt - cat ${baseDir}/environment.yml > v_fusion_report.txt - scrape_software_versions.py &> software_versions_mqc.yaml - """ -} - -/* - * FastQC - */ -process fastqc { - tag "$name" - label 'process_medium' - publishDir "${params.outdir}/fastqc", mode: 'copy', - saveAs: { filename -> - filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename" - } - - input: - set val(name), file(reads) from ch_read_files_fastqc - - output: - file "*_fastqc.{zip,html}" into ch_fastqc_results - - when: !params.debug - - script: - """ - fastqc --quiet --threads $task.cpus $reads - """ -} - -/* - * MultiQC - */ -process multiqc { - publishDir "${params.outdir}/MultiQC", mode: 'copy' - - input: - file (multiqc_config) from ch_multiqc_config - file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) - file ('fastqc/*') from ch_fastqc_results.collect().ifEmpty([]) - file ('software_versions/*') from ch_software_versions_yaml.collect() - file (fusions_mq) from summary_fusions_mq.collect().ifEmpty([]) - file workflow_summary from ch_workflow_summary.collectFile(name: "workflow_summary_mqc.yaml") - - output: - file "*multiqc_report.html" into ch_multiqc_report - file "*_data" - file "multiqc_plots" - - when: !params.debug - - script: - rtitle = custom_runName ? "--title \"$custom_runName\"" : '' - rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' - custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' - """ - multiqc -f $rtitle $rfilename $custom_config_file . 
- """ +// +// WORKFLOW: Execute a single named workflow for the pipeline +// See: https://github.com/nf-core/rnaseq/issues/619 +// +workflow { + NFCORE_RNAFUSION () } /* - * Output Description HTML - */ -process output_documentation { - publishDir "${params.outdir}/pipeline_info", mode: 'copy' - - input: - file(output_docs) from ch_output_docs - - output: - file("results_description.html") - - when: !params.debug - - script: - """ - markdown_to_html.py $output_docs -o results_description.html - """ -} - -/* - * Completion e-mail notification - */ -workflow.onComplete { - - // Set up the e-mail variables - def subject = "[nf-core/rnafusion] Successful: $workflow.runName" - if (!workflow.success) { - subject = "[nf-core/rnafusion] FAILED: $workflow.runName" - } - def email_fields = [:] - email_fields['version'] = workflow.manifest.version - email_fields['runName'] = custom_runName ?: workflow.runName - email_fields['success'] = workflow.success - email_fields['dateComplete'] = workflow.complete - email_fields['duration'] = workflow.duration - email_fields['exitStatus'] = workflow.exitStatus - email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') - email_fields['errorReport'] = (workflow.errorReport ?: 'None') - email_fields['commandLine'] = workflow.commandLine - email_fields['projectDir'] = workflow.projectDir - email_fields['summary'] = summary - email_fields['summary']['Date Started'] = workflow.start - email_fields['summary']['Date Completed'] = workflow.complete - email_fields['summary']['Pipeline script file path'] = workflow.scriptFile - email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - email_fields['summary']['Nextflow Version'] = workflow.nextflow.version - email_fields['summary']['Nextflow Build'] = workflow.nextflow.build - email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - - def mqc_report = null - try { - if (workflow.success) { - mqc_report = ch_multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList) { - log.warn "[nf-core/rnafusion] Found multiple reports from process 'multiqc', will use only one" - mqc_report = mqc_report[0] - } - } - } catch (all) { - log.warn "[nf-core/rnafusion] Could not attach MultiQC report to summary email" - } - - // Check if we are only sending emails on failure - email_address = params.email - if (!params.email && params.email_on_fail && !workflow.success) { - email_address = params.email_on_fail - } - - // Render the TXT template - def engine = new groovy.text.GStringTemplateEngine() - def tf = new File("${baseDir}/assets/email_template.txt") - def txt_template = engine.createTemplate(tf).make(email_fields) - def email_txt = txt_template.toString() - - // Render the HTML template - def hf = new File("${baseDir}/assets/email_template.html") - def html_template = engine.createTemplate(hf).make(email_fields) - def email_html = html_template.toString() - - // Render the sendmail template - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] - def sf = new File("$baseDir/assets/sendmail_template.txt") - def sendmail_template = 
engine.createTemplate(sf).make(smail_fields) - def sendmail_html = sendmail_template.toString() - - // Send the HTML e-mail - if (email_address) { - try { - if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } - // Try to send HTML e-mail using sendmail - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "[nf-core/rnafusion] Sent summary e-mail to $email_address (sendmail)" - } catch (all) { - // Catch failures and try with plaintext - [ 'mail', '-s', subject, email_address ].execute() << email_txt - log.info "[nf-core/rnafusion] Sent summary e-mail to $email_address (mail)" - } - } - - // Write summary e-mail HTML to a file - def output_d = new File("${params.outdir}/pipeline_info/") - if (!output_d.exists()) { - output_d.mkdirs() - } - def output_hf = new File(output_d, "pipeline_report.html") - output_hf.withWriter { w -> w << email_html } - def output_tf = new File(output_d, "pipeline_report.txt") - output_tf.withWriter { w -> w << email_txt } - - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_red = params.monochrome_logs ? '' : "\033[0;31m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - - if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" - log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" - log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" - } - - if (workflow.success) { - log.info "-${c_purple}[nf-core/rnafusion]${c_green} Pipeline completed successfully${c_reset}-" - } else { - checkHostname() - log.info "-${c_purple}[nf-core/rnafusion]${c_red} Pipeline completed with errors${c_reset}-" - } - -} - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/rnafusion v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? '' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" - if (params.hostnames) { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" - } - } - } - } -} +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + THE END +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ diff --git a/modules.json b/modules.json new file mode 100644 index 00000000..d46c6cfd --- /dev/null +++ b/modules.json @@ -0,0 +1,53 @@ +{ + "name": "nf-core/rnafusion", + "homePage": "https://github.com/nf-core/rnafusion", + "repos": { + "nf-core/modules": { + "arriba": { + "git_sha": "bf91f9dd9b62c98175d962bd51dbe2d0b7f91d4c" + }, + "cat/fastq": { + "git_sha": "9aadd9a6d3f5964476582319b3a1c54a3e3fe7c9" + }, + "custom/dumpsoftwareversions": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "fastqc": { + "git_sha": "49b18b1639f4f7104187058866a8fab33332bdfe" + }, + "kallisto/index": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "multiqc": { + "git_sha": "49b18b1639f4f7104187058866a8fab33332bdfe" + }, + "picard/collectwgsmetrics": { + "git_sha": "e263f267a7b3c2cbf95372c832e38ba8b76dfd2e" + }, + "picard/markduplicates": { + "git_sha": "20ebb89ff97a2665106be9cace5ccb9aa4eed1be" + }, + "qualimap/rnaseq": { + "git_sha": "e20e57f90b6787ac9a010a980cf6ea98bd990046" + }, + "samtools/faidx": { + "git_sha": "897c33d5da084b61109500ee44c01da2d3e4e773" + }, + "samtools/index": { + "git_sha": "897c33d5da084b61109500ee44c01da2d3e4e773" + }, + "samtools/sort": { + "git_sha": "897c33d5da084b61109500ee44c01da2d3e4e773" + }, + "samtools/view": { + "git_sha": "6b64f9cb6c3dd3577931cc3cd032d6fb730000ce" + }, + "star/align": { + "git_sha": "1dddf1ce9443e3d93853d86e7a7aab52e5b4d614" + }, + "star/genomegenerate": { + "git_sha": "897c33d5da084b61109500ee44c01da2d3e4e773" + } + } + } +} diff --git a/modules/local/arriba/download/main.nf b/modules/local/arriba/download/main.nf new file mode 100644 index 00000000..08cdc4fe --- /dev/null +++ b/modules/local/arriba/download/main.nf @@ -0,0 +1,37 @@ +process ARRIBA_DOWNLOAD { + tag "arriba" + label 'process_low' + + conda (params.enable_conda ? "bioconda::gnu-wget=1.18" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/gnu-wget:1.18--h5bf99c6_5" + } else { + container "quay.io/biocontainers/gnu-wget:1.18--h5bf99c6_5" + } + + output: + path "versions.yml" , emit: versions + path "*" , emit: reference + + script: + """ + wget https://github.com/suhrig/arriba/releases/download/v2.1.0/arriba_v2.1.0.tar.gz -O arriba_v2.1.0.tar.gz + tar -xzvf arriba_v2.1.0.tar.gz + rm arriba_v2.1.0.tar.gz + mv arriba_v2.1.0/database/* . 
+    rm -r arriba_v2.1.0
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        wget: \$(echo \$(wget -V 2>&1) | grep "GNU Wget" | cut -d" " -f3)
+    END_VERSIONS
+    """
+
+    stub:
+    """
+    mkdir -p arriba_v2.1.0/database/
+    touch arriba_v2.1.0/database/arriba.test
+
+    touch versions.yml
+    """
+}
diff --git a/modules/local/arriba/download/meta.yml b/modules/local/arriba/download/meta.yml
new file mode 100644
index 00000000..55e50b11
--- /dev/null
+++ b/modules/local/arriba/download/meta.yml
@@ -0,0 +1,26 @@
+name: arriba_download
+description: Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
+keywords:
+  - fusion
+  - arriba
+tools:
+  - arriba:
+      description: Fast and accurate gene fusion detection from RNA-Seq data
+      homepage: https://github.com/suhrig/arriba
+      documentation: https://arriba.readthedocs.io/en/latest/
+      tool_dev_url: https://github.com/suhrig/arriba
+      doi: "10.1101/gr.257246.119"
+      licence: ["MIT"]
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - reference:
+      type: directory
+      description: Folder with arriba references
+      pattern: "*"
+
+authors:
+  - "@praveenraj2018, @rannick"
diff --git a/modules/local/arriba/visualisation/main.nf b/modules/local/arriba/visualisation/main.nf
new file mode 100644
index 00000000..52caf995
--- /dev/null
+++ b/modules/local/arriba/visualisation/main.nf
@@ -0,0 +1,41 @@
+process ARRIBA_VISUALISATION {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::arriba=2.1.0" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/arriba:2.1.0--h3198e80_1' :
+        'quay.io/biocontainers/arriba:2.1.0--h3198e80_1' }"
+
+    input:
+    tuple val(meta), path(bam), path(bai)
+    tuple val(meta), path(fusions)
+    path reference
+    path gtf
+
+    output:
+    tuple val(meta), path("*.pdf") , emit: pdf
+    path "versions.yml"            , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def args2 = task.ext.args2 ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    draw_fusions.R \\
+        --fusions=$fusions \\
+        --alignments=$bam \\
+        --output=${prefix}.pdf \\
+        --annotation=${gtf} \\
+        --cytobands=${reference}/${args} \\
+        --proteinDomains=${reference}/${args2}
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        arriba: \$(arriba -h | grep 'Version:' 2>&1 | sed 's/Version:\s//')
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/arriba/visualisation/meta.yml b/modules/local/arriba/visualisation/meta.yml
new file mode 100644
index 00000000..a7418ca2
--- /dev/null
+++ b/modules/local/arriba/visualisation/meta.yml
@@ -0,0 +1,54 @@
+name: arriba_visualisation
+description: Generates a PDF visualisation for each gene fusion detected by Arriba.
+keywords:
+  - visualisation
+  - arriba
+tools:
+  - arriba:
+      description: Fast and accurate gene fusion detection from RNA-Seq data
+      homepage: https://github.com/suhrig/arriba
+      documentation: https://arriba.readthedocs.io/en/latest/
+      tool_dev_url: https://github.com/suhrig/arriba
+      doi: "10.1101/gr.257246.119"
+      licence: ["MIT"]
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bam:
+      type: file
+      description: BAM/CRAM/SAM file
+      pattern: "*.{bam,cram,sam}"
+  - bai:
+      type: file
+      description: BAM index file
+      pattern: "*.{bai}"
+  - fusions:
+      type: file
+      description: Arriba fusions file
+      pattern: "*.{tsv}"
+  - reference:
+      type: directory
+      description: Folder with arriba references (cytobands and protein domains)
+      pattern: "*"
+  - gtf:
+      type: file
+      description: Annotation GTF file
+      pattern: "*.{gtf}"
+
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - pdf:
+      type: file
+      description: PDF file containing the fusion visualisations
+      pattern: "*.{pdf}"
+
+authors:
+  - "@rannick"
diff --git a/modules/local/ensembl/main.nf b/modules/local/ensembl/main.nf
new file mode 100644
index 00000000..1b689f40
--- /dev/null
+++ b/modules/local/ensembl/main.nf
@@ -0,0 +1,51 @@
+process ENSEMBL_DOWNLOAD {
+    tag "ensembl"
+    label 'process_low'
+
+    conda (params.enable_conda ? "bioconda::gnu-wget=1.18" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container "https://depot.galaxyproject.org/singularity/gnu-wget:1.18--h5bf99c6_5"
+    } else {
+        container "quay.io/biocontainers/gnu-wget:1.18--h5bf99c6_5"
+    }
+
+    input:
+    val ensembl_version
+
+    output:
+    path "versions.yml"                                                   , emit: versions
+    path "Homo_sapiens.${params.genome}.${ensembl_version}.all.fa"        , emit: fasta
+    path "Homo_sapiens.${params.genome}.${ensembl_version}.gtf"           , emit: gtf
+    path "Homo_sapiens.${params.genome}.${ensembl_version}.chr.gtf"       , emit: chrgtf
+    path "Homo_sapiens.${params.genome}.${ensembl_version}.cdna.all.fa.gz", emit: transcript
+
+    script:
+    """
+    wget ftp://ftp.ensembl.org/pub/release-${ensembl_version}/fasta/homo_sapiens/dna/Homo_sapiens.${params.genome}.dna.chromosome.{1..22}.fa.gz
+    wget ftp://ftp.ensembl.org/pub/release-${ensembl_version}/fasta/homo_sapiens/dna/Homo_sapiens.${params.genome}.dna.chromosome.{MT,X,Y}.fa.gz
+
+    wget ftp://ftp.ensembl.org/pub/release-${ensembl_version}/gtf/homo_sapiens/Homo_sapiens.${params.genome}.${ensembl_version}.gtf.gz
+    wget ftp://ftp.ensembl.org/pub/release-${ensembl_version}/gtf/homo_sapiens/Homo_sapiens.${params.genome}.${ensembl_version}.chr.gtf.gz
+    wget ftp://ftp.ensembl.org/pub/release-${ensembl_version}/fasta/homo_sapiens/cdna/Homo_sapiens.${params.genome}.cdna.all.fa.gz -O Homo_sapiens.${params.genome}.${ensembl_version}.cdna.all.fa.gz
+
+    gunzip -c Homo_sapiens.${params.genome}.dna.chromosome.* > Homo_sapiens.${params.genome}.${ensembl_version}.all.fa
+    gunzip Homo_sapiens.${params.genome}.${ensembl_version}.gtf.gz
+    gunzip Homo_sapiens.${params.genome}.${ensembl_version}.chr.gtf.gz
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        wget: \$(echo \$(wget -V 2>&1) | grep "GNU Wget" | cut -d" " -f3)
+    END_VERSIONS
+    """
+
+    stub:
+    """
+    touch "Homo_sapiens.${params.genome}.${ensembl_version}.all.fa"
+    touch "Homo_sapiens.${params.genome}.${ensembl_version}.gtf"
+    touch "Homo_sapiens.${params.genome}.${ensembl_version}.chr.gtf"
+    touch "Homo_sapiens.${params.genome}.${ensembl_version}.cdna.all.fa.gz"
+
+    touch versions.yml
+    """
+
+}
diff --git a/modules/local/ericscript/detect/main.nf b/modules/local/ericscript/detect/main.nf
new file mode 100644
index 00000000..baeb5a62
--- /dev/null
+++ b/modules/local/ericscript/detect/main.nf
@@ -0,0 +1,44 @@
+process ERICSCRIPT {
+    tag "ericscript"
+    label 'process_low'
+
"bioconda::ericscript=0.5.5 conda-forge::ncurses=6.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "docker.io/nfcore/rnafusion:ericscript_0.5.5" + } else { + container "docker.io/nfcore/rnafusion:ericscript_0.5.5" + } + + input: + tuple val(meta), path(reads) + path reference + + output: + tuple val(meta), path("*.results.filtered.tsv"), emit: fusions + tuple val(meta), path("*.results.total.tsv") , emit: fusions_total + path "versions.yml" , emit: versions + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + + ericscript.pl \\ + -db $reference \\ + -name ${prefix} \\ + -p $task.cpus \\ + -o . \\ + $reads \\ + $args + + echo \$(wget -V 2>&1) | grep "GNU Wget" | cut -d" " -f3 > versions.yml + + """ + + stub: + """ + touch ${prefix}.results.filtered.tsv + touch ${prefix}.results.total.tsv + touch versions.yml + """ +} diff --git a/modules/local/ericscript/download/main.nf b/modules/local/ericscript/download/main.nf new file mode 100644 index 00000000..95510c15 --- /dev/null +++ b/modules/local/ericscript/download/main.nf @@ -0,0 +1,24 @@ +process ERICSCRIPT_DOWNLOAD { + tag "eriscript" + label 'process_low' + + conda (params.enable_conda ? "bioconda::gnu-wget=1.18" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/gnu-wget:1.18--h5bf99c6_5" + } else { + container "quay.io/biocontainers/gnu-wget:1.18--h5bf99c6_5" + } + + output: + path "versions.yml" , emit: versions + path "*" , emit: reference + + script: + """ + wget http://ngi-igenomes.s3.amazonaws.com/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/ericscript_db_homosapiens_ensembl84.tar.bz2 + tar jxf ericscript_db_homosapiens_ensembl84.tar.bz2 --strip-components=2 + rm ericscript_db_homosapiens_ensembl84.tar.bz2 + + echo \$(wget -V 2>&1) | grep "GNU Wget" | cut -d" " -f3 > versions.yml + """ +} diff --git a/modules/local/fusioncatcher/detect/main.nf b/modules/local/fusioncatcher/detect/main.nf new file mode 100644 index 00000000..08c3e788 --- /dev/null +++ b/modules/local/fusioncatcher/detect/main.nf @@ -0,0 +1,47 @@ +process FUSIONCATCHER { + tag "$meta.id" + label 'process_high' + + conda (params.enable_conda ? "bioconda::fusioncatcher=1.33" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "docker.io/clinicalgenomics/fusioncatcher:1.33" + } else { + container "docker.io/clinicalgenomics/fusioncatcher:1.33" + } + + input: + tuple val(meta), path(fasta) + path reference + + output: + tuple val(meta), path("*.fusioncatcher.fusion-genes.txt") , optional:true , emit: fusions + tuple val(meta), path("*.fusioncatcher.summary.txt") , optional:true , emit: summary + tuple val(meta), path("*.fusioncatcher.log") , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reads = fasta.toString().replace(" ", ",") + """ + fusioncatcher.py \\ + -d $reference \\ + -i $reads \\ + -p $task.cpus \\ + -o . 
+        -o . \\
+        --skip-blat \\
+        $args
+
+    mv final-list_candidate-fusion-genes.txt ${prefix}.fusioncatcher.fusion-genes.txt
+    mv summary_candidate_fusions.txt ${prefix}.fusioncatcher.summary.txt
+    mv fusioncatcher.log ${prefix}.fusioncatcher.log
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        fusioncatcher: \$(echo \$(fusioncatcher --version 2>&1) | sed 's/fusioncatcher.py //')
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/fusioncatcher/detect/meta.yml b/modules/local/fusioncatcher/detect/meta.yml
new file mode 100644
index 00000000..7c8ee425
--- /dev/null
+++ b/modules/local/fusioncatcher/detect/meta.yml
@@ -0,0 +1,53 @@
+name: fusioncatcher
+description: FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
+keywords:
+  - fusioncatcher
+tools:
+  - fusioncatcher:
+      description: FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
+      homepage: https://github.com/ndaniel/fusioncatcher
+      documentation: https://github.com/ndaniel/fusioncatcher/wiki
+      tool_dev_url: https://github.com/ndaniel/fusioncatcher
+      doi: "10.1101/011650"
+      licence: ["GPL v3"]
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - reads:
+      type: file
+      description: FASTQ file
+      pattern: "*.{fastq}"
+  - reference:
+      type: directory
+      description: Path to fusioncatcher references
+      pattern: "*"
+
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - fusions:
+      type: file
+      description: Final list of candidate fusion genes
+      pattern: "*.fusioncatcher.fusion-genes.txt"
+  - summary:
+      type: file
+      description: Summary of fusion results
+      pattern: "*.fusioncatcher.summary.txt"
+  - log:
+      type: file
+      description: Log of fusion results
+      pattern: "*.fusioncatcher.log"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+
+authors:
+  - "@praveenraj2018, @rannick"
diff --git a/modules/local/fusioncatcher/download/main.nf b/modules/local/fusioncatcher/download/main.nf
new file mode 100644
index 00000000..119f2523
--- /dev/null
+++ b/modules/local/fusioncatcher/download/main.nf
@@ -0,0 +1,43 @@
+process FUSIONCATCHER_DOWNLOAD {
+    tag "fusioncatcher_download"
+    label 'process_medium'
+
"bioconda::fusioncatcher=1.33" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + + container "docker.io/clinicalgenomics/fusioncatcher:1.33" + } else { + container "docker.io/clinicalgenomics/fusioncatcher:1.33" + } + + output: + path "*" , emit: reference + path "versions.yml" , emit: versions + + script: + + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def human_version = "v102" + def url = "http://sourceforge.net/projects/fusioncatcher/files/data/human_${human_version}.tar.gz.aa" + """ + if wget --spider "$url" 2>/dev/null; then + wget $args $url + wget $args http://sourceforge.net/projects/fusioncatcher/files/data/human_${human_version}.tar.gz.ab + wget $args http://sourceforge.net/projects/fusioncatcher/files/data/human_${human_version}.tar.gz.ac + wget $args http://sourceforge.net/projects/fusioncatcher/files/data/human_${human_version}.tar.gz.ad + cat human_${human_version}.tar.gz.* | tar xz + rm human_${human_version}.tar* + else + fusioncatcher-build \\ + -g homo_sapiens \\ + -o human_${human_version} \\ + $args2 + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fusioncatcher: \$(echo \$(fusioncatcher --version 2>&1)) + END_VERSIONS + """ +} diff --git a/modules/local/fusioncatcher/download/meta.yml b/modules/local/fusioncatcher/download/meta.yml new file mode 100644 index 00000000..40421a4e --- /dev/null +++ b/modules/local/fusioncatcher/download/meta.yml @@ -0,0 +1,25 @@ +name: fusioncatcher_download +description: Build genome for fusioncatcher +keywords: + - sort +tools: + - fusioncatcher: + description: Build genome for fusioncatcher + homepage: https://github.com/ndaniel/fusioncatcher/ + documentation: https://github.com/ndaniel/fusioncatcher/blob/master/doc/manual.md + tool_dev_url: https://github.com/ndaniel/fusioncatcher/ + doi: "10.1101/011650" + licence: ["GPL v3"] + +output: + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - reference: + type: directory + description: Path to fusioncatcher references + pattern: "*" + +authors: + - "@praveenraj2018, @rannick" diff --git a/modules/local/fusioninspector/main.nf b/modules/local/fusioninspector/main.nf new file mode 100644 index 00000000..c5fbe360 --- /dev/null +++ b/modules/local/fusioninspector/main.nf @@ -0,0 +1,49 @@ +process FUSIONINSPECTOR { + tag "$meta.id" + label 'process_high' + + conda (params.enable_conda ? "bioconda::dfam=3.3 bioconda::hmmer=3.3.2 bioconda::star-fusion=1.10.0 bioconda::trinity=date.2011_11_2 bioconda::samtools=1.9 bioconda::star=2.7.8a" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "docker.io/trinityctat/starfusion:1.10.1" + } else { + container "docker.io/trinityctat/starfusion:1.10.1" + } + + input: + tuple val(meta), path(reads) + tuple val(meta), path(fusion_list) + path reference + + output: + path "*" , emit: output + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def fasta = meta.single_end ? "--left_fq ${reads[0]}" : "--left_fq ${reads[0]} --right_fq ${reads[1]}" + def args = task.ext.args ?: '' + """ + FusionInspector \\ + --fusions $fusion_list \\ + --genome_lib ${reference} \\ + $fasta \\ + --CPU ${task.cpus} \\ + -O . 
+        -O . \\
+        --out_prefix $prefix \\
+        --vis $args
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        STAR-Fusion: \$(STAR-Fusion --version 2>&1 | grep -i 'version' | sed 's/STAR-Fusion version: //')
+    END_VERSIONS
+    """
+
+    stub:
+    """
+    touch versions.yml
+    touch FusionInspector.log
+    """
+}
diff --git a/modules/local/fusioninspector/meta.yml b/modules/local/fusioninspector/meta.yml
new file mode 100644
index 00000000..cc03239b
--- /dev/null
+++ b/modules/local/fusioninspector/meta.yml
@@ -0,0 +1,40 @@
+name: fusioninspector
+description: Validation of Fusion Transcript Predictions
+keywords:
+  - fusioninspector
+tools:
+  - fusioninspector:
+      description: Validation of Fusion Transcript Predictions
+      homepage: https://github.com/FusionInspector/FusionInspector
+      documentation: https://github.com/FusionInspector/FusionInspector/wiki
+      tool_dev_url: https://github.com/FusionInspector/FusionInspector
+      doi: "10.1101/2021.08.02.454639"
+      licence: https://github.com/FusionInspector/FusionInspector/blob/master/LICENSE.txt
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - reads:
+      type: file
+      description: FASTQ file
+      pattern: "*.{fastq*}"
+  - fusion_list:
+      type: file
+      description: File containing the candidate fusions to inspect
+      pattern: "*.tsv"
+  - reference:
+      type: directory
+      description: Path to ctat references
+      pattern: "*"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - output:
+      type: directory
+      description: All files produced by FusionInspector, including the HTML visualisation
+      pattern: "*"
+
+authors:
+  - "@rannick"
diff --git a/modules/local/fusionreport/detect/main.nf b/modules/local/fusionreport/detect/main.nf
new file mode 100644
index 00000000..ddf4667b
--- /dev/null
+++ b/modules/local/fusionreport/detect/main.nf
@@ -0,0 +1,43 @@
+process FUSIONREPORT {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? 'bioconda::star=2.7.9a' : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'docker.io/rannickscilifelab/fusion-report:2.1.5updated' :
+        'docker.io/rannickscilifelab/fusion-report:2.1.5updated' }"
+
+    input:
+    tuple val(meta), path(reads), path(arriba_fusions), path(pizzly_fusions), path(squid_fusions), path(starfusion_fusions), path(fusioncatcher_fusions)
+    path(fusionreport_ref)
+
+    output:
+    path "versions.yml"                                 , emit: versions
+    tuple val(meta), path("*fusionreport.tsv")          , emit: fusion_list
+    tuple val(meta), path("*fusionreport_filtered.tsv") , emit: fusion_list_filtered
+    tuple val(meta), path("*.html")                     , emit: report
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def tools = params.arriba || params.all ? "--arriba ${arriba_fusions} " : ''
+    tools += params.pizzly || params.all ? "--pizzly ${pizzly_fusions} " : ''
+    tools += params.squid || params.all ? "--squid ${squid_fusions} " : ''
+    tools += params.starfusion || params.all ? "--starfusion ${starfusion_fusions} " : ''
+    tools += params.fusioncatcher || params.all ? "--fusioncatcher ${fusioncatcher_fusions} " : ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    fusion_report run $meta.id . $fusionreport_ref $tools --allow-multiple-gene-symbols
+
+    mv fusion_list.tsv ${prefix}.fusionreport.tsv
+    mv fusion_list_filtered.tsv ${prefix}.fusionreport_filtered.tsv
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        fusion_report: \$(fusion_report --version)
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/fusionreport/detect/meta.yml b/modules/local/fusionreport/detect/meta.yml
new file mode 100644
index 00000000..f3d5cd88
--- /dev/null
+++ b/modules/local/fusionreport/detect/meta.yml
@@ -0,0 +1,59 @@
+name: fusionreport
+description: Summarise and score the fusion calls from the supported detection tools with fusion-report
+keywords:
+  - fusion
+  - report
+tools:
+  - fusionreport:
+      description: Tool for parsing the outputs from fusion detection tools
+      homepage: https://github.com/matq007/fusion-report
+      documentation: https://matq007.github.io/fusion-report/#/
+      licence: ["GPL v3"]
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - reference:
+      type: path
+      description: Path to fusionreport references
+      pattern: "*"
+  - arriba_fusions:
+      type: path
+      description: File containing fusions from arriba
+      pattern: "*.fusions.tsv"
+  - pizzly_fusions:
+      type: path
+      description: File containing fusions from pizzly
+      pattern: "*.pizzly.txt"
+  - squid_fusions:
+      type: path
+      description: File containing fusions from squid
+      pattern: "*.annotated.txt"
+  - starfusion_fusions:
+      type: path
+      description: File containing fusions from STARfusion
+      pattern: "*.starfusion.fusion_predictions.tsv"
+  - fusioncatcher_fusions:
+      type: path
+      description: File containing fusions from fusioncatcher
+      pattern: "*.fusions.tsv"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - fusion_list:
+      type: file
+      description: File containing the summary of all fusions fed-in
+      pattern: "*.tsv"
+  - fusion_list_filtered:
+      type: file
+      description: File containing the filtered summary of all fusions fed-in
+      pattern: "*.tsv"
+  - report:
+      type: file
+      description: HTML files
+      pattern: "*.html"
+
+authors:
+  - "@praveenraj2018, @rannick"
diff --git a/modules/local/fusionreport/download/main.nf b/modules/local/fusionreport/download/main.nf
new file mode 100644
index 00000000..746c3465
--- /dev/null
+++ b/modules/local/fusionreport/download/main.nf
@@ -0,0 +1,29 @@
+process FUSIONREPORT_DOWNLOAD {
+    tag 'fusionreport'
+    label 'process_medium'
+
+    conda (params.enable_conda ? 'bioconda::star=2.7.9a' : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'docker.io/rannickscilifelab/fusion-report:2.1.5updated' :
+        'docker.io/rannickscilifelab/fusion-report:2.1.5updated' }"
+
+    input:
+    val(username)
+    val(passwd)
+
+    output:
+    path "*"            , emit: reference
+    path "versions.yml" , emit: versions
+
+    script:
+    """
+    fusion_report download --cosmic_usr $username --cosmic_passwd $passwd .
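+    # Valid COSMIC credentials are required here: fusion_report downloads the COSMIC, Mitelman and FusionGDB databases into the current directory for later use by 'fusion_report run'.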
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        fusion_report: \$(fusion_report --version)
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/fusionreport/download/meta.yml b/modules/local/fusionreport/download/meta.yml
new file mode 100644
index 00000000..21a15a89
--- /dev/null
+++ b/modules/local/fusionreport/download/meta.yml
@@ -0,0 +1,35 @@
+name: fusionreport_download
+description: Download the databases used by fusion-report
+keywords:
+  - fusionreport
+  - download
+tools:
+  - fusionreport:
+      description: Tool for parsing the outputs from fusion detection tools
+      homepage: https://github.com/matq007/fusion-report
+      documentation: https://matq007.github.io/fusion-report/#/
+      tool_dev_url: https://github.com/matq007/fusion-report
+      licence: ["GPL v3"]
+
+input:
+  - username:
+      type: value
+      description: COSMIC username, required to download the COSMIC database
+      pattern: "*"
+  - passwd:
+      type: value
+      description: COSMIC password, required to download the COSMIC database
+      pattern: "*"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - reference:
+      type: directory
+      description: Directory containing the databases required by fusion-report
+      pattern: "*"
+
+authors:
+  - "@praveenraj2018"
diff --git a/modules/local/getmeta/main.nf b/modules/local/getmeta/main.nf
new file mode 100644
index 00000000..c52b16dc
--- /dev/null
+++ b/modules/local/getmeta/main.nf
@@ -0,0 +1,20 @@
+process GET_META {
+    tag 'getmeta'
+    label 'process_low'
+
+    conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img' :
+        'biocontainers/biocontainers:v1.2.0_cv1' }"
+
+    input:
+    tuple val(meta), path(reads)
+    path file
+
+    output:
+    tuple val(meta), path(file) , emit: file
+
+    // No command is executed: the process only re-emits 'file' keyed with the sample meta map
+    script:
+    """
+    """
+}
diff --git a/modules/local/getpath/main.nf b/modules/local/getpath/main.nf
new file mode 100644
index 00000000..5f159287
--- /dev/null
+++ b/modules/local/getpath/main.nf
@@ -0,0 +1,19 @@
+process GET_PATH {
+    tag 'getpath'
+    label 'process_low'
+
+    conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img' :
+        'biocontainers/biocontainers:v1.2.0_cv1' }"
+
+    input:
+    tuple val(meta), path(file)
+
+    output:
+    path file , emit: file
+
+    // No command is executed: the process only strips the meta map and re-emits the bare file
+    script:
+    """
+    """
+}
diff --git a/modules/local/kallisto/quant/main.nf b/modules/local/kallisto/quant/main.nf
new file mode 100644
index 00000000..bc0f3511
--- /dev/null
+++ b/modules/local/kallisto/quant/main.nf
@@ -0,0 +1,37 @@
+process KALLISTO_QUANT {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::kallisto=0.46.2" : null)
+        'https://depot.galaxyproject.org/singularity/kallisto:0.46.2--h4f7b962_1' :
+        'quay.io/biocontainers/kallisto:0.46.2--h4f7b962_1' }"
+
+
+    input:
+    tuple val(meta), path(reads)
+    path index
+
+    output:
+    path "versions.yml"                   , emit: versions
+    tuple val(meta), path("*fusions.txt") , emit: txt
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    kallisto quant \
+        -t $task.cpus \
+        -i $index \
+        --fusion \
+        -o . \
+        $reads
+    mv fusion.txt ${prefix}.kallisto_quant.fusions.txt
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        kallisto: \$(echo \$(kallisto 2>&1) | sed 's/^kallisto //; s/Usage.*\$//')
+    END_VERSIONS
+    """
+}
+
diff --git a/modules/local/kallisto/quant/meta.yml b/modules/local/kallisto/quant/meta.yml
new file mode 100644
index 00000000..31821aa6
--- /dev/null
+++ b/modules/local/kallisto/quant/meta.yml
@@ -0,0 +1,45 @@
+name: kallisto_quant
+description: Runs the kallisto quantification algorithm with fusion detection enabled
+keywords:
+  - quant
+tools:
+  - kallisto:
+      description: Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
+      homepage: https://pachterlab.github.io/kallisto/
+      documentation: https://pachterlab.github.io/kallisto/manual
+      tool_dev_url: https://github.com/pachterlab/kallisto
+      doi: ""
+      licence: ["BSD-2-Clause"]
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - reads:
+      type: file
+      description: FASTQ file
+      pattern: "*.{fastq}"
+  - index:
+      type: file
+      description: kallisto genome index
+      pattern: "*"
+
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - txt:
+      type: file
+      description: Fusion candidates reported by kallisto when run with --fusion
+      pattern: "*.txt"
+
+authors:
+  - "@rannick"
diff --git a/modules/local/picard/collectrnaseqmetrics/main.nf b/modules/local/picard/collectrnaseqmetrics/main.nf
new file mode 100644
index 00000000..d01c6534
--- /dev/null
+++ b/modules/local/picard/collectrnaseqmetrics/main.nf
@@ -0,0 +1,57 @@
+process PICARD_COLLECTRNASEQMETRICS {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::picard=2.26.10" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/picard:2.26.10--hdfd78af_0' :
+        'quay.io/biocontainers/picard:2.26.10--hdfd78af_0' }"
+
+    input:
+    tuple val(meta), path(bam), path(bai)
+    path(refflat)
+    path(rrna_intervals)
+
+    output:
+    tuple val(meta), path("*rna_metrics.txt") , emit: metrics
+    path "versions.yml"                       , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def strandedness = ''
+    if ("${meta.strandedness}" == 'forward') {
+        strandedness = '--STRAND_SPECIFICITY FIRST_READ_TRANSCRIPTION_STRAND'
+    } else if ("${meta.strandedness}" == 'reverse') {
+        strandedness = '--STRAND_SPECIFICITY SECOND_READ_TRANSCRIPTION_STRAND'
+    } else {
+        strandedness = '--STRAND_SPECIFICITY NONE'
+    }
+
+    def rrna = rrna_intervals == [] ? '' : "--RIBOSOMAL_INTERVALS ${rrna_intervals}"
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def avail_mem = 3
+    if (!task.memory) {
+        log.info '[Picard CollectRnaSeqMetrics] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.'
+    } else {
+        avail_mem = task.memory.giga
+    }
+    """
+    picard \\
+        -Xmx${avail_mem}g \\
+        CollectRnaSeqMetrics \\
+        --TMP_DIR ./tmp \\
+        ${strandedness} \\
+        ${rrna} \\
+        --REF_FLAT ${refflat} \\
+        --INPUT ${bam} \\
+        --OUTPUT ${prefix}_rna_metrics.txt
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        picard: \$(picard CollectRnaSeqMetrics --version 2>&1 | grep -o 'Version.*' | cut -f2- -d:)
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/picard/collectrnaseqmetrics/meta.yml b/modules/local/picard/collectrnaseqmetrics/meta.yml
new file mode 100644
index 00000000..131b8337
--- /dev/null
+++ b/modules/local/picard/collectrnaseqmetrics/meta.yml
@@ -0,0 +1,53 @@
+name: picard_collectrnaseqmetrics
+description: Produces RNA alignment metrics for a SAM or BAM file.
+keywords:
+  - alignment
+  - metrics
+  - statistics
+  - quality
+  - bam
+  - RNA
+tools:
+  - picard:
+      description: |
+        A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS)
+        data and formats such as SAM/BAM/CRAM and VCF.
+      homepage: https://broadinstitute.github.io/picard/
+      documentation: https://broadinstitute.github.io/picard/
+      licence: ["MIT"]
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bam:
+      type: file
+      description: BAM file
+      pattern: "*.{bam}"
+  - bai:
+      type: file
+      description: An optional BAM index file. If desired, --CREATE_INDEX must be passed as a flag
+      pattern: "*.{bai}"
+  - refflat:
+      type: file
+      description: Gene annotations in refFlat form
+  - rrna_intervals:
+      type: file
+      description: Location of rRNA sequences in genome, in interval_list format
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - metrics:
+      type: file
+      description: Alignment metrics files generated by picard
+      pattern: "*_{metrics}"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+authors:
+  - "@rannick"
diff --git a/modules/local/pizzly/detect/main.nf b/modules/local/pizzly/detect/main.nf
new file mode 100644
index 00000000..6222d3ba
--- /dev/null
+++ b/modules/local/pizzly/detect/main.nf
@@ -0,0 +1,38 @@
+process PIZZLY {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::kallisto=0.46.2 bioconda::pizzly==0.37.3" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/pizzly:0.37.3--py36_2' :
+        'quay.io/biocontainers/pizzly:0.37.3--h470a237_3' }"
+
+    input:
+    tuple val(meta), path(txt)
+    path transcript
+    path gtf
+
+    output:
+    path "versions.yml"                       , emit: versions
+    tuple val(meta), path("*pizzly.txt")      , emit: fusions
+    tuple val(meta), path("*unfiltered.json") , emit: fusions_unfiltered
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    pizzly \\
+        $args \\
+        --gtf $gtf \\
+        --fasta $transcript \\
+        --output ${prefix}.pizzly $txt
+
+    pizzly_flatten_json.py ${prefix}.pizzly.json ${prefix}.pizzly.txt
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        pizzly: \$(pizzly --version | grep pizzly | sed -e "s/pizzly version: //g")
+    END_VERSIONS
+    """
+}
+
diff --git a/modules/local/pizzly/detect/meta.yml b/modules/local/pizzly/detect/meta.yml
new file mode 100644
index 00000000..930b3a62
--- /dev/null
+++ b/modules/local/pizzly/detect/meta.yml
@@ -0,0 +1,44 @@
+name: pizzly
+description: Pizzly detection of fusions.
+keywords:
+  - fusion
+  - pizzly
+tools:
+  - pizzly:
+      description: Fast fusion detection using kallisto
+      homepage: https://github.com/pmelsted/pizzly
+      documentation: https://github.com/pmelsted/pizzly
+      tool_dev_url: https://github.com/pmelsted/pizzly
+      doi: ""
+      licence: ["BSD-2-Clause"]
+
+input:
+  - txt:
+      type: file
+      description: Fusion candidates (fusion.txt) produced by kallisto quant --fusion
+      pattern: "*.txt"
+  - transcript:
+      type: file
+      description: Transcriptome fasta file
+      pattern: "*.{fasta*}"
+  - gtf:
+      type: file
+      description: gtf reference
+      pattern: "*.gtf"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - fusions:
+      type: file
+      description: fusions
+      pattern: "*pizzly.txt"
+  - fusions_unfiltered:
+      type: file
+      description: unfiltered fusions
+      pattern: "*unfiltered.json"
+
+authors:
+  - "@rannick"
diff --git a/modules/local/pizzly/download/main.nf b/modules/local/pizzly/download/main.nf
new file mode 100644
index 00000000..d3be739b
--- /dev/null
+++ b/modules/local/pizzly/download/main.nf
@@ -0,0 +1,32 @@
+process PIZZLY_DOWNLOAD {
+    tag "pizzly"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::kallisto=0.46.2" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container "https://depot.galaxyproject.org/singularity/kallisto:0.46.2--h4f7b962_1"
+    } else {
+        container "quay.io/biocontainers/kallisto:0.46.2--h4f7b962_1"
+    }
+
+    input:
+    path transcript
+
+    output:
+    path "versions.yml" , emit: versions
+    path "index.idx"    , emit: reference
+
+    script:
+    def args = task.ext.args ?: ''
+    """
+    kallisto index \\
+        -i index.idx \\
+        $args \\
+        $transcript
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        kallisto: \$(echo \$(kallisto 2>&1) | sed 's/^kallisto //; s/Usage.*\$//')
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/samplesheet_check.nf b/modules/local/samplesheet_check.nf
new file mode 100644
index 00000000..fe14e2ee
--- /dev/null
+++ b/modules/local/samplesheet_check.nf
@@ -0,0 +1,29 @@
+process SAMPLESHEET_CHECK {
+    tag "$samplesheet"
+
+    conda (params.enable_conda ?
"conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + input: + path samplesheet + + output: + path '*.csv' , emit: csv + path "versions.yml", emit: versions + + script: // This script is bundled with the pipeline, in nf-core/rnafusion/bin/ + """ + check_samplesheet.py \\ + $samplesheet \\ + samplesheet.valid.csv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ +} diff --git a/modules/local/squid/annotate/main.nf b/modules/local/squid/annotate/main.nf new file mode 100644 index 00000000..585b082d --- /dev/null +++ b/modules/local/squid/annotate/main.nf @@ -0,0 +1,33 @@ + +process SQUID_ANNOTATE { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::squid=1.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'docker.io/nfcore/rnafusion:squid_1.5-star2.7.1a' : + 'docker.io/nfcore/rnafusion:squid_1.5-star2.7.1a' }" + + + + input: + tuple val(meta), path(txt) + path gtf + + output: + tuple val(meta), path("*annotated.txt") , emit: fusions_annotated + path "versions.yml" , emit: versions + + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + AnnotateSQUIDOutput.py $gtf $txt ${prefix}.squid.fusions.annotated.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + squid: \$(echo \$(squid --version 2>&1) | sed 's/v//') + END_VERSIONS + """ +} diff --git a/modules/local/squid/annotate/meta.yml b/modules/local/squid/annotate/meta.yml new file mode 100644 index 00000000..e1a1f0d2 --- /dev/null +++ b/modules/local/squid/annotate/meta.yml @@ -0,0 +1,36 @@ +name: squid +description: Squid detection of fusions. +keywords: + - fusion + - pizzly +tools: + - pizzly: + description: Fusion detection using squid + homepage: https://github.com/Kingsford-Group/squid + documentation: https://github.com/Kingsford-Group/squid + tool_dev_url: https://github.com/Kingsford-Group/squid + doi: "" + licence: ["BSD-3-Clause"] + +input: + - fusions: + type: directory + description: Path to squid fusions + pattern: "*.txt" + - gtf: + type: file + description: gtf reference + pattern: "*.gtf" + +output: + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - fusions_annotated: + type: file + description: squid fusions annotated + pattern: "*squid.fusions.annotated.txt" + +authors: + - "@rannick" diff --git a/modules/local/squid/detect/main.nf b/modules/local/squid/detect/main.nf new file mode 100644 index 00000000..80106c0f --- /dev/null +++ b/modules/local/squid/detect/main.nf @@ -0,0 +1,32 @@ + +process SQUID { + tag "squid" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::squid=1.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'docker.io/nfcore/rnafusion:squid_1.5-star2.7.1a' :
+        'docker.io/nfcore/rnafusion:squid_1.5-star2.7.1a' }"
+
+
+
+    input:
+    tuple val(meta), path(bam), path(chimeric_bam)
+
+    output:
+    tuple val(meta), path("*sv.txt") , emit: fusions
+    path "versions.yml"              , emit: versions
+
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    squid -b $bam -c $chimeric_bam -o ${prefix}.squid.fusions
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        squid: \$(echo \$(squid --version 2>&1) | sed 's/v//')
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/squid/detect/meta.yml b/modules/local/squid/detect/meta.yml
new file mode 100644
index 00000000..a7f1e61a
--- /dev/null
+++ b/modules/local/squid/detect/meta.yml
@@ -0,0 +1,41 @@
+name: squid
+description: Squid detection of fusions.
+keywords:
+  - fusion
+  - squid
+tools:
+  - squid:
+      description: Fusion detection using SQUID
+      homepage: https://github.com/Kingsford-Group/squid
+      documentation: https://github.com/Kingsford-Group/squid
+      tool_dev_url: https://github.com/Kingsford-Group/squid
+      doi: ""
+      licence: ["BSD-3-Clause"]
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bam:
+      type: file
+      description: BAM/CRAM/SAM file
+      pattern: "*.{bam,cram,sam}"
+  - chimeric_bam:
+      type: file
+      description: BAM/CRAM/SAM file containing only chimeric sorted reads
+      pattern: "*.{bam,cram,sam}"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - fusions:
+      type: file
+      description: SQUID fusion calls
+      pattern: "*.txt"
+
+authors:
+  - "@rannick"
diff --git a/modules/local/starfusion/build/main.nf b/modules/local/starfusion/build/main.nf
new file mode 100644
index 00000000..09e787a0
--- /dev/null
+++ b/modules/local/starfusion/build/main.nf
@@ -0,0 +1,41 @@
+process STARFUSION_BUILD {
+    tag 'star-fusion'
+
+    conda (params.enable_conda ? "bioconda::dfam=3.3 bioconda::hmmer=3.3.2 bioconda::star-fusion=1.10.0 bioconda::trinity=date.2011_11_2 bioconda::samtools=1.9 bioconda::star=2.7.8a" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container "docker.io/trinityctat/starfusion:1.10.1"
+    } else {
+        container "docker.io/trinityctat/starfusion:1.10.1"
+    }
+
+    input:
+    path fasta
+    path gtf
+
+    output:
+    path "*" , emit: reference
+
+    script:
+    def binPath = ( params.enable_conda ?
"prep_genome_lib.pl" : "/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl" ) + """ + export TMPDIR=/tmp + wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz --no-check-certificate + wget https://github.com/FusionAnnotator/CTAT_HumanFusionLib/releases/download/v0.3.0/fusion_lib.Mar2021.dat.gz -O CTAT_HumanFusionLib_Mar2021.dat.gz --no-check-certificate + wget https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/AnnotFilterRule.pm -O AnnotFilterRule.pm --no-check-certificate + wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm --no-check-certificate + wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3f --no-check-certificate + wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3i --no-check-certificate + wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3m --no-check-certificate + wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3p --no-check-certificate + gunzip Pfam-A.hmm.gz && hmmpress Pfam-A.hmm + $binPath \\ + --genome_fa $fasta \\ + --gtf $gtf \\ + --annot_filter_rule AnnotFilterRule.pm \\ + --fusion_annot_lib CTAT_HumanFusionLib_Mar2021.dat.gz \\ + --pfam_db Pfam-A.hmm \\ + --dfam_db homo_sapiens_dfam.hmm \\ + --max_readlength $params.read_length \\ + --CPU $task.cpus + """ +} diff --git a/modules/local/starfusion/build/meta.yml b/modules/local/starfusion/build/meta.yml new file mode 100644 index 00000000..c87b251b --- /dev/null +++ b/modules/local/starfusion/build/meta.yml @@ -0,0 +1,31 @@ +name: starfusion_downloadgenome +description: Download STAR-fusion genome resource required to run STAR-Fusion caller +keywords: + - downoad +tools: + - star-fusion: + description: Fusion calling algorithm for RNAseq data + homepage: https://github.com/STAR-Fusion/ + documentation: https://github.com/STAR-Fusion/STAR-Fusion/wiki/installing-star-fusion + tool_dev_url: https://github.com/STAR-Fusion/STAR-Fusion + doi: "10.1186/s13059-019-1842-9" + licence: ["GPL v3"] + +input: + - fasta: + type: file + description: genome fasta file + pattern: "*.{fasta}" + - gtf: + type: file + description: genome gtf file + pattern: "*.{gtf}" + +output: + - reference: + type: directory + description: Reference dir + pattern: "ctat_genome_lib_build_dir" + +authors: + - "@praveenraj2018" diff --git a/modules/local/starfusion/detect/main.nf b/modules/local/starfusion/detect/main.nf new file mode 100644 index 00000000..cb69f01e --- /dev/null +++ b/modules/local/starfusion/detect/main.nf @@ -0,0 +1,56 @@ +process STARFUSION { + tag "$meta.id" + label 'process_high' + + conda (params.enable_conda ? 
"bioconda::dfam=3.3 bioconda::hmmer=3.3.2 bioconda::star-fusion=1.10.0 bioconda::trinity=date.2011_11_2 bioconda::samtools=1.9 bioconda::star=2.7.8a" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "docker.io/trinityctat/starfusion:1.10.1" + } else { + container "docker.io/trinityctat/starfusion:1.10.1" + } + + input: + tuple val(meta), path(reads), path(junction) + path reference + + output: + tuple val(meta), path("*.fusion_predictions.tsv") , emit: fusions + tuple val(meta), path("*.abridged.tsv") , emit: abridged + tuple val(meta), path("*.coding_effect.tsv") , optional: true , emit: coding_effect + path "versions.yml" , emit: versions + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def fasta = meta.single_end ? "--left_fq ${reads[0]}" : "--left_fq ${reads[0]} --right_fq ${reads[1]}" + def args = task.ext.args ?: '' + """ + STAR-Fusion \\ + --genome_lib_dir $reference \\ + $fasta \\ + -J $junction \\ + --CPU $task.cpus \\ + --examine_coding_effect \\ + --output_dir . \\ + $args + + mv star-fusion.fusion_predictions.tsv ${prefix}.starfusion.fusion_predictions.tsv + mv star-fusion.fusion_predictions.abridged.tsv ${prefix}.starfusion.abridged.tsv + mv star-fusion.fusion_predictions.abridged.coding_effect.tsv ${prefix}.starfusion.abridged.coding_effect.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + STAR-Fusion: \$(echo STAR-Fusion --version 2>&1 | grep -i 'version' | sed 's/STAR-Fusion version: //') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.starfusion.fusion_predictions.tsv + touch ${prefix}.starfusion.abridged.tsv + touch ${prefix}.starfusion.abridged.coding_effect.tsv + touch versions.yml + """ +} + + diff --git a/modules/local/starfusion/detect/meta.yml b/modules/local/starfusion/detect/meta.yml new file mode 100644 index 00000000..7337dad5 --- /dev/null +++ b/modules/local/starfusion/detect/meta.yml @@ -0,0 +1,56 @@ +name: starfusion +description: Fast and Accurate Fusion Transcript Detection from RNA-Seq +keywords: + - Fusion +tools: + - star-fusion: + description: Fast and Accurate Fusion Transcript Detection from RNA-Seq + homepage: https://github.com/STAR-Fusion/STAR-Fusion + documentation: https://github.com/STAR-Fusion/STAR-Fusion/wiki + tool_dev_url: https://github.com/STAR-Fusion/STAR-Fusion/releases + doi: "10.1101/120295v1" + licence: ["GPL v3"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - genome_lib: + type: path + description: STAR-fusion reference genome lib folder + - junction: + type: file + description: Chimeric junction output from STAR aligner + pattern: "*.{out.junction}" + - reference: + type: directory + description: Reference dir + pattern: "ctat_genome_lib_build_dir" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ]
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - fusions:
+      type: file
+      description: Fusion events from STAR-fusion
+      pattern: "*.{fusion_predictions.tsv}"
+  - abridged:
+      type: file
+      description: Abridged fusion events from STAR-fusion
+      pattern: "*.{abridged.tsv}"
+  - coding_effect:
+      type: file
+      description: Fusion events with coding-effect annotation from STAR-fusion
+      pattern: "*.{coding_effect.tsv}"
+
+authors:
+  - "@praveenraj2018"
diff --git a/modules/local/starfusion/download/main.nf b/modules/local/starfusion/download/main.nf
new file mode 100644
index 00000000..88fb1312
--- /dev/null
+++ b/modules/local/starfusion/download/main.nf
@@ -0,0 +1,26 @@
+process STARFUSION_DOWNLOAD {
+    tag 'star-fusion'
+
+    conda (params.enable_conda ? "bioconda::dfam=3.3 bioconda::hmmer=3.3.2 bioconda::star-fusion=1.10.0 bioconda::trinity=date.2011_11_2 bioconda::samtools=1.9 bioconda::star=2.7.8a" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container "docker.io/trinityctat/starfusion:1.10.1"
+    } else {
+        container "docker.io/trinityctat/starfusion:1.10.1"
+    }
+
+    output:
+    path "ctat_genome_lib_build_dir/*"            , emit: reference
+    path "ctat_genome_lib_build_dir/ref_annot.gtf", emit: chrgtf
+
+
+    script:
+    """
+    wget https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.10/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz --no-check-certificate
+
+    tar xvf GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz
+
+    rm GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz
+
+    mv */ctat_genome_lib_build_dir .
+    """
+}
diff --git a/modules/local/starfusion/download/meta.yml b/modules/local/starfusion/download/meta.yml
new file mode 100644
index 00000000..24e84252
--- /dev/null
+++ b/modules/local/starfusion/download/meta.yml
@@ -0,0 +1,25 @@
+name: starfusion_download
+description: Download the STAR-Fusion genome resource required to run the STAR-Fusion caller
+keywords:
+  - download
+tools:
+  - star-fusion:
+      description: Fusion calling algorithm for RNAseq data
+      homepage: https://github.com/STAR-Fusion/
+      documentation: https://github.com/STAR-Fusion/STAR-Fusion/wiki/installing-star-fusion
+      tool_dev_url: https://github.com/STAR-Fusion/STAR-Fusion
+      doi: "10.1186/s13059-019-1842-9"
+      licence: ["GPL v3"]
+
+output:
+  - reference:
+      type: directory
+      description: Genome resource path
+      pattern: "ctat_genome_lib_build_dir"
+  - chrgtf:
+      type: file
+      description: Reference annotation GTF from the CTAT genome lib
+      pattern: "ref_annot.gtf"
+
+authors:
+  - "@praveenraj2018,@rannick"
diff --git a/modules/local/uscs/custom_gtftogenepred/main.nf b/modules/local/uscs/custom_gtftogenepred/main.nf
new file mode 100644
index 00000000..74e4c6f3
--- /dev/null
+++ b/modules/local/uscs/custom_gtftogenepred/main.nf
@@ -0,0 +1,28 @@
+process GTF_TO_REFFLAT {
+    label 'process_low'
+
+    conda (params.enable_conda ? "bioconda::ucsc-gtftogenepred=377" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/ucsc-gtftogenepred:377--ha8a8165_5' :
+        'quay.io/biocontainers/ucsc-gtftogenepred:377--ha8a8165_5' }"
+
+    input:
+    path gtf
+
+    output:
+    path('*.refflat')   , emit: refflat
+    path "versions.yml" , emit: versions
+
+    script:
+    def genepred = gtf + '.genepred'
+    def refflat = gtf + '.refflat'
+    """
+    gtfToGenePred -genePredExt -geneNameAsName2 ${gtf} ${genepred}
+    paste ${genepred} ${genepred} | cut -f12,16-25 > ${refflat}
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        gtfToGenePred: 377
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/uscs/custom_gtftogenepred/meta.yml b/modules/local/uscs/custom_gtftogenepred/meta.yml
new file mode 100644
index 00000000..09711f43
--- /dev/null
+++ b/modules/local/uscs/custom_gtftogenepred/meta.yml
@@ -0,0 +1,30 @@
+name: gtf_to_refflat
+description: Generate gene annotations in refFlat format from a GTF file
+keywords:
+  - gtf
+  - refflat
+tools:
+  - ucsc-gtftogenepred:
+      description: Convert a GTF file to genePred format, used here to derive a refFlat annotation
+      homepage: https://genome.ucsc.edu/
+      documentation: https://hgdownload.soe.ucsc.edu/admin/exe/
+      licence: ["varies; see http://genome.ucsc.edu/license"]
+
+input:
+  - gtf:
+      type: file
+      description: Reference annotation in GTF format
+      pattern: "*.{gtf}"
+
+output:
+  - refflat:
+      type: file
+      description: Gene annotations in refFlat format
+      pattern: "*.{refflat}"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+
+authors:
+  - "@rannick"
diff --git a/modules/nf-core/modules/arriba/main.nf b/modules/nf-core/modules/arriba/main.nf
new file mode 100644
index 00000000..b7883acd
--- /dev/null
+++ b/modules/nf-core/modules/arriba/main.nf
@@ -0,0 +1,66 @@
+process ARRIBA {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::arriba=2.2.1" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/arriba:2.2.1--hecb563c_2' :
+        'quay.io/biocontainers/arriba:2.2.1--hecb563c_2' }"
+
+    input:
+    tuple val(meta), path(bam)
+    path fasta
+    path gtf
+    path blacklist
+    path known_fusions
+    path structural_variants
+    path tags
+    path protein_domains
+
+    output:
+    tuple val(meta), path("*.fusions.tsv")          , emit: fusions
+    tuple val(meta), path("*.fusions.discarded.tsv"), emit: fusions_fail
+    path "versions.yml"                             , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def blacklist = blacklist ? "-b $blacklist" : "-f blacklist"
+    def known_fusions = known_fusions ? "-k $known_fusions" : ""
+    def structural_variants = structural_variants ? "-d $structural_variants" : ""
+    def tags = tags ? "-t $tags" : ""
+    def protein_domains = protein_domains ?
"-p $protein_domains" : "" + + """ + arriba \\ + -x $bam \\ + -a $fasta \\ + -g $gtf \\ + -o ${prefix}.fusions.tsv \\ + -O ${prefix}.fusions.discarded.tsv \\ + $blacklist \\ + $known_fusions \\ + $structural_variants \\ + $tags \\ + $protein_domains \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + arriba: \$(arriba -h | grep 'Version:' 2>&1 | sed 's/Version:\s//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + echo stub > ${prefix}.fusions.tsv + echo stub > ${prefix}.fusions.discarded.tsv + + echo "${task.process}:" > versions.yml + echo ' arriba: 2.2.1' >> versions.yml + """ +} diff --git a/modules/nf-core/modules/arriba/meta.yml b/modules/nf-core/modules/arriba/meta.yml new file mode 100644 index 00000000..119dd912 --- /dev/null +++ b/modules/nf-core/modules/arriba/meta.yml @@ -0,0 +1,74 @@ +name: arriba +description: Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data. +keywords: + - fusion + - arriba +tools: + - arriba: + description: Fast and accurate gene fusion detection from RNA-Seq data + homepage: https://github.com/suhrig/arriba + documentation: https://arriba.readthedocs.io/en/latest/ + tool_dev_url: https://github.com/suhrig/arriba + doi: "10.1101/gr.257246.119" + licence: ["MIT"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - fasta: + type: file + description: Assembly FASTA file + pattern: "*.{fasta}" + - gtf: + type: file + description: Annotation GTF file + pattern: "*.{gtf}" + - blacklist: + type: file + description: Blacklist file + pattern: "*.{tsv}" + - known_fusions: + type: file + description: Known fusions file + pattern: "*.{tsv}" + - structural_variants: + type: file + description: Structural variants file + pattern: "*.{tsv}" + - tags: + type: file + description: Tags file + pattern: "*.{tsv}" + - protein_domains: + type: file + description: Protein domains file + pattern: "*.{gff3}" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - fusions: + type: file + description: File contains fusions which pass all of Arriba's filters. + pattern: "*.{fusions.tsv}" + - fusions_fail: + type: file + description: File contains fusions that Arriba classified as an artifact or that are also observed in healthy tissue. + pattern: "*.{fusions.discarded.tsv}" + +authors: + - "@praveenraj2018,@rannick" diff --git a/modules/nf-core/modules/cat/fastq/main.nf b/modules/nf-core/modules/cat/fastq/main.nf new file mode 100644 index 00000000..b6854895 --- /dev/null +++ b/modules/nf-core/modules/cat/fastq/main.nf @@ -0,0 +1,51 @@ +process CAT_FASTQ { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" + + input: + tuple val(meta), path(reads, stageAs: "input*/*") + + output: + tuple val(meta), path("*.merged.fastq.gz"), emit: reads + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def readList = reads.collect{ it.toString() } + if (meta.single_end) { + if (readList.size > 1) { + """ + cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//') + END_VERSIONS + """ + } + } else { + if (readList.size > 2) { + def read1 = [] + def read2 = [] + readList.eachWithIndex{ v, ix -> ( ix & 1 ? read2 : read1 ) << v } + """ + cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz + cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//') + END_VERSIONS + """ + } + } +} diff --git a/modules/nf-core/modules/cat/fastq/meta.yml b/modules/nf-core/modules/cat/fastq/meta.yml new file mode 100644 index 00000000..c836598e --- /dev/null +++ b/modules/nf-core/modules/cat/fastq/meta.yml @@ -0,0 +1,39 @@ +name: cat_fastq +description: Concatenates fastq files +keywords: + - fastq + - concatenate +tools: + - cat: + description: | + The cat utility reads files sequentially, writing them to the standard output. + documentation: https://www.gnu.org/software/coreutils/manual/html_node/cat-invocation.html + licence: ["GPL-3.0-or-later"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: list + description: | + List of input FastQ files to be concatenated. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: Merged fastq file + pattern: "*.{merged.fastq.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@joseespinosa" + - "@drpatelh" diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf b/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf new file mode 100644 index 00000000..327d5100 --- /dev/null +++ b/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf @@ -0,0 +1,24 @@ +process CUSTOM_DUMPSOFTWAREVERSIONS { + label 'process_low' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + conda (params.enable_conda ? "bioconda::multiqc=1.11" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0' :
+        'quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0' }"
+
+    input:
+    path versions
+
+    output:
+    path "software_versions.yml"    , emit: yml
+    path "software_versions_mqc.yml", emit: mqc_yml
+    path "versions.yml"             , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    template 'dumpsoftwareversions.py'
+}
diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml b/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml
new file mode 100644
index 00000000..60b546a0
--- /dev/null
+++ b/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml
@@ -0,0 +1,34 @@
+name: custom_dumpsoftwareversions
+description: Custom module used to dump software versions within the nf-core pipeline template
+keywords:
+  - custom
+  - version
+tools:
+  - custom:
+      description: Custom module used to dump software versions within the nf-core pipeline template
+      homepage: https://github.com/nf-core/tools
+      documentation: https://github.com/nf-core/tools
+      licence: ["MIT"]
+input:
+  - versions:
+      type: file
+      description: YML file containing software versions
+      pattern: "*.yml"
+
+output:
+  - yml:
+      type: file
+      description: Standard YML file containing software versions
+      pattern: "software_versions.yml"
+  - mqc_yml:
+      type: file
+      description: MultiQC custom content YML file containing software versions
+      pattern: "software_versions_mqc.yml"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+
+authors:
+  - "@drpatelh"
+  - "@grst"
diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py b/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py
new file mode 100644
index 00000000..d1390392
--- /dev/null
+++ b/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python
+
+import yaml
+import platform
+from textwrap import dedent
+
+
+def _make_versions_html(versions):
+    html = [
+        dedent(
+            """\\
+            <style>
+            #nf-core-versions tbody:nth-child(even) {
+                background-color: #f2f2f2;
+            }
+            </style>
+            <table class="table" style="width:100%" id="nf-core-versions">
+                <thead>
+                    <tr>
+                        <th> Process Name </th>
+                        <th> Software </th>
+                        <th> Version </th>
+                    </tr>
+                </thead>
+            """
+        )
+    ]
+    for process, tmp_versions in sorted(versions.items()):
+        html.append("<tbody>")
+        for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
+            html.append(
+                dedent(
+                    f"""\\
+                    <tr>
+                        <td><samp>{process if (i == 0) else ''}</samp></td>
+                        <td><samp>{tool}</samp></td>
+                        <td><samp>{version}</samp></td>
+                    </tr>
+                    """
+                )
+            )
+        html.append("</tbody>")
+    html.append("</table>")
+    return "\\n".join(html)
+
+
+versions_this_module = {}
+versions_this_module["${task.process}"] = {
+    "python": platform.python_version(),
+    "yaml": yaml.__version__,
+}
+
+with open("$versions") as f:
+    versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module
+
+# aggregate versions by the module name (derived from fully-qualified process name)
+versions_by_module = {}
+for process, process_versions in versions_by_process.items():
+    module = process.split(":")[-1]
+    try:
+        assert versions_by_module[module] == process_versions, (
+            "We assume that software versions are the same between all modules. "
+            "If you see this error-message it means you discovered an edge-case "
+            "and should open an issue in nf-core/tools. "
+        )
+    except KeyError:
+        versions_by_module[module] = process_versions
+
+versions_by_module["Workflow"] = {
+    "Nextflow": "$workflow.nextflow.version",
+    "$workflow.manifest.name": "$workflow.manifest.version",
+}
+
+versions_mqc = {
+    "id": "software_versions",
+    "section_name": "${workflow.manifest.name} Software Versions",
+    "section_href": "https://github.com/${workflow.manifest.name}",
+    "plot_type": "html",
+    "description": "are collected at run time from the software output.",
+    "data": _make_versions_html(versions_by_module),
+}
+
+with open("software_versions.yml", "w") as f:
+    yaml.dump(versions_by_module, f, default_flow_style=False)
+with open("software_versions_mqc.yml", "w") as f:
+    yaml.dump(versions_mqc, f, default_flow_style=False)
+
+with open("versions.yml", "w") as f:
+    yaml.dump(versions_this_module, f, default_flow_style=False)
diff --git a/modules/nf-core/modules/fastqc/main.nf b/modules/nf-core/modules/fastqc/main.nf
new file mode 100644
index 00000000..05730368
--- /dev/null
+++ b/modules/nf-core/modules/fastqc/main.nf
@@ -0,0 +1,59 @@
+process FASTQC {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::fastqc=0.11.9" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
+        'quay.io/biocontainers/fastqc:0.11.9--0' }"
+
+    input:
+    tuple val(meta), path(reads)
+
+    output:
+    tuple val(meta), path("*.html"), emit: html
+    tuple val(meta), path("*.zip") , emit: zip
+    path "versions.yml"            , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    // Add soft-links to original FastQs for consistent naming in pipeline
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    if (meta.single_end) {
+        """
+        [ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
+        fastqc $args --threads $task.cpus ${prefix}.fastq.gz
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
+        END_VERSIONS
+        """
+    } else {
+        """
+        [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
+        [ !
-f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz + fastqc $args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS + """ + } + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.html + touch ${prefix}.zip + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/fastqc/meta.yml b/modules/nf-core/modules/fastqc/meta.yml new file mode 100644 index 00000000..4da5bb5a --- /dev/null +++ b/modules/nf-core/modules/fastqc/meta.yml @@ -0,0 +1,52 @@ +name: fastqc +description: Run FastQC on sequenced reads +keywords: + - quality control + - qc + - adapters + - fastq +tools: + - fastqc: + description: | + FastQC gives general quality metrics about your reads. + It provides information about the quality score distribution + across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other + overrepresented sequences. + homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ + documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ + licence: ["GPL-2.0-only"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - html: + type: file + description: FastQC report + pattern: "*_{fastqc.html}" + - zip: + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@grst" + - "@ewels" + - "@FelixKrueger" diff --git a/modules/nf-core/modules/kallisto/index/main.nf b/modules/nf-core/modules/kallisto/index/main.nf new file mode 100644 index 00000000..0f10e564 --- /dev/null +++ b/modules/nf-core/modules/kallisto/index/main.nf @@ -0,0 +1,34 @@ +process KALLISTO_INDEX { + tag "$fasta" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::kallisto=0.46.2" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/kallisto:0.46.2--h4f7b962_1' : + 'quay.io/biocontainers/kallisto:0.46.2--h4f7b962_1' }" + + input: + path fasta + + output: + path "kallisto" , emit: idx + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + kallisto \\ + index \\ + $args \\ + -i kallisto \\ + $fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + kallisto: \$(echo \$(kallisto 2>&1) | sed 's/^kallisto //; s/Usage.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/kallisto/index/meta.yml b/modules/nf-core/modules/kallisto/index/meta.yml new file mode 100644 index 00000000..307650b2 --- /dev/null +++ b/modules/nf-core/modules/kallisto/index/meta.yml @@ -0,0 +1,31 @@ +name: kallisto_index +description: Create kallisto index +keywords: + - index +tools: + - kallisto: + description: Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. + homepage: https://pachterlab.github.io/kallisto/ + documentation: https://pachterlab.github.io/kallisto/manual + tool_dev_url: https://github.com/pachterlab/kallisto + doi: "" + licence: ["BSD-2-Clause"] + +input: + - fasta: + type: file + description: genome fasta file + pattern: "*.{fasta}" + +output: + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - idx: + type: index + description: Kallisto genome index + pattern: "*.idx" + +authors: + - "@ggabernet" diff --git a/modules/nf-core/modules/multiqc/main.nf b/modules/nf-core/modules/multiqc/main.nf new file mode 100644 index 00000000..ae019dbf --- /dev/null +++ b/modules/nf-core/modules/multiqc/main.nf @@ -0,0 +1,43 @@ +process MULTIQC { + label 'process_medium' + + conda (params.enable_conda ? 'bioconda::multiqc=1.12' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.12--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.12--pyhdfd78af_0' }" + + input: + path multiqc_files + + output: + path "*multiqc_report.html", emit: report + path "*_data" , emit: data + path "*_plots" , optional:true, emit: plots + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + multiqc -f $args . + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) + END_VERSIONS + """ + + stub: + """ + touch multiqc_data + touch multiqc_plots + touch multiqc_report.html + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/multiqc/meta.yml b/modules/nf-core/modules/multiqc/meta.yml new file mode 100644 index 00000000..6fa891ef --- /dev/null +++ b/modules/nf-core/modules/multiqc/meta.yml @@ -0,0 +1,40 @@ +name: MultiQC +description: Aggregate results from bioinformatics analyses across many samples into a single report +keywords: + - QC + - bioinformatics tools + - Beautiful stand-alone HTML report +tools: + - multiqc: + description: | + MultiQC searches a given directory for analysis logs and compiles a HTML report. + It's a general use tool, perfect for summarising the output from numerous bioinformatics tools. 
+      homepage: https://multiqc.info/
+      documentation: https://multiqc.info/docs/
+      licence: ["GPL-3.0-or-later"]
+input:
+  - multiqc_files:
+      type: file
+      description: |
+        List of reports / files recognised by MultiQC, for example the html and zip output of FastQC
+output:
+  - report:
+      type: file
+      description: MultiQC report file
+      pattern: "multiqc_report.html"
+  - data:
+      type: directory
+      description: MultiQC data dir
+      pattern: "multiqc_data"
+  - plots:
+      type: directory
+      description: Plots created by MultiQC
+      pattern: "*_plots"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+authors:
+  - "@abhi18av"
+  - "@bunop"
+  - "@drpatelh"
diff --git a/modules/nf-core/modules/picard/collectwgsmetrics/main.nf b/modules/nf-core/modules/picard/collectwgsmetrics/main.nf
new file mode 100644
index 00000000..e6dd49e9
--- /dev/null
+++ b/modules/nf-core/modules/picard/collectwgsmetrics/main.nf
@@ -0,0 +1,45 @@
+process PICARD_COLLECTWGSMETRICS {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::picard=2.27.1" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/picard:2.27.1--hdfd78af_0' :
+        'quay.io/biocontainers/picard:2.27.1--hdfd78af_0' }"
+
+    input:
+    tuple val(meta), path(bam)
+    path fasta
+
+    output:
+    tuple val(meta), path("*_metrics"), emit: metrics
+    path "versions.yml"               , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def avail_mem = 3
+    if (!task.memory) {
+        log.info '[Picard CollectWgsMetrics] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.'
+    } else {
+        avail_mem = task.memory.giga
+    }
+    """
+    picard \\
+        -Xmx${avail_mem}g \\
+        CollectWgsMetrics \\
+        $args \\
+        --INPUT $bam \\
+        --OUTPUT ${prefix}.CollectWgsMetrics.coverage_metrics \\
+        --REFERENCE_SEQUENCE $fasta
+
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        picard: \$(picard CollectWgsMetrics --version 2>&1 | grep -o 'Version.*' | cut -f2- -d:)
+    END_VERSIONS
+    """
+}
diff --git a/modules/nf-core/modules/picard/collectwgsmetrics/meta.yml b/modules/nf-core/modules/picard/collectwgsmetrics/meta.yml
new file mode 100644
index 00000000..d6c3d012
--- /dev/null
+++ b/modules/nf-core/modules/picard/collectwgsmetrics/meta.yml
@@ -0,0 +1,47 @@
+name: picard_collectwgsmetrics
+description: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
+keywords:
+  - alignment
+  - metrics
+  - statistics
+  - quality
+  - bam
+tools:
+  - picard:
+      description: |
+        A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS)
+        data and formats such as SAM/BAM/CRAM and VCF.
+      homepage: https://broadinstitute.github.io/picard/
+      documentation: https://broadinstitute.github.io/picard/
+      licence: ["MIT"]
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bam:
+      type: file
+      description: BAM file
+      pattern: "*.{bam}"
+  - fasta:
+      type: file
+      description: Genome fasta file
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g.
[ id:'test', single_end:false ] + - metrics: + type: file + description: Alignment metrics files generated by picard + pattern: "*_{metrics}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@flowuenne" + - "@lassefolkersen" diff --git a/modules/nf-core/modules/picard/markduplicates/main.nf b/modules/nf-core/modules/picard/markduplicates/main.nf new file mode 100644 index 00000000..1565c647 --- /dev/null +++ b/modules/nf-core/modules/picard/markduplicates/main.nf @@ -0,0 +1,58 @@ +process PICARD_MARKDUPLICATES { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::picard=2.27.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/picard:2.27.1--hdfd78af_0' : + 'quay.io/biocontainers/picard:2.27.1--hdfd78af_0' }" + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path("*.bam") , emit: bam + tuple val(meta), path("*.bai") , optional:true, emit: bai + tuple val(meta), path("*.metrics.txt"), emit: metrics + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def avail_mem = 3 + if (!task.memory) { + log.info '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' + } else { + avail_mem = task.memory.giga + } + """ + picard \\ + -Xmx${avail_mem}g \\ + MarkDuplicates \\ + $args \\ + --INPUT $bam \\ + --OUTPUT ${prefix}.bam \\ + --METRICS_FILE ${prefix}.MarkDuplicates.metrics.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:) + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.bam + touch ${prefix}.bam.bai + touch ${prefix}.MarkDuplicates.metrics.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/picard/markduplicates/meta.yml b/modules/nf-core/modules/picard/markduplicates/meta.yml new file mode 100644 index 00000000..842817bc --- /dev/null +++ b/modules/nf-core/modules/picard/markduplicates/meta.yml @@ -0,0 +1,52 @@ +name: picard_markduplicates +description: Locate and tag duplicate reads in a BAM file +keywords: + - markduplicates + - pcr + - duplicates + - bam + - sam + - cram +tools: + - picard: + description: | + A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) + data and formats such as SAM/BAM/CRAM and VCF. + homepage: https://broadinstitute.github.io/picard/ + documentation: https://broadinstitute.github.io/picard/ + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM file + pattern: "*.{bam}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM file with duplicate reads marked/removed + pattern: "*.{bam}" + - bai: + type: file + description: An optional BAM index file. 
If desired, --CREATE_INDEX must be passed as a flag + pattern: "*.{bai}" + - metrics: + type: file + description: Duplicate metrics file generated by picard + pattern: "*.{metrics.txt}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@projectoriented" diff --git a/modules/nf-core/modules/qualimap/rnaseq/main.nf b/modules/nf-core/modules/qualimap/rnaseq/main.nf new file mode 100644 index 00000000..3b2f88ad --- /dev/null +++ b/modules/nf-core/modules/qualimap/rnaseq/main.nf @@ -0,0 +1,52 @@ +process QUALIMAP_RNASEQ { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::qualimap=2.2.2d" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/qualimap:2.2.2d--1' : + 'quay.io/biocontainers/qualimap:2.2.2d--1' }" + + input: + tuple val(meta), path(bam) + path gtf + + output: + tuple val(meta), path("${prefix}"), emit: results + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + def paired_end = meta.single_end ? '' : '-pe' + def memory = task.memory.toGiga() + "G" + + def strandedness = 'non-strand-specific' + if (meta.strandedness == 'forward') { + strandedness = 'strand-specific-forward' + } else if (meta.strandedness == 'reverse') { + strandedness = 'strand-specific-reverse' + } + """ + unset DISPLAY + mkdir tmp + export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp + qualimap \\ + --java-mem-size=$memory \\ + rnaseq \\ + $args \\ + -bam $bam \\ + -gtf $gtf \\ + -p $strandedness \\ + $paired_end \\ + -outdir $prefix + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + qualimap: \$(echo \$(qualimap 2>&1) | sed 's/^.*QualiMap v.//; s/Built.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/samtools/faidx/main.nf b/modules/nf-core/modules/samtools/faidx/main.nf new file mode 100644 index 00000000..fdce7d9b --- /dev/null +++ b/modules/nf-core/modules/samtools/faidx/main.nf @@ -0,0 +1,42 @@ +process SAMTOOLS_FAIDX { + tag "$fasta" + label 'process_low' + + conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' : + 'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }" + + input: + tuple val(meta), path(fasta) + + output: + tuple val(meta), path ("*.fai"), emit: fai + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + samtools \\ + faidx \\ + $fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + """ + touch ${fasta}.fai + cat <<-END_VERSIONS > versions.yml + + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/samtools/faidx/meta.yml b/modules/nf-core/modules/samtools/faidx/meta.yml new file mode 100644 index 00000000..e9767764 --- /dev/null +++ b/modules/nf-core/modules/samtools/faidx/meta.yml @@ -0,0 +1,43 @@ +name: samtools_faidx +description: Index FASTA file +keywords: + - index + - fasta +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: FASTA file + pattern: "*.{fa,fasta}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fai: + type: file + description: FASTA index file + pattern: "*.{fai}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@ewels" + - "@phue" diff --git a/modules/nf-core/modules/samtools/index/main.nf b/modules/nf-core/modules/samtools/index/main.nf new file mode 100644 index 00000000..e04e63e8 --- /dev/null +++ b/modules/nf-core/modules/samtools/index/main.nf @@ -0,0 +1,48 @@ +process SAMTOOLS_INDEX { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
+        'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
+
+    input:
+    tuple val(meta), path(input)
+
+    output:
+    tuple val(meta), path("*.bai") , optional:true, emit: bai
+    tuple val(meta), path("*.csi") , optional:true, emit: csi
+    tuple val(meta), path("*.crai"), optional:true, emit: crai
+    path "versions.yml"            , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    """
+    samtools \\
+        index \\
+        -@ ${task.cpus-1} \\
+        $args \\
+        $input
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+    END_VERSIONS
+    """
+
+    stub:
+    """
+    touch ${input}.bai
+    touch ${input}.crai
+    touch ${input}.csi
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+    END_VERSIONS
+    """
+}
diff --git a/modules/nf-core/modules/samtools/index/meta.yml b/modules/nf-core/modules/samtools/index/meta.yml
new file mode 100644
index 00000000..e5cadbc2
--- /dev/null
+++ b/modules/nf-core/modules/samtools/index/meta.yml
@@ -0,0 +1,53 @@
+name: samtools_index
+description: Index SAM/BAM/CRAM file
+keywords:
+  - index
+  - bam
+  - sam
+  - cram
+tools:
+  - samtools:
+      description: |
+        SAMtools is a set of utilities for interacting with and post-processing
+        short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li.
+        These files are generated as output by short read aligners like BWA.
+      homepage: http://www.htslib.org/
+      documentation: http://www.htslib.org/doc/samtools.html
+      doi: 10.1093/bioinformatics/btp352
+      licence: ["MIT"]
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bam:
+      type: file
+      description: BAM/CRAM/SAM file
+      pattern: "*.{bam,cram,sam}"
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - bai:
+      type: file
+      description: BAM/CRAM/SAM index file
+      pattern: "*.{bai,crai,sai}"
+  - crai:
+      type: file
+      description: BAM/CRAM/SAM index file
+      pattern: "*.{bai,crai,sai}"
+  - csi:
+      type: file
+      description: CSI index file
+      pattern: "*.{csi}"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+authors:
+  - "@drpatelh"
+  - "@ewels"
+  - "@maxulysse"
diff --git a/modules/nf-core/modules/samtools/sort/main.nf b/modules/nf-core/modules/samtools/sort/main.nf
new file mode 100644
index 00000000..b4fc1cbe
--- /dev/null
+++ b/modules/nf-core/modules/samtools/sort/main.nf
@@ -0,0 +1,46 @@
+process SAMTOOLS_SORT {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
+        'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
+
+    input:
+    tuple val(meta), path(bam)
+
+    output:
+    tuple val(meta), path("*.bam"), emit: bam
+    path "versions.yml"           , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    if ("$bam" == "${prefix}.bam") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!"
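+    // A sketch of how this guard is usually satisfied in practice: a modules.config
+    // override gives the sorted output a distinct suffix. The selector and suffix
+    // below are illustrative only, not part of this module:
+    //   process { withName: 'SAMTOOLS_SORT' { ext.prefix = { "${meta.id}.sorted" } } }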
+ """ + samtools sort $args -@ $task.cpus -o ${prefix}.bam -T $prefix $bam + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/samtools/sort/meta.yml b/modules/nf-core/modules/samtools/sort/meta.yml new file mode 100644 index 00000000..a820c55a --- /dev/null +++ b/modules/nf-core/modules/samtools/sort/meta.yml @@ -0,0 +1,44 @@ +name: samtools_sort +description: Sort SAM/BAM/CRAM file +keywords: + - sort + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: hhttp://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: Sorted BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@ewels" diff --git a/modules/nf-core/modules/samtools/view/main.nf b/modules/nf-core/modules/samtools/view/main.nf new file mode 100644 index 00000000..55194e88 --- /dev/null +++ b/modules/nf-core/modules/samtools/view/main.nf @@ -0,0 +1,56 @@ +process SAMTOOLS_VIEW { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' : + 'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }" + + input: + tuple val(meta), path(input), path(index) + path fasta + + output: + tuple val(meta), path("*.bam") , emit: bam , optional: true + tuple val(meta), path("*.cram"), emit: cram, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reference = fasta ? "--reference ${fasta} -C" : "" + def file_type = input.getExtension() + if ("$input" == "${prefix}.${file_type}") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" 
+ """ + samtools \\ + view \\ + --threads ${task.cpus-1} \\ + ${reference} \\ + $args \\ + $input \\ + $args2 \\ + > ${prefix}.${file_type} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.bam + touch ${prefix}.cram + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/samtools/view/meta.yml b/modules/nf-core/modules/samtools/view/meta.yml new file mode 100644 index 00000000..a8b43ecc --- /dev/null +++ b/modules/nf-core/modules/samtools/view/meta.yml @@ -0,0 +1,57 @@ +name: samtools_view +description: filter/convert SAM/BAM/CRAM file +keywords: + - view + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: hhttp://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - index: + type: optional file + description: BAM.BAI/CRAM.CRAI file + pattern: "*.{.bai,.crai}" + - fasta: + type: optional file + description: Reference file the CRAM was created with + pattern: "*.{fasta,fa}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: filtered/converted BAM/SAM file + pattern: "*.{bam,sam}" + - cram: + type: file + description: filtered/converted CRAM file + pattern: "*.cram" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@joseespinosa" + - "@FriederikeHanssen" diff --git a/modules/nf-core/modules/star/align/main.nf b/modules/nf-core/modules/star/align/main.nf new file mode 100644 index 00000000..762b84f6 --- /dev/null +++ b/modules/nf-core/modules/star/align/main.nf @@ -0,0 +1,72 @@ +process STAR_ALIGN { + tag "$meta.id" + label 'process_high' + + // Note: 2.7X indices incompatible with AWS iGenomes. + conda (params.enable_conda ? 'bioconda::star=2.7.9a' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/star:2.7.9a--h9ee0642_0' : + 'quay.io/biocontainers/star:2.7.9a--h9ee0642_0' }" + + input: + tuple val(meta), path(reads) + path index + path gtf + val star_ignore_sjdbgtf + val seq_platform + val seq_center + + output: + tuple val(meta), path('*d.out.bam') , emit: bam + tuple val(meta), path('*Log.final.out') , emit: log_final + tuple val(meta), path('*Log.out') , emit: log_out + tuple val(meta), path('*Log.progress.out'), emit: log_progress + path "versions.yml" , emit: versions + + tuple val(meta), path('*sortedByCoord.out.bam') , optional:true, emit: bam_sorted + tuple val(meta), path('*toTranscriptome.out.bam'), optional:true, emit: bam_transcript + tuple val(meta), path('*Aligned.unsort.out.bam') , optional:true, emit: bam_unsorted + tuple val(meta), path('*fastq.gz') , optional:true, emit: fastq + tuple val(meta), path('*.tab') , optional:true, emit: tab + tuple val(meta), path('*.out.junction') , optional:true, emit: junction + tuple val(meta), path('*.out.sam') , optional:true, emit: sam + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def ignore_gtf = star_ignore_sjdbgtf ? '' : "--sjdbGTFfile $gtf" + def seq_platform = seq_platform ? "'PL:$seq_platform'" : "" + def seq_center = seq_center ? "--outSAMattrRGline ID:$prefix 'CN:$seq_center' 'SM:$prefix' $seq_platform " : "--outSAMattrRGline ID:$prefix 'SM:$prefix' $seq_platform " + def out_sam_type = (args.contains('--outSAMtype')) ? '' : '--outSAMtype BAM Unsorted' + def mv_unsorted_bam = (args.contains('--outSAMtype BAM Unsorted SortedByCoordinate')) ? "mv ${prefix}.Aligned.out.bam ${prefix}.Aligned.unsort.out.bam" : '' + """ + STAR \\ + --genomeDir $index \\ + --readFilesIn $reads \\ + --runThreadN $task.cpus \\ + --outFileNamePrefix $prefix. \\ + $out_sam_type \\ + $ignore_gtf \\ + $seq_center \\ + $args + + $mv_unsorted_bam + + if [ -f ${prefix}.Unmapped.out.mate1 ]; then + mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq + gzip ${prefix}.unmapped_1.fastq + fi + if [ -f ${prefix}.Unmapped.out.mate2 ]; then + mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq + gzip ${prefix}.unmapped_2.fastq + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + star: \$(STAR --version | sed -e "s/STAR_//g") + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/star/align/meta.yml b/modules/nf-core/modules/star/align/meta.yml new file mode 100644 index 00000000..7ee10f1c --- /dev/null +++ b/modules/nf-core/modules/star/align/meta.yml @@ -0,0 +1,81 @@ +name: star_align +description: Align reads to a reference genome using STAR +keywords: + - align + - fasta + - genome + - reference +tools: + - star: + description: | + STAR is a software package for mapping DNA sequences against + a large reference genome, such as the human genome. + homepage: https://github.com/alexdobin/STAR + manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf + doi: 10.1093/bioinformatics/bts635 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. 
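+ # The index input below should be generated with a compatible STAR release;
+ # see the version note in main.nf (2.7X indices are incompatible with AWS iGenomes).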
+ - index:
+ type: directory
+ description: STAR genome index
+ pattern: "star"
+output:
+ - bam:
+ type: file
+ description: Output BAM file containing read alignments
+ pattern: "*.{bam}"
+ - log_final:
+ type: file
+ description: STAR final log file
+ pattern: "*Log.final.out"
+ - log_out:
+ type: file
+ description: STAR log out file
+ pattern: "*Log.out"
+ - log_progress:
+ type: file
+ description: STAR log progress file
+ pattern: "*Log.progress.out"
+ - versions:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+ - bam_sorted:
+ type: file
+ description: Sorted BAM file of read alignments (optional)
+ pattern: "*sortedByCoord.out.bam"
+ - bam_transcript:
+ type: file
+ description: Output BAM file of transcriptome alignment (optional)
+ pattern: "*toTranscriptome.out.bam"
+ - bam_unsorted:
+ type: file
+ description: Unsorted BAM file of read alignments (optional)
+ pattern: "*Aligned.unsort.out.bam"
+ - fastq:
+ type: file
+ description: Unmapped FastQ files (optional)
+ pattern: "*fastq.gz"
+ - tab:
+ type: file
+ description: STAR output tab file(s) (optional)
+ pattern: "*.tab"
+ - junction:
+ type: file
+ description: STAR chimeric junction output file (optional)
+ pattern: "*.out.junction"
+
+authors:
+ - "@kevinmenden"
+ - "@drpatelh"
+ - "@praveenraj2018"
diff --git a/modules/nf-core/modules/star/genomegenerate/main.nf b/modules/nf-core/modules/star/genomegenerate/main.nf
new file mode 100644
index 00000000..e5568f1d
--- /dev/null
+++ b/modules/nf-core/modules/star/genomegenerate/main.nf
@@ -0,0 +1,69 @@
+process STAR_GENOMEGENERATE {
+ tag "$fasta"
+ label 'process_high'
+
+ // Note: 2.7X indices incompatible with AWS iGenomes.
+ conda (params.enable_conda ? "bioconda::star=2.7.9a bioconda::samtools=1.15.1 conda-forge::gawk=5.1.0" : null)
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:1c4c32d87798d425c970ececfbadd155e7560277-0' :
+ 'quay.io/biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:1c4c32d87798d425c970ececfbadd155e7560277-0' }"
+
+ input:
+ path fasta
+ path gtf
+
+ output:
+ path "star" , emit: index
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def args_list = args.tokenize()
+ def memory = task.memory ?
"--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : '' + if (args_list.contains('--genomeSAindexNbases')) { + """ + mkdir star + STAR \\ + --runMode genomeGenerate \\ + --genomeDir star/ \\ + --genomeFastaFiles $fasta \\ + --sjdbGTFfile $gtf \\ + --runThreadN $task.cpus \\ + $memory \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + star: \$(STAR --version | sed -e "s/STAR_//g") + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') + END_VERSIONS + """ + } else { + """ + samtools faidx $fasta + NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai` + + mkdir star + STAR \\ + --runMode genomeGenerate \\ + --genomeDir star/ \\ + --genomeFastaFiles $fasta \\ + --sjdbGTFfile $gtf \\ + --runThreadN $task.cpus \\ + --genomeSAindexNbases \$NUM_BASES \\ + $memory \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + star: \$(STAR --version | sed -e "s/STAR_//g") + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') + END_VERSIONS + """ + } +} diff --git a/modules/nf-core/modules/star/genomegenerate/meta.yml b/modules/nf-core/modules/star/genomegenerate/meta.yml new file mode 100644 index 00000000..8181157a --- /dev/null +++ b/modules/nf-core/modules/star/genomegenerate/meta.yml @@ -0,0 +1,37 @@ +name: star_genomegenerate +description: Create index for STAR +keywords: + - index + - fasta + - genome + - reference +tools: + - star: + description: | + STAR is a software package for mapping DNA sequences against + a large reference genome, such as the human genome. + homepage: https://github.com/alexdobin/STAR + manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf + doi: 10.1093/bioinformatics/bts635 + licence: ["MIT"] +input: + - fasta: + type: file + description: Fasta file of the reference genome + - gtf: + type: file + description: GTF file of the reference genome + +output: + - index: + type: directory + description: Folder containing the star index files + pattern: "star" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@kevinmenden" + - "@drpatelh" diff --git a/nextflow.config b/nextflow.config index 30aa2405..9d8fadc7 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,197 +1,253 @@ /* - * ------------------------------------------------- - * nf-core/rnafusion Nextflow config file - * ------------------------------------------------- - * Default config options for all environments. 
- */ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + nf-core/rnafusion Nextflow config file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Default config options for all compute environments +---------------------------------------------------------------------------------------- +*/ // Global default params, used in configs params { - genome = 'GRCh38' - genomes_base = false - reference_release = '97' - - // Options: Building STAR-star_index - star_index = false - read_length = 100 - - // Fusion tools - arriba = false - star_fusion = false - fusioncatcher = false - fusion_inspector = false - ericscript = false - pizzly = false - squid = false - - // Options: Arriba - arriba_opt = false - arriba_vis = false - - // Options: STAR-Fusion - star_fusion_opt = false - - // Options: FusionCatcher - fusioncatcher_opt = false - - // Options: Pizzly - pizzly_k = 31 - - // Options: Fusion-Inspector - fusion_inspector_opt = false - - // Options: fusion-report - fusion_report_opt = false - - // Defaults - reads = "data/*{1,2}.fastq.gz" - single_end = false - clusterOptions = false - awsqueue = false - awsregion = 'eu-west-1' - readPaths = null - debug = false - - // Options: download-references.nf - base = false - download_all = false - fusion_report = false - cosmic_usr = false - cosmic_passwd = false - - // Shared default variables across different scripts - outdir = './results' - tracedir = "${params.outdir}/pipeline_info" - - // Boilerplate options - name = false - multiqc_config = false - email = false - email_on_fail = false - max_multiqc_email_size = 25.MB - plaintext_email = false - monochrome_logs = false - help = false - tracedir = "${params.outdir}/pipeline_info" - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - hostnames = false - config_profile_description = false - config_profile_contact = false - config_profile_url = false - - // Defaults only, expecting to be overwritten - max_memory = 128.GB - max_cpus = 16 - max_time = 240.h + + // Input options + input = "fake_input_to_build_refs.csv" + build_references = false + cosmic_username = null + cosmic_passwd = null + + // MultiQC options + multiqc_config = null + multiqc_title = null + max_multiqc_email_size = '25.MB' + + // Genome + genome = 'GRCh38' + genomes_base = "${params.outdir}/references" + ensembl_version = 102 + read_length = 100 + genomes = [:] + starfusion_build = true + + // Filtering + fusioninspector_filter = false + + + // Alignment options + star_ignore_sjdbgtf = false + seq_center = false + seq_platform = false + + // Enable or disable tools + all = false + arriba = false + fusioncatcher = false + pizzly = false + squid = false + starindex = false + starfusion = false + fusionreport = false + + // Skip steps + skip_qc = false + skip_vis = false + + // Path to references + ensembl_ref = "${params.genomes_base}/ensembl" + arriba_ref = "${params.genomes_base}/arriba" + arriba_ref_blacklist = "${params.genomes_base}/arriba/blacklist_hg38_GRCh38_v2.1.0.tsv.gz" + arriba_ref_protein_domain = "${params.genomes_base}/arriba/protein_domains_hg38_GRCh38_v2.1.0.gff3" + fusioncatcher_ref = "${params.genomes_base}/fusioncatcher/human_v102" + pizzly_ref = "${params.genomes_base}/pizzly/kallisto" + squid_ref = "${params.genomes_base}/squid" + starfusion_ref = "${params.genomes_base}/starfusion/ctat_genome_lib_build_dir" + starindex_ref = 
"${params.genomes_base}/star" + fusionreport_ref = "${params.genomes_base}/fusion_report_db" + + + // Path to fusion outputs + arriba_fusions = null + pizzly_fusions = null + squid_fusions = null + starfusion_fusions = null + fusioncatcher_fusions = null + + // Boilerplate options + outdir = null + tracedir = "${params.outdir}/pipeline_info" + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + help = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'genomes' + enable_conda = false + singularity_pull_docker_container = false + + // Config options + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + config_profile_description = null + config_profile_contact = null + config_profile_url = null + config_profile_name = null + + // Max resource options + // Defaults only, expecting to be overwritten + max_memory = '128.GB' + max_cpus = 16 + max_time = '240.h' } -// Container slug. Stable releases should specify release tag! -// Developmental code should specify :dev -process.container = 'nfcore/rnafusion:1.2.0' +// Load base.config by default for all pipelines +includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} +includeConfig 'conf/genomes.config' -// Load nf-core/rnafusion custom profiles from different Institutions +// Load nf-core custom profiles from different Institutions try { - includeConfig "${params.custom_config_base}/pipeline/rnafusion.config" + includeConfig "${params.custom_config_base}/nfcore_custom.config" } catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config/rnafusion profiles: ${params.custom_config_base}/pipeline/rnafusion.config") + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } -// Load base.config by default for all pipelines -includeConfig 'conf/base.config' - -// Load genomes.config -includeConfig 'conf/genomes.config' +// Load nf-core/rnafusion custom profiles from different institutions. +// Warning: Uncomment only if a pipeline-specific instititutional config already exists on nf-core/configs! +// try { +// includeConfig "${params.custom_config_base}/pipeline/rnafusion.config" +// } catch (Exception e) { +// System.err.println("WARNING: Could not load nf-core/config/rnafusion profiles: ${params.custom_config_base}/pipeline/rnafusion.config") +// } profiles { - conda { process.conda = "$baseDir/environment.yml" } - debug { process.beforeScript = 'echo $HOSTNAME' } - docker { - docker.enabled = true - // Avoid this error: - // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. - // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 - // once this is established and works well, nextflow might implement this behavior as new default. 
- docker.runOptions = '-u \$(id -u):\$(id -g)' - } - singularity { - singularity.enabled = true - singularity.autoMounts = true - } - test { includeConfig 'conf/test.config' } + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + params.enable_conda = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } + test_full_build { includeConfig 'conf/test_full_build.config' } + } -// Export this variable to prevent local Python libraries from conflicting with those in the container +// Export these variables to prevent local Python/R libraries from conflicting with those in the container +// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. +// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable. + env { - PYTHONNOUSERSITE = 1 + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" + JULIA_DEPOT_PATH = "/usr/local/share/julia" } // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] +def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { - enabled = true - file = "${params.tracedir}/execution_timeline.html" + enabled = true + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { - enabled = true - file = "${params.tracedir}/execution_report.html" + enabled = true + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { - enabled = true - file = "${params.tracedir}/execution_trace.txt" + enabled = true + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { - enabled = true - file = "${params.tracedir}/pipeline_dag.svg" + enabled = true + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.html" } manifest { - name = 'nf-core/rnafusion' - author = 'Martin Proks' - homePage = 'https://github.com/nf-core/rnafusion' - description = 'Nextflow rnafusion analysis pipeline, part of the nf-core community.' - mainScript = 'main.nf' - nextflowVersion = '>=19.10.0' - version = '1.2.0' + name = 'nf-core/rnafusion' + author = 'Martin Proks, Annick Renevey' + homePage = 'https://github.com/nf-core/rnafusion' + description = 'Nextflow rnafusion analysis pipeline, part of the nf-core community.' 
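+ // The '!' prefix on nextflowVersion (below) makes the version requirement
+ // strict: Nextflow aborts the run instead of only warning when the running
+ // version does not satisfy >=21.10.3.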
+ mainScript = 'main.nf' + nextflowVersion = '!>=21.10.3' + version = '2.0.0' } +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' + // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" + return obj + } } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } - } -} \ No newline at end of file +} diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 00000000..86365c5b --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,490 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/rnafusion/master/nextflow_schema.json", + "title": "nf-core/rnafusion pipeline parameters", + "description": "Nextflow rnafusion analysis pipeline, part of the nf-core community.", + "type": "object", + "definitions": { + "skip_steps": { + "title": "Skip steps", + "type": "object", + "description": "Skip analysis steps", + "default": "", + "properties": { + "skip_qc": { + "type": "boolean", + "description": "Skip QC steps" + }, + "skip_vis": { + "type": "boolean", + "description": "Skip visualisation steps" + } + }, + "fa_icon": "fas fa-fast-forward" + }, + "input_output_options": { + "title": "Input/output options", + "type": "object", + "fa_icon": "fas fa-terminal", + "description": "Define where the pipeline should find input data and save output data.", + "required": ["genomes_base", "outdir"], + "properties": { + "input": { + "type": "string", + "format": "file-path", + "mimetype": "text/csv", + "pattern": "^\\S+\\.csv$", + "schema": "assets/schema_input.json", + "description": "Path to comma-separated file containing information about the samples in the experiment.", + "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. 
Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/rnafusion/usage#samplesheet-input).", + "fa_icon": "fas fa-file-csv" + }, + "outdir": { + "type": "string", + "format": "directory-path", + "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.", + "fa_icon": "fas fa-folder-open" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + }, + "multiqc_title": { + "type": "string", + "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", + "fa_icon": "fas fa-file-signature" + }, + "build_references": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Specifies which analysis type for the pipeline - either build references or analyse data" + }, + "genomes_base": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to reference folder" + }, + "ensembl_version": { + "type": "integer", + "fa_icon": "far fa-file-code", + "description": "ensembl version", + "default": 105 + }, + "starfusion_build": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "If set, starfusion references are built from scratch instead of downloaded (default)" + }, + "read_length": { + "type": "integer", + "fa_icon": "far fa-file-code", + "description": "Read length", + "default": 100 + }, + "all": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run all references/analyses" + }, + "arriba": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run arriba references/analyses" + }, + "arriba_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to arriba references" + }, + "arriba_ref_blacklist": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to arriba reference blacklist" + }, + "arriba_ref_protein_domain": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to arriba reference protein domain" + }, + "arriba_fusions": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to arriba output" + }, + "ensembl_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to ensembl references" + }, + "fusioncatcher": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run fusioncatcher references/analyses" + }, + "fusioncatcher_fusions": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to fusioncatcher output" + }, + "fusioncatcher_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to fusioncatcher references" + }, + "fusioninspector_filter": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Feed filtered fusionreport fusions to fusioninspector" + }, + "fusionreport": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build fusionreport references" + }, + "fusionreport_ref": { + "type": 
"string", + "fa_icon": "far fa-file-code", + "description": "Path to fusionreport references" + }, + "pizzly": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run pizzly references/analyses" + }, + "pizzly_fusions": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to pizzly output" + }, + "pizzly_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to pizzly references" + }, + "squid": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run squid references/analyses" + }, + "squid_fusions": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to squid output" + }, + "squid_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to squid references" + }, + "starfusion": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run starfusion references/analyses" + }, + "starfusion_fusions": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to starfusion output" + }, + "starfusion_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to starfusion references" + }, + "starindex": { + "type": "boolean", + "fa_icon": "far fa-file-code", + "description": "Build or run starindex references/analyses" + }, + "starindex_ref": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "Path to starindex references" + }, + "cosmic_username": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "COSMIC username" + }, + "cosmic_passwd": { + "type": "string", + "fa_icon": "far fa-file-code", + "description": "COSMIC password" + } + } + }, + "reference_genome_options": { + "title": "Reference genome options", + "type": "object", + "fa_icon": "fas fa-dna", + "description": "Reference genome related files and options required for the workflow.", + "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." + }, + "fasta": { + "type": "string", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", + "description": "Path to FASTA genome file.", + "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. 
Combine with `--save_reference` to save BWA index for future runs.", + "fa_icon": "far fa-file-code" + }, + "gtf": { + "type": "string", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.gtf?(\\.gz)?$", + "description": "Path to GTF genome file.", + "fa_icon": "far fa-file-code" + }, + "chrgtf": { + "type": "string", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.gtf?(\\.gz)?$", + "description": "Path to GTF genome file.", + "fa_icon": "far fa-file-code" + }, + "transcript": { + "type": "string", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", + "description": "Path to GTF genome file.", + "fa_icon": "far fa-file-code" + }, + "refflat": { + "type": "string", + "format": "file-path", + "mimetype": "text/plain", + "pattern": "^\\S+\\.refflat?$", + "description": "Path to GTF genome file.", + "fa_icon": "far fa-file-code" + } + } + }, + "institutional_config_options": { + "title": "Institutional config options", + "type": "object", + "fa_icon": "fas fa-university", + "description": "Parameters used to describe centralised config profiles. These should not be edited.", + "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", + "fa_icon": "fas fa-users-cog" + }, + "config_profile_name": { + "type": "string", + "description": "Institutional config name.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + } + } + }, + "max_job_request_options": { + "title": "Max job request options", + "type": "object", + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. 
See [the nf-core website](https://nf-co.re/usage/configuration) for details.", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + } + } + }, + "generic_options": { + "title": "Generic options", + "type": "object", + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "fa_icon": "fas fa-question-circle", + "hidden": true + }, + "publish_dir_mode": { + "type": "string", + "default": "copy", + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. 
See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.",
+ "fa_icon": "fas fa-copy",
+ "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"],
+ "hidden": true
+ },
+ "email_on_fail": {
+ "type": "string",
+ "description": "Email address for completion summary, only when pipeline fails.",
+ "fa_icon": "fas fa-exclamation-triangle",
+ "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$",
+ "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.",
+ "hidden": true
+ },
+ "plaintext_email": {
+ "type": "boolean",
+ "description": "Send plain-text email instead of HTML.",
+ "fa_icon": "fas fa-remove-format",
+ "hidden": true
+ },
+ "max_multiqc_email_size": {
+ "type": "string",
+ "description": "File size limit when attaching MultiQC reports to summary emails.",
+ "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$",
+ "default": "25.MB",
+ "fa_icon": "fas fa-file-upload",
+ "hidden": true
+ },
+ "monochrome_logs": {
+ "type": "boolean",
+ "description": "Do not use coloured log outputs.",
+ "fa_icon": "fas fa-palette",
+ "hidden": true
+ },
+ "multiqc_config": {
+ "type": "string",
+ "description": "Custom config file to supply to MultiQC.",
+ "fa_icon": "fas fa-cog",
+ "hidden": true
+ },
+ "tracedir": {
+ "type": "string",
+ "description": "Directory to keep pipeline Nextflow logs and reports.",
+ "default": "${params.outdir}/pipeline_info",
+ "fa_icon": "fas fa-cogs",
+ "hidden": true
+ },
+ "validate_params": {
+ "type": "boolean",
+ "description": "Whether to validate parameters against the schema at runtime.",
+ "default": true,
+ "fa_icon": "fas fa-check-square",
+ "hidden": true
+ },
+ "show_hidden_params": {
+ "type": "boolean",
+ "fa_icon": "far fa-eye-slash",
+ "description": "Show all params when using `--help`",
+ "hidden": true,
+ "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters."
+ },
+ "enable_conda": {
+ "type": "boolean",
+ "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.",
+ "hidden": true,
+ "fa_icon": "fas fa-bacon"
+ },
+ "singularity_pull_docker_container": {
+ "type": "boolean",
+ "description": "Use to pull docker containers to run with singularity",
+ "hidden": true,
+ "fa_icon": "fas fa-bacon"
+ },
+ "seq_center": {
+ "type": "boolean",
+ "description": "Sequencing center",
+ "hidden": true,
+ "fa_icon": "fas fa-toolbox",
+ "help_text": "This will be reported in the BAM header as CN."
+ },
+ "seq_platform": {
+ "type": "boolean",
+ "description": "Sequencing platform",
+ "hidden": true,
+ "fa_icon": "fas fa-toolbox",
+ "help_text": "This will be reported in the BAM header as PL."
+ }, + "star_ignore_sjdbgtf": { + "type": "boolean", + "description": "Whether to ignore the GTF in STAR alignment", + "hidden": true, + "fa_icon": "fas fa-toolbox", + "help_text": "Setting false will use GTF file for STAR alignment" + } + } + } + }, + "allOf": [ + { + "$ref": "#/definitions/skip_steps" + }, + { + "$ref": "#/definitions/input_output_options" + }, + { + "$ref": "#/definitions/reference_genome_options" + }, + { + "$ref": "#/definitions/institutional_config_options" + }, + { + "$ref": "#/definitions/max_job_request_options" + }, + { + "$ref": "#/definitions/generic_options" + } + ] +} diff --git a/scripts/build.sh b/scripts/build.sh deleted file mode 100755 index 5c439c43..00000000 --- a/scripts/build.sh +++ /dev/null @@ -1,51 +0,0 @@ -#!/bin/bash - -PREFIX="nfcore/rnafusion" - -create_container() { - TOOL_PATH=$1 - VERSION="$(cat $TOOL_PATH/environment.yml | grep "name:" | cut -d":" -f2 | cut -d "_" -f2)" - TOOL_NAME=`basename $TOOL_PATH` - CONTAINER_NAME="${PREFIX}:${TOOL_NAME}_${VERSION}" - echo "Building [$CONTAINER_NAME]" - docker build $TOOL_PATH -t $CONTAINER_NAME - docker push $CONTAINER_NAME -} - -if [ $# -eq 0 ]; then - echo "No tool name specified!" - echo "Run scripts/build.sh -h for help" - exit 1 -fi - -if [ $1 == "-h" ]; then - echo "Utility for building docker containers from tools/" - echo "Usage: scripts/build.sh [options]" - echo - echo "Options:" - echo " all build all tools including main image" - echo " builds specific tool" - echo " Example: sh scripts/build.sh ericscript" - exit 0 -fi - -if [ $1 == "all" ]; then - for TOOL in containers/*/; do - create_container `pwd`/$TOOL ${TOOL%?} - done - # Build main container - VERSION="$(cat nextflow.config | grep -m1 "container" | cut -d":" -f2 | cut -d "'" -f1)" - CONTAINER_NAME=$PREFIX:$VERSION - echo "Building [$CONTAINER_NAME]" - docker build . -t $CONTAINER_NAME - docker push $CONTAINER_NAME -else - TOOL=$1 - TOOL_PATH="$(pwd)/containers/$TOOL" - if [ ! 
-d $TOOL_PATH ]; then - echo "The tool doesn't exist" - exit 1 - else - create_container $TOOL_PATH - fi -fi diff --git a/subworkflows/local/arriba_workflow.nf b/subworkflows/local/arriba_workflow.nf new file mode 100644 index 00000000..981c9b33 --- /dev/null +++ b/subworkflows/local/arriba_workflow.nf @@ -0,0 +1,65 @@ +include { ARRIBA } from '../../modules/nf-core/modules/arriba/main' +include { ARRIBA_VISUALISATION } from '../../modules/local/arriba/visualisation/main' +include { GET_META } from '../../modules/local/getmeta/main' +include { GET_PATH as GET_PATH_ARRIBA_FAIL } from '../../modules/local/getpath/main' +include { SAMTOOLS_SORT as SAMTOOLS_SORT_FOR_ARRIBA } from '../../modules/nf-core/modules/samtools/sort/main' +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_FOR_ARRIBA} from '../../modules/nf-core/modules/samtools/index/main' +include { STAR_ALIGN as STAR_FOR_ARRIBA } from '../../modules/nf-core/modules/star/align/main' + + +workflow ARRIBA_WORKFLOW { + take: + reads + ch_gtf + ch_fasta + ch_starindex_ref + + main: + ch_versions = Channel.empty() + ch_dummy_file = file("$baseDir/assets/dummy_file_arriba.txt", checkIfExists: true) + + if (params.arriba || params.all) { + + STAR_FOR_ARRIBA( reads, ch_starindex_ref, ch_gtf, params.star_ignore_sjdbgtf, params.seq_platform, params.seq_center ) + ch_versions = ch_versions.mix(STAR_FOR_ARRIBA.out.versions) + + SAMTOOLS_SORT_FOR_ARRIBA(STAR_FOR_ARRIBA.out.bam) + ch_versions = ch_versions.mix(SAMTOOLS_SORT_FOR_ARRIBA.out.versions) + + SAMTOOLS_INDEX_FOR_ARRIBA(SAMTOOLS_SORT_FOR_ARRIBA.out.bam) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_FOR_ARRIBA.out.versions) + + bam_indexed = SAMTOOLS_SORT_FOR_ARRIBA.out.bam.join(SAMTOOLS_INDEX_FOR_ARRIBA.out.bai) + + if (params.arriba_fusions) { + ch_arriba_fusions = GET_META(reads, params.arriba_fusions) + ch_arriba_fusion_fail = ch_dummy_file + } else { + ARRIBA ( STAR_FOR_ARRIBA.out.bam, ch_fasta, ch_gtf, params.arriba_ref_blacklist, [], [], [], params.arriba_ref_protein_domain ) + ch_versions = ch_versions.mix(ARRIBA.out.versions) + + ch_arriba_fusions = ARRIBA.out.fusions + + GET_PATH_ARRIBA_FAIL(ARRIBA.out.fusions_fail) + ch_arriba_fusion_fail = GET_PATH_ARRIBA_FAIL.out.file + } + + ARRIBA_VISUALISATION(bam_indexed, ch_arriba_fusions, params.arriba_ref, ch_gtf) + ch_versions = ch_versions.mix(ARRIBA_VISUALISATION.out.versions) + + ch_arriba_visualisation = ARRIBA_VISUALISATION.out.pdf + + } + else { + ch_arriba_fusions = GET_META(reads, ch_dummy_file) + ch_arriba_fusion_fail = ch_dummy_file + ch_arriba_visualisation = ch_dummy_file + } + + emit: + fusions = ch_arriba_fusions + fusions_fail = ch_arriba_fusion_fail + versions = ch_versions.ifEmpty(null) + pdf = ch_arriba_visualisation + } + diff --git a/subworkflows/local/fusioncatcher_workflow.nf b/subworkflows/local/fusioncatcher_workflow.nf new file mode 100644 index 00000000..bbf771ab --- /dev/null +++ b/subworkflows/local/fusioncatcher_workflow.nf @@ -0,0 +1,32 @@ +include { FUSIONCATCHER } from '../../modules/local/fusioncatcher/detect/main' +include { GET_META } from '../../modules/local/getmeta/main' + + +workflow FUSIONCATCHER_WORKFLOW { + take: + reads + + main: + ch_versions = Channel.empty() + ch_dummy_file = file("$baseDir/assets/dummy_file_fusioncatcher.txt", checkIfExists: true) + + if (params.fusioncatcher || params.all) { + if (params.fusioncatcher_fusions){ + ch_fusioncatcher_fusions = GET_META(reads, params.fusioncatcher_fusions) + } else { + FUSIONCATCHER ( + reads, + params.fusioncatcher_ref + ) + ch_fusioncatcher_fusions 
= FUSIONCATCHER.out.fusions + } + } + else { + ch_fusioncatcher_fusions = GET_META(reads, ch_dummy_file) + } + + emit: + fusions = ch_fusioncatcher_fusions + versions = ch_versions.ifEmpty(null) + } + diff --git a/subworkflows/local/fusioninspector_workflow.nf b/subworkflows/local/fusioninspector_workflow.nf new file mode 100644 index 00000000..4ac83bdd --- /dev/null +++ b/subworkflows/local/fusioninspector_workflow.nf @@ -0,0 +1,21 @@ +include { FUSIONINSPECTOR } from '../../modules/local/fusioninspector/main' + + +workflow FUSIONINSPECTOR_WORKFLOW { + take: + reads + fusion_list + fusion_list_filtered + + main: + ch_versions = Channel.empty() + index ="${params.starfusion_ref}" + ch_fusion_list = params.fusioninspector_filter ? fusion_list_filtered : fusion_list + + FUSIONINSPECTOR( reads, ch_fusion_list , index ) + ch_versions = ch_versions.mix(FUSIONINSPECTOR.out.versions) + + emit: + versions = ch_versions.ifEmpty(null) +} + diff --git a/subworkflows/local/fusionreport_workflow.nf b/subworkflows/local/fusionreport_workflow.nf new file mode 100644 index 00000000..82f7ba1f --- /dev/null +++ b/subworkflows/local/fusionreport_workflow.nf @@ -0,0 +1,34 @@ +include { FUSIONREPORT } from '../../modules/local/fusionreport/detect/main' + + +workflow FUSIONREPORT_WORKFLOW { + take: + reads + fusionreport_ref + arriba_fusions + pizzly_fusions + squid_fusions + starfusion_fusions + fusioncatcher_fusions + + main: + ch_versions = Channel.empty() + + reads_fusions = reads + .join(arriba_fusions, remainder: true) + .join(pizzly_fusions, remainder: true) + .join(squid_fusions, remainder: true) + .join(starfusion_fusions, remainder: true) + .join(fusioncatcher_fusions, remainder: true) + + FUSIONREPORT( reads_fusions, fusionreport_ref) + ch_versions = ch_versions.mix(FUSIONREPORT.out.versions) + + emit: + versions = ch_versions.ifEmpty(null) + fusion_list = FUSIONREPORT.out.fusion_list + fusion_list_filtered = FUSIONREPORT.out.fusion_list_filtered + + +} + diff --git a/subworkflows/local/input_check.nf b/subworkflows/local/input_check.nf new file mode 100644 index 00000000..b0155b6e --- /dev/null +++ b/subworkflows/local/input_check.nf @@ -0,0 +1,46 @@ +// +// Check input samplesheet and get read channels +// + +include { SAMPLESHEET_CHECK } from '../../modules/local/samplesheet_check' + +workflow INPUT_CHECK { + take: + samplesheet // file: /path/to/samplesheet.csv + + main: + SAMPLESHEET_CHECK ( samplesheet ) + .csv + .splitCsv ( header:true, sep:',' ) + .map { create_fastq_channel(it) } + .set { reads } + + emit: + reads // channel: [ val(meta), [ reads ] ] + versions = SAMPLESHEET_CHECK.out.versions // channel: [ versions.yml ] +} + +// Function to get list of [ meta, [ fastq_1, fastq_2 ] ] +def create_fastq_channel(LinkedHashMap row) { + // create meta map + def meta = [:] + meta.id = row.sample + meta.single_end = row.single_end.toBoolean() + meta.strandedness = row.strandedness + + + // add path(s) of the fastq file(s) to the meta map + def fastq_meta = [] + if (!file(row.fastq_1).exists()) { + exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}" + } + if (meta.single_end) { + fastq_meta = [ meta, [ file(row.fastq_1) ] ] + } else { + if (!file(row.fastq_2).exists()) { + exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}" + } + fastq_meta = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ] + } + return fastq_meta +} diff --git a/subworkflows/local/pizzly_workflow.nf 
b/subworkflows/local/pizzly_workflow.nf new file mode 100644 index 00000000..1c73697f --- /dev/null +++ b/subworkflows/local/pizzly_workflow.nf @@ -0,0 +1,37 @@ +include { KALLISTO_QUANT } from '../../modules/local/kallisto/quant/main' +include { PIZZLY } from '../../modules/local/pizzly/detect/main' +include { GET_META } from '../../modules/local/getmeta/main' + +workflow PIZZLY_WORKFLOW { + take: + reads + ch_gtf + ch_transcript + + main: + ch_versions = Channel.empty() + ch_dummy_file = file("$baseDir/assets/dummy_file_pizzly.txt", checkIfExists: true) + + if (params.pizzly || params.all) { + if (params.pizzly_fusions) { + ch_pizzly_fusions = GET_META(reads, params.pizzly_fusions) + } else { + KALLISTO_QUANT(reads, params.pizzly_ref ) + ch_versions = ch_versions.mix(KALLISTO_QUANT.out.versions) + + PIZZLY( KALLISTO_QUANT.out.txt, ch_transcript, ch_gtf ) + ch_versions = ch_versions.mix(PIZZLY.out.versions) + + ch_pizzly_fusions = PIZZLY.out.fusions + } + } + else { + ch_pizzly_fusions = GET_META(reads, ch_dummy_file) + + } + + emit: + fusions = ch_pizzly_fusions + versions = ch_versions.ifEmpty(null) + } + diff --git a/subworkflows/local/qc_workflow.nf b/subworkflows/local/qc_workflow.nf new file mode 100644 index 00000000..1258e503 --- /dev/null +++ b/subworkflows/local/qc_workflow.nf @@ -0,0 +1,42 @@ +// +// Check input samplesheet and get read channels +// + +include { QUALIMAP_RNASEQ } from '../../modules/nf-core/modules/qualimap/rnaseq/main' +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_FOR_QC } from '../../modules/nf-core/modules/samtools/index/main' +include { PICARD_COLLECTRNASEQMETRICS } from '../../modules/local/picard/collectrnaseqmetrics/main' +include { PICARD_MARKDUPLICATES } from '../../modules/nf-core/modules/picard/markduplicates/main' + +workflow QC_WORKFLOW { + take: + bam_sorted + ch_chrgtf + ch_refflat + + main: + ch_versions = Channel.empty() + ch_qualimap_qc = Channel.empty() + + QUALIMAP_RNASEQ(bam_sorted, ch_chrgtf) + ch_versions = ch_versions.mix(QUALIMAP_RNASEQ.out.versions) + ch_qualimap_qc = QUALIMAP_RNASEQ.out.results.ifEmpty(null) + + SAMTOOLS_INDEX_FOR_QC(bam_sorted) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_FOR_QC.out.versions) + + bam_indexed = bam_sorted.join(SAMTOOLS_INDEX_FOR_QC.out.bai) + + PICARD_COLLECTRNASEQMETRICS(bam_indexed, ch_refflat, []) + ch_versions = ch_versions.mix(PICARD_COLLECTRNASEQMETRICS.out.versions) + + PICARD_MARKDUPLICATES(bam_sorted) + ch_versions = ch_versions.mix(PICARD_MARKDUPLICATES.out.versions) + + emit: + versions = ch_versions.ifEmpty(null) + qualimap_qc = ch_qualimap_qc.ifEmpty(null) + rnaseq_metrics = PICARD_COLLECTRNASEQMETRICS.out.metrics + duplicate_metrics = PICARD_MARKDUPLICATES.out.metrics + +} + diff --git a/subworkflows/local/squid_workflow.nf b/subworkflows/local/squid_workflow.nf new file mode 100644 index 00000000..5ed00d8a --- /dev/null +++ b/subworkflows/local/squid_workflow.nf @@ -0,0 +1,58 @@ +include { GET_META } from '../../modules/local/getmeta/main' +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_FOR_SQUID} from '../../modules/nf-core/modules/samtools/index/main' +include { SAMTOOLS_SORT as SAMTOOLS_SORT_FOR_SQUID } from '../../modules/nf-core/modules/samtools/sort/main' +include { SAMTOOLS_VIEW as SAMTOOLS_VIEW_FOR_SQUID } from '../../modules/nf-core/modules/samtools/view/main' +include { SQUID } from '../../modules/local/squid/detect/main' +include { SQUID_ANNOTATE } from '../../modules/local/squid/annotate/main' +include { STAR_ALIGN as STAR_FOR_SQUID } from 
'../../modules/nf-core/modules/star/align/main' + +workflow SQUID_WORKFLOW { + + take: + reads + ch_gtf + ch_starindex_ensembl_ref + + main: + ch_versions = Channel.empty() + ch_dummy_file = file("$baseDir/assets/dummy_file_squid.txt", checkIfExists: true) + + if (params.squid || params.all) { + if (params.squid_fusions){ + ch_squid_fusions = GET_META(reads, params.squid_fusions) + } else { + + STAR_FOR_SQUID( reads, ch_starindex_ensembl_ref, ch_gtf, params.star_ignore_sjdbgtf, params.seq_platform, params.seq_center ) + ch_versions = ch_versions.mix(STAR_FOR_SQUID.out.versions ) + + STAR_FOR_SQUID.out.sam + .map { meta, sam -> + return [meta, sam, []] + }.set { sam_indexed } + + SAMTOOLS_VIEW_FOR_SQUID ( sam_indexed, [] ) + ch_versions = ch_versions.mix(SAMTOOLS_VIEW_FOR_SQUID.out.versions ) + + SAMTOOLS_SORT_FOR_SQUID ( SAMTOOLS_VIEW_FOR_SQUID.out.bam ) + ch_versions = ch_versions.mix(SAMTOOLS_SORT_FOR_SQUID.out.versions ) + + bam_sorted = STAR_FOR_SQUID.out.bam_sorted.join(SAMTOOLS_SORT_FOR_SQUID.out.bam ) + + SQUID ( bam_sorted ) + ch_versions = ch_versions.mix(SQUID.out.versions) + + SQUID_ANNOTATE ( SQUID.out.fusions, ch_gtf ) + ch_versions = ch_versions.mix(SQUID_ANNOTATE.out.versions) + + ch_squid_fusions = SQUID_ANNOTATE.out.fusions_annotated + } + } + else { + ch_squid_fusions = GET_META(reads, ch_dummy_file) + } + + emit: + fusions = ch_squid_fusions + versions = ch_versions.ifEmpty(null) + } + diff --git a/subworkflows/local/starfusion_workflow.nf b/subworkflows/local/starfusion_workflow.nf new file mode 100644 index 00000000..5bf606aa --- /dev/null +++ b/subworkflows/local/starfusion_workflow.nf @@ -0,0 +1,41 @@ +include { STAR_ALIGN as STAR_FOR_STARFUSION } from '../../modules/nf-core/modules/star/align/main' +include { STARFUSION } from '../../modules/local/starfusion/detect/main' +include { GET_META } from '../../modules/local/getmeta/main' + +workflow STARFUSION_WORKFLOW { + take: + reads + ch_chrgtf + ch_starindex_ref + + main: + ch_versions = Channel.empty() + ch_align = Channel.empty() + ch_dummy_file = file("$baseDir/assets/dummy_file_starfusion.txt", checkIfExists: true) + + if (params.starfusion || params.all){ + if (params.starfusion_fusions){ + ch_starfusion_fusions = GET_META(reads, params.starfusion_fusions) + } else { + STAR_FOR_STARFUSION( reads, ch_starindex_ref, ch_chrgtf, params.star_ignore_sjdbgtf, params.seq_platform, params.seq_center ) + ch_versions = ch_versions.mix(STAR_FOR_STARFUSION.out.versions) + ch_align = STAR_FOR_STARFUSION.out.bam_sorted + + reads_junction = reads.join(STAR_FOR_STARFUSION.out.junction ) + + STARFUSION( reads_junction, params.starfusion_ref) + ch_versions = ch_versions.mix(STARFUSION.out.versions) + + ch_starfusion_fusions = STARFUSION.out.fusions + } + } + else { + ch_starfusion_fusions = GET_META(reads, ch_dummy_file) + } + emit: + fusions = ch_starfusion_fusions + bam_sorted = ch_align + versions = ch_versions.ifEmpty(null) + + } + diff --git a/workflows/build_references.nf b/workflows/build_references.nf new file mode 100644 index 00000000..c23466bd --- /dev/null +++ b/workflows/build_references.nf @@ -0,0 +1,87 @@ +/* +======================================================================================== + IMPORT LOCAL MODULES/SUBWORKFLOWS +======================================================================================== +*/ + +include { ENSEMBL_DOWNLOAD } from '../modules/local/ensembl/main' +include { ARRIBA_DOWNLOAD } from '../modules/local/arriba/download/main' +include { FUSIONCATCHER_DOWNLOAD } from 
+
+    ENSEMBL_DOWNLOAD( params.ensembl_version )
+
+    if (params.starindex || params.all || params.starfusion || params.arriba || params.squid) {
+        STAR_GENOMEGENERATE( ENSEMBL_DOWNLOAD.out.fasta, ENSEMBL_DOWNLOAD.out.gtf )
+    }
+
+    if (params.arriba || params.all) {
+        ARRIBA_DOWNLOAD()
+    }
+
+    if (params.fusioncatcher || params.all) {
+        FUSIONCATCHER_DOWNLOAD()
+    }
+
+    if (params.pizzly || params.all) {
+        PIZZLY_INDEX( ENSEMBL_DOWNLOAD.out.transcript )
+    }
+
+    if (params.starfusion || params.all) {
+        if (params.starfusion_build) {
+            STARFUSION_BUILD( ENSEMBL_DOWNLOAD.out.fasta, ENSEMBL_DOWNLOAD.out.chrgtf )
+        } else {
+            STARFUSION_DOWNLOAD()
+        }
+    }
+
+    if (params.starfusion_build) {
+        GTF_TO_REFFLAT( ENSEMBL_DOWNLOAD.out.chrgtf )
+    } else {
+        GTF_TO_REFFLAT( STARFUSION_DOWNLOAD.out.chrgtf )
+    }
+
+    if (params.fusionreport || params.all) {
+        FUSIONREPORT_DOWNLOAD( params.cosmic_username, params.cosmic_passwd )
+    }
+
+}
+
+/*
+========================================================================================
+    COMPLETION EMAIL AND SUMMARY
+========================================================================================
+*/
+
+workflow.onComplete {
+    if (params.email || params.email_on_fail) {
+        NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report)
+    }
+    NfcoreTemplate.summary(workflow, params, log)
+}
+
+/*
+========================================================================================
+    THE END
+========================================================================================
+*/
diff --git a/workflows/rnafusion.nf b/workflows/rnafusion.nf
new file mode 100644
index 00000000..9dac31ce
--- /dev/null
+++ b/workflows/rnafusion.nf
@@ -0,0 +1,267 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    VALIDATE INPUTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params)
+
+// Validate input parameters
+WorkflowRnafusion.initialise(params, log)
+
+// Check mandatory parameters
+if (file(params.input).exists() || params.build_references) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet does not exist or was not specified!' }
+
+ch_chrgtf = params.starfusion_build ? file(params.chrgtf) : file("${params.starfusion_ref}/ref_annot.gtf")
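+
+// When references are built locally (--starfusion_build), paths come from the
+// individual reference parameters; otherwise they are resolved inside the
+// downloaded STAR-Fusion reference directory.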
+ch_starindex_ref = params.starfusion_build ? params.starindex_ref : "${params.starfusion_ref}/ref_genome.fa.star.idx"
+ch_starindex_ensembl_ref = params.starindex_ref
+ch_refflat = params.starfusion_build ? file(params.refflat) : "${params.ensembl_ref}/ref_annot.gtf.refflat"
+
+def checkPathParamList = [
+    params.fasta,
+    params.gtf,
+    ch_chrgtf,
+    params.transcript,
+    ch_refflat
+]
+
+for (param in checkPathParamList) if ((param) && !params.build_references) file(param, checkIfExists: true)
+
+if (params.fasta.startsWith("s3")) {
+    log.info "INFO: s3 path detected, check for absolute path and trailing '/' not performed"
+} else {
+    for (param in checkPathParamList) if ((param.toString()) != file(param).toString() && !params.build_references) { exit 1, "Problem with ${param}: ABSOLUTE PATHS are required! Check for trailing '/' at the end of paths too." }
+}
+
+if ((params.squid || params.all) && params.ensembl_version == 105) { exit 1, 'Ensembl version 105 is not supported by SQUID' }
+
+ch_fasta = file(params.fasta)
+ch_gtf = file(params.gtf)
+ch_transcript = file(params.transcript)
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    CONFIG FILES
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+ch_multiqc_config        = file("$projectDir/assets/multiqc_config.yml", checkIfExists: true)
+ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config) : Channel.empty()
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    IMPORT LOCAL MODULES/SUBWORKFLOWS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+//
+// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules
+//
+include { INPUT_CHECK } from '../subworkflows/local/input_check'
+include { ARRIBA_WORKFLOW } from '../subworkflows/local/arriba_workflow'
+include { PIZZLY_WORKFLOW } from '../subworkflows/local/pizzly_workflow'
+include { QC_WORKFLOW } from '../subworkflows/local/qc_workflow'
+include { SQUID_WORKFLOW } from '../subworkflows/local/squid_workflow'
+include { STARFUSION_WORKFLOW } from '../subworkflows/local/starfusion_workflow'
+include { FUSIONCATCHER_WORKFLOW } from '../subworkflows/local/fusioncatcher_workflow'
+include { FUSIONINSPECTOR_WORKFLOW } from '../subworkflows/local/fusioninspector_workflow'
+include { FUSIONREPORT_WORKFLOW } from '../subworkflows/local/fusionreport_workflow'
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    IMPORT NF-CORE MODULES/SUBWORKFLOWS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+//
+// MODULE: Installed directly from nf-core/modules
+//
+include { CAT_FASTQ } from '../modules/nf-core/modules/cat/fastq/main'
+include { FASTQC } from '../modules/nf-core/modules/fastqc/main'
+include { MULTIQC } from '../modules/nf-core/modules/multiqc/main'
+include { CUSTOM_DUMPSOFTWAREVERSIONS } from '../modules/nf-core/modules/custom/dumpsoftwareversions/main'
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    RUN MAIN WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+// Info required for completion email and summary
+def multiqc_report = []
+
+workflow RNAFUSION {
+
+    ch_versions = Channel.empty()
+
+    //
+    // SUBWORKFLOW: Read in samplesheet, validate and stage input files
+    //
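+    // Strip the lane suffix from the sample IDs and group the FASTQ files,
+    // so that samples sequenced over several lanes are concatenated before
+    // alignment.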
+    INPUT_CHECK ( ch_input )
+        .reads
+        .map { meta, fastq ->
+            meta.id = meta.id.split('_')[0..-2].join('_')
+            [ meta, fastq ]
+        }
+        .groupTuple(by: [0])
+        .branch { meta, fastq ->
+            single  : fastq.size() == 1
+                return [ meta, fastq.flatten() ]
+            multiple: fastq.size() > 1
+                return [ meta, fastq.flatten() ]
+        }
+        .set { ch_fastq }
+    ch_versions = ch_versions.mix(INPUT_CHECK.out.versions)
+
+    CAT_FASTQ ( ch_fastq.multiple )
+        .reads
+        .mix(ch_fastq.single)
+        .set { ch_cat_fastq }
+    ch_versions = ch_versions.mix(CAT_FASTQ.out.versions.first().ifEmpty(null))
+
+    //
+    // MODULE: Run FastQC
+    //
+    FASTQC ( ch_cat_fastq )
+    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
+
+    // Run STAR alignment and Arriba
+    ARRIBA_WORKFLOW (
+        ch_cat_fastq,
+        ch_gtf,
+        ch_fasta,
+        ch_starindex_ref
+    )
+    ch_versions = ch_versions.mix(ARRIBA_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run pizzly/kallisto
+    PIZZLY_WORKFLOW (
+        ch_cat_fastq,
+        ch_gtf,
+        ch_transcript
+    )
+    ch_versions = ch_versions.mix(PIZZLY_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run SQUID
+    SQUID_WORKFLOW (
+        ch_cat_fastq,
+        ch_gtf,
+        ch_starindex_ensembl_ref
+    )
+    ch_versions = ch_versions.mix(SQUID_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run STAR-Fusion
+    STARFUSION_WORKFLOW (
+        ch_cat_fastq,
+        ch_chrgtf,
+        ch_starindex_ref
+    )
+    ch_versions = ch_versions.mix(STARFUSION_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run FusionCatcher
+    FUSIONCATCHER_WORKFLOW (
+        ch_cat_fastq
+    )
+    ch_versions = ch_versions.mix(FUSIONCATCHER_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run fusion-report to aggregate the calls from all fusion detection tools
+    FUSIONREPORT_WORKFLOW (
+        ch_cat_fastq,
+        params.fusionreport_ref,
+        ARRIBA_WORKFLOW.out.fusions,
+        PIZZLY_WORKFLOW.out.fusions,
+        SQUID_WORKFLOW.out.fusions,
+        STARFUSION_WORKFLOW.out.fusions,
+        FUSIONCATCHER_WORKFLOW.out.fusions
+    )
+    ch_versions = ch_versions.mix(FUSIONREPORT_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // Run FusionInspector
+    FUSIONINSPECTOR_WORKFLOW (
+        ch_cat_fastq,
+        FUSIONREPORT_WORKFLOW.out.fusion_list,
+        FUSIONREPORT_WORKFLOW.out.fusion_list_filtered
+    )
+    ch_versions = ch_versions.mix(FUSIONINSPECTOR_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    // QC
+    QC_WORKFLOW (
+        STARFUSION_WORKFLOW.out.bam_sorted,
+        ch_chrgtf,
+        ch_refflat
+    )
+    ch_versions = ch_versions.mix(QC_WORKFLOW.out.versions.first().ifEmpty(null))
+
+    CUSTOM_DUMPSOFTWAREVERSIONS (
+        ch_versions.unique().collectFile(name: 'collated_versions.yml')
+    )
+
+    //
+    // MODULE: MultiQC
+    //
+    workflow_summary    = WorkflowRnafusion.paramsSummaryMultiqc(workflow, summary_params)
+    ch_workflow_summary = Channel.value(workflow_summary)
+
+    ch_multiqc_files = Channel.empty()
+    ch_multiqc_files = ch_multiqc_files.mix(Channel.from(ch_multiqc_config))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_multiqc_custom_config.collect().ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
+    ch_multiqc_files = ch_multiqc_files.mix(CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect())
+    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([]))
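+
+    // MultiQC aggregates the FastQC reports, the workflow summary and the
+    // collated software versions into a single report.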
+    MULTIQC (
+        ch_multiqc_files.collect()
+    )
+    multiqc_report = MULTIQC.out.report.toList()
+    ch_versions    = ch_versions.mix(MULTIQC.out.versions)
+
+}
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    COMPLETION EMAIL AND SUMMARY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+workflow.onComplete {
+    if (params.email || params.email_on_fail) {
+        NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report)
+    }
+    NfcoreTemplate.summary(workflow, params, log)
+}
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    THE END
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/