adding fastq_screen module #19

Draft
wants to merge 3 commits into
base: dev
FranBonath
Member

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Aug 21, 2024

nf-core lint overall result: Failed ❌

Posted for pipeline commit 2e58a3c

+| ✅ 173 tests passed       |+
!| ❗  21 tests had warnings |!
-| ❌   2 tests failed       |-

❌ Test failures:

  • template_strings - Found a Jinja template string in /home/runner/work/seqinspector/seqinspector/modules/nf-core/fastqscreen/references/genome_ecoli/genome.rev.1.bt2 L20005: [binary data containing "{{" and "}}"]
  • merge_markers - Merge marker '<<<<<<<' in /home/runner/work/seqinspector/seqinspector/modules/nf-core/fastqscreen/references/genome_cerevisiae/genome.4.bt2: [binary data]
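
Both failures point at Bowtie2 index files, which are binary, so the matched byte sequences are most likely accidental rather than real template strings or merge conflicts. The following is a minimal sketch for listing such files locally. It is not how nf-core/tools implements its checks; the function name `binary_false_positives` is hypothetical, and the binary test simply relies on `grep -I` skipping files it considers binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/tools): print files below a
# directory that contain a merge-marker-like or Jinja-like byte sequence
# AND look binary to grep.
binary_false_positives() {
    local dir="$1"
    local f
    # -r recurse, -l list file names only, -a scan binary files as text,
    # -E extended regex matching '<<<<<<<' or '{{'
    grep -rlaE '<<<<<<<|\{\{' "$dir" 2>/dev/null | while IFS= read -r f; do
        # With -I, grep treats binary files as non-matching, so a failed
        # match of the empty pattern means grep considers the file binary.
        if ! grep -qI '' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `binary_false_positives modules/nf-core/fastqscreen/references` would then list only the index files that trip the pattern checks, leaving genuine hits in text files out.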

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-09-18 09:20:34

@FranBonath
Member Author

@nf-core-bot fix linting, please :)

@MatthiasZepper
Member

I only had a quick glance (so no formal review yet), but I would prefer that we start using git lfs on this repo for managing the references and other large files. I am sure that we will have more modules that require large reference data:

git lfs install
# Track Bowtie2 index files and FASTA references with git lfs
git lfs track "*.bt2"
git lfs track "*.fa"
git add .gitattributes
git commit

Ideally, you would also rewrite the history on your branch in the process, because the commands above only apply to files added from now on, not to the ones that you have already committed. Rewriting the history makes your local branch diverge from your origin:

On branch dev
Your branch and 'origin/dev' have diverged,
and have 1 and 1 different commits each, respectively.
(use "git pull" if you want to integrate the remote branch with yours)

Pulling from the origin at this point would bring back the data we want to prune from history, so instead we force push the rewritten history to the remote origin on GitHub. Your origin dev branch is then clean again and can be safely merged into the upstream nf-core repo once this PR is approved:

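# Keep a backup branch of the current state before rewriting history
git checkout -b "backup_dev"
git checkout dev
git lfs install
# Optional: only needed if you also rewrite history with git filter-repo
pip install git-filter-repo
# Convert already committed *.bt2 and *.fa files to LFS pointers
# (patterns are passed as one comma-separated list)
git lfs migrate import --include="*.bt2,*.fa"
git lfs track "*.bt2"
git lfs track "*.fa"
git add .
git commit --amend
# Expire the reflogs of all refs so the old objects can be pruned
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# Overwrite the remote branch with the rewritten history
git push --force-with-lease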
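
Before force pushing, it can be worth checking that the large files are really gone from the rewritten branch. The sketch below uses only core git commands; the helper name `largest_blobs` and its defaults are assumptions made for illustration. It lists the biggest blobs reachable from a given revision as `<size in bytes> <path>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest blobs reachable from a revision
# (default: HEAD), largest first, as "<size in bytes> <path>".
largest_blobs() {
    local n="${1:-10}"
    local rev="${2:-HEAD}"
    # rev-list prints "<object id> <path>" for every reachable object;
    # cat-file adds type and size and passes the path through as %(rest).
    git rev-list --objects "$rev" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$n"
}
```

For example, `largest_blobs 5 dev` should show only small pointer files for `*.bt2` and `*.fa` if the migration worked, whereas `largest_blobs 5 backup_dev` would still show the original sizes.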

@alneberg
Member

The suggestion by @MatthiasZepper seems to be a bit of work but I think I agree. Probably worth getting git lfs up and running right away.

@MatthiasZepper
Member

> The suggestion by @MatthiasZepper seems to be a bit of work but I think I agree. Probably worth getting git lfs up and running right away.

The downside of git lfs is that Nextflow does not support it out of the box :-/ ... sounds like a good feature for a plugin but...alas.
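
One likely symptom of missing LFS support, stated here as an assumption rather than something confirmed in this thread, is that a checkout made without the LFS filters contains small text pointer files instead of the real index data. Per the Git LFS pointer format, such a file starts with the line `version https://git-lfs.github.com/spec/v1`. A minimal sketch for detecting that case, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given file looks like a Git LFS
# pointer, i.e. its first bytes are the pointer format's version line.
is_lfs_pointer() {
    local file="$1"
    local magic='version https://git-lfs.github.com/spec/v1'
    [ -f "$file" ] || return 1
    # Compare only the first ${#magic} bytes, so large binary files are
    # never read in full.
    head -c "${#magic}" -- "$file" | grep -qxF -- "$magic"
}
```

A check such as `is_lfs_pointer genome.1.bt2 && echo "pointer only, data not fetched"` could then be run on a reference file before handing it to a tool.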

@FranBonath
Member Author

As per our discussion in the dev meeting, we bench the git lfs implementation for now, right? And I will instead pivot to using iGenomes references for the tests, correct? @alneberg @MatthiasZepper

@alneberg
Member

alneberg commented Sep 9, 2024

Yes, git lfs is not an option at the moment I think. If a suitable genome is already present in iGenomes that would be perfect I think.

@FranBonath
Member Author

FranBonath commented Sep 18, 2024

I tried to provide the references for the FastQ Screen test profile via iGenomes, which lives in an S3 bucket. The problem is that FastQ Screen cannot read from it. We have a few options, none of which I really like:

  1. Ship the pipeline with bowtie2-build and make our own Bowtie2 index. This is what the module test uses. I don't like it because we would add a tool to our pipeline just to get the tests to run: a tool that can break and that has no impact on actually running the pipeline for real.
  2. Do what I am currently doing and provide the reference as part of the module. We can choose to ship only one very small reference, for example PhiX. I hate anything hard-coded.
  3. Don't have a test :P

I had a lengthy discussion with @maxulysse about this, but we couldn't really agree.
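
For options 1 and 2, FastQ Screen ultimately needs a configuration file with one tab-separated `DATABASE` line per reference, each pointing at a local Bowtie2 index basename. The sketch below writes such a file; the function name `write_fastq_screen_conf`, the `name=basename` argument convention and all paths are hypothetical and not taken from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a FastQ Screen configuration file.
# Usage: write_fastq_screen_conf <conf file> <name>=<index basename>...
write_fastq_screen_conf() {
    local conf="$1"
    shift
    local pair name index
    : > "$conf"    # create or truncate the config file
    for pair in "$@"; do
        name="${pair%%=*}"     # text before the first '='
        index="${pair#*=}"     # text after the first '='
        # One tab-separated DATABASE line per reference
        printf 'DATABASE\t%s\t%s\n' "$name" "$index" >> "$conf"
    done
}
```

With a tiny reference shipped alongside the module (option 2), a call might look like `write_fastq_screen_conf fastq_screen.conf PhiX=references/phix/genome`. For option 1, the index would first be built with something like `bowtie2-build phix.fa references/phix/genome`, where the file names are again placeholders.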

@Aratz
Collaborator

Aratz commented Sep 18, 2024

I'd say 2, if you choose a tiny reference.

I think in general bad tests are better than no tests. I don't think it's such a big issue that the reference is unrealistically small; the tests will still catch, e.g., fastq_screen crashing because of some config error.

@maxulysse
Member

I agree with @Aratz, it's better to have bad tests than no tests.
I think option 2 works well for profile test, but for profile test_full we're going to need more than that, so why not go all the way already?

5 participants