Make it work without specifying GTF file #322

Merged 3 commits on May 2, 2024
9 changes: 0 additions & 9 deletions main.nf
@@ -20,16 +20,13 @@ nextflow.enable.dsl = 2
include { SCRNASEQ } from './workflows/scrnaseq'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_scrnaseq_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_scrnaseq_pipeline'
include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_scrnaseq_pipeline'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GENOME PARAMETER VALUES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
// we cannot modify params. here, we must load the files
- ch_genome_fasta = params.genome ? file( getGenomeAttribute('fasta'), checkIfExists: true ) : []
- ch_gtf = params.genome ? file( getGenomeAttribute('gtf'), checkIfExists: true ) : []
Comment on lines -31 to -32
Contributor

Did this have to be removed from main.nf? Was it the "main"-template functions that were complaining about a missing fasta?

If that is the case, I would handle it all at once in a separate issue, because it affects all of the workflows.

A single issue covering all workflows would be easier to track.

Issues that could optionally be merged into it: #313 and #277.

Member Author

I think #313 is already part of the cellranger multi PR.

I wasn't aware that there's a linting check for that, but I'd argue that the current implementation is suboptimal. Either

  • all parameters should be evaluated outside the scrnaseq workflow and passed as arguments, or
  • all parameters should be evaluated within the workflow and no arguments should be passed.

Currently it is simply inconsistent: fastq/gtf are passed as arguments, and I find it highly confusing that different parts of the workflow each evaluate some of the parameters. It also doesn't help, because as currently implemented you can't include the workflow somewhere else and run it anyway.

For the sake of simplicity I'd vote for sticking with what we currently have (i.e. evaluate parameters within the workflow) and, if necessary, ignoring the respective linting checks. The solution intended by the template authors would probably be to move all parameter checks outside the workflow and pass the results as arguments.

Member Author

Linting doesn't seem to fail because of this, btw.

Contributor

That is true; every other subworkflow currently performs its input assertion inside the subworkflow itself:

e.g. alevin

assert (genome_fasta && gtf && salmon_index && txp2gene) || (genome_fasta && gtf) || (genome_fasta && gtf && transcript_fasta && txp2gene):
        """Must provide a genome fasta file ('--fasta') and a gtf file ('--gtf'), or a genome fasta file
        and a transcriptome fasta file ('--transcript_fasta') if no index and txp2gene is given!""".stripIndent()
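
As a side note on the quoted assertion: every alternative contains genome_fasta && gtf, so by Boolean absorption the condition appears to reduce to that pair, and the salmon_index, txp2gene and transcript_fasta terms never change the outcome. A minimal plain-Groovy check of that claim, using boolean stand-ins rather than pipeline code, enumerates all 32 combinations:

```groovy
// Enumerate every true/false combination of the five inputs
def flags = [true, false]
def mismatches = []

[flags, flags, flags, flags, flags].combinations().each { combo ->
    def (genome_fasta, gtf, salmon_index, txp2gene, transcript_fasta) = combo

    // Condition as quoted in the comment above
    def quoted = (genome_fasta && gtf && salmon_index && txp2gene) ||
                 (genome_fasta && gtf) ||
                 (genome_fasta && gtf && transcript_fasta && txp2gene)

    // Candidate simplification
    def reduced = genome_fasta && gtf

    if (quoted != reduced) {
        mismatches << combo
    }
}

println "combinations where the two conditions differ: ${mismatches.size()}"
```

If the intent is to accept a prebuilt index without a GTF, the condition itself would need to change; the sketch only shows what the current expression evaluates.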

However, they do seem to rely on gtf/fasta having been loaded with file() outside. So should we also move the loaders inside the subworkflow for the others?

Is that the correct understanding?

Member Author

@grst, May 2, 2024

No, I think the subworkflows (for each aligner) should be fully abstracted (i.e. not rely on params anywhere, but just consume input channels/values via take).

It's just about whether to evaluate params in main.nf or in workflows/scrnaseq.nf.
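
Whichever file ends up evaluating the params, the rule this PR applies in workflows/scrnaseq.nf is: an explicit --fasta/--gtf wins, then the --genome attribute, otherwise an empty list. A minimal plain-Groovy sketch of that precedence follows. The helper name resolveReference, the stubbed lookup standing in for getGenomeAttribute, and the paths are all assumptions for illustration, and the file(..., checkIfExists: true) wrapping is left out:

```groovy
// Hypothetical helper, not part of the pipeline: returns the path that would be
// handed to file(..., checkIfExists: true), or [] when nothing is configured.
def resolveReference(Map params, String attribute, Closure genomeLookup) {
    if (params[attribute]) {
        return params[attribute]          // explicit --fasta / --gtf wins
    }
    if (params.genome) {
        return genomeLookup(attribute)    // fall back to the --genome entry
    }
    return []                             // nothing given: empty list
}

// Stub standing in for getGenomeAttribute; the paths are made up
def lookup = { String attribute -> "/refs/GRCh38/genes.${attribute}".toString() }

def custom   = resolveReference([gtf: 'custom.gtf', genome: 'GRCh38'], 'gtf', lookup)
def fallback = resolveReference([genome: 'GRCh38'], 'gtf', lookup)
def missing  = resolveReference([:], 'gtf', lookup)

// An empty list is falsy in Groovy, which is what lets a guard such as
// `ch_gtf ? ... : []` skip the GTF filtering step when no GTF is available.
println "custom=${custom} fallback=${fallback} filter=${missing ? 'run' : 'skip'}"
```

Keeping the rule in one place like this is one way to make the main.nf versus workflows/scrnaseq.nf choice a matter of where the call lives rather than what it does.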

Contributor

Ah! Now I can see clearly that the changes are in these two files.
For some reason I was reading them as if they were inside the cellrangermulti subworkflow.
🤯


/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -44,8 +41,6 @@ workflow NFCORE_SCRNASEQ {

take:
samplesheet // channel: samplesheet read in from --input
- ch_genome_fasta
- ch_gtf

main:

@@ -54,8 +49,6 @@ workflow NFCORE_SCRNASEQ {
//
SCRNASEQ (
samplesheet,
- ch_genome_fasta,
- ch_gtf
)

emit:
@@ -90,8 +83,6 @@ workflow {
//
NFCORE_SCRNASEQ (
PIPELINE_INITIALISATION.out.samplesheet,
- ch_genome_fasta,
- ch_gtf
)

//
2 changes: 1 addition & 1 deletion subworkflows/local/align_cellrangermulti.nf
@@ -108,7 +108,7 @@ workflow CELLRANGER_MULTI_ALIGN {
//
// Prepare GTF
//
- if (!cellranger_gex_index || !cellranger_vdj_index) {
+ if ( !cellranger_gex_index || (!cellranger_vdj_index && !params.skip_cellrangermulti_vdjref) ) {

// Filter GTF based on gene biotypes passed in params.modules
CELLRANGER_MKGTF ( ch_gtf )
11 changes: 5 additions & 6 deletions workflows/scrnaseq.nf
@@ -16,14 +16,14 @@ include { paramsSummaryMultiqc } from '../subworkflows/nf-core/uti
include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_scrnaseq_pipeline'
include { paramsSummaryLog; paramsSummaryMap } from 'plugin/nf-validation'
+ include { getGenomeAttribute } from '../subworkflows/local/utils_nfcore_scrnaseq_pipeline'



workflow SCRNASEQ {

take:
ch_fastq
- ch_genome_fasta
- ch_gtf

main:

@@ -32,9 +32,8 @@ workflow SCRNASEQ {
error "Only cellranger supports `protocol = 'auto'`. Please specify the protocol manually!"
}

- // overwrite fasta and gtf if user provide a custom one
- ch_genome_fasta = Channel.value(params.fasta ? file(params.fasta) : ch_genome_fasta)
- ch_gtf = Channel.value(params.gtf ? file(params.gtf) : ch_gtf)
+ ch_genome_fasta = params.fasta ? file(params.fasta, checkIfExists: true) : ( params.genome ? file( getGenomeAttribute('fasta'), checkIfExists: true ) : [] )
+ ch_gtf = params.gtf ? file(params.gtf, checkIfExists: true) : ( params.genome ? file( getGenomeAttribute('gtf'), checkIfExists: true ) : [] )

// general input and params
ch_transcript_fasta = params.transcript_fasta ? file(params.transcript_fasta): []
@@ -118,7 +117,7 @@ workflow SCRNASEQ {
}

// filter gtf
- ch_filter_gtf = GTF_GENE_FILTER ( ch_genome_fasta, ch_gtf ).gtf
+ ch_filter_gtf = ch_gtf ? GTF_GENE_FILTER ( ch_genome_fasta, ch_gtf ).gtf : []

// Run kallisto bustools pipeline
if (params.aligner == "kallisto") {