Have a samplesheet with an extra level of grouping #16

Open
maxulysse opened this issue May 30, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@maxulysse
Member

Description of feature

I would love to be able to add an extra level for grouping:

  • group/project/cohort
    • individual/patient
      • sample (tumor/normal)

That way I would be able to have multiple samples from the same patient, and tell them apart from samples of another patient within the same group/project/cohort.
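To make the requested hierarchy concrete, here is a minimal sketch of a three-level grouping. The column names `group`, `patient` and `sample`, and all row values, are hypothetical and only serve as an illustration; nothing here reflects how the pipeline actually parses its samplesheet.

```python
from collections import defaultdict

# Hypothetical samplesheet rows: the "group", "patient" and "sample"
# keys are assumptions made for this sketch, not existing columns.
rows = [
    {"group": "cohort_A", "patient": "patient_1", "sample": "tumor"},
    {"group": "cohort_A", "patient": "patient_1", "sample": "normal"},
    {"group": "cohort_A", "patient": "patient_2", "sample": "tumor"},
    {"group": "cohort_B", "patient": "patient_1", "sample": "tumor"},
]


def nest(rows):
    """Build a group -> patient -> [samples] mapping from flat rows."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["group"]][row["patient"]].append(row["sample"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {group: dict(patients) for group, patients in tree.items()}


tree = nest(rows)
for group, patients in tree.items():
    for patient, samples in patients.items():
        print(group, patient, samples)
```

Because the patient level is nested under the group level, `patient_1` in `cohort_A` and `patient_1` in `cohort_B` stay separate, which is the distinction asked for above.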

@maxulysse maxulysse added the enhancement New feature or request label May 30, 2024
@Aratz
Collaborator

Aratz commented May 30, 2024

That's an interesting idea. Maybe the most flexible way to implement this would be to have some kind of tagging system?

More specifically, this could be an extra column where one could define any number of tags that can be used to group samples together in a specific report. It could look something like this:

sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"

This would generate 7 reports:

  • one global report with all samples
  • two lane reports (one with SAMPLE1 and SAMPLE2, the other with SAMPLE3, SAMPLE4 and SAMPLE5)
  • two group reports (one with SAMPLE1, SAMPLE2, SAMPLE3, SAMPLE4, and one with SAMPLE5)
  • two patient reports (one with SAMPLE1 and SAMPLE3, the other with SAMPLE2 and SAMPLE4)
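
As a rough illustration of the tagging idea, the sketch below parses the example samplesheet above and collects, for each tag, the samples that would end up in that report. The function name `reports_from_tags` and the `global` report key are made up for this example, and the real pipeline may well handle the samplesheet differently.

```python
import csv
import io
from collections import defaultdict

# The example samplesheet from this comment, with the proposed "tags" column.
SAMPLESHEET = '''\
sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"
'''


def reports_from_tags(text):
    """Map each report name (hypothetical helper) to the samples it contains."""
    reports = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Every sample goes into the global report.
        reports["global"].append(row["sample"])
        # The quoted "tags" field is itself a comma-separated list.
        for tag in (row.get("tags") or "").split(","):
            tag = tag.strip()
            if tag:
                reports[tag].append(row["sample"])
    return dict(reports)


reports = reports_from_tags(SAMPLESHEET)
for name, samples in reports.items():
    print(name, samples)
```

One thing this sketch makes visible: tags are flat, so a tag such as `patient_1` would collect samples from every group that uses it. To keep patients from different groups apart, the tag itself would need to be unique (for example by including the group name in it).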

This could replace columns lane, group and rundir since these are optional and are not defined for all applications.

I can imagine this pipeline being used for a wide range of applications, and it is probably unrealistic to hard code all possible ways to group samples together. I think we are already seeing the limits of that approach, for instance with group, which is very important for us sequencing platforms but maybe less so for research teams, or with lane, which is specific to some sequencing instruments.

By using this tagging system we could handle basically any way to group samples.

@Aratz
Collaborator

Aratz commented May 30, 2024

Although maybe we need to keep rundir 🤔, because that's the path we use to fetch information from files that come from the sequencer (e.g. InterOp files).

Projects
Status: In Progress
Development

No branches or pull requests

2 participants