Commit
add generic assay plugin (#1946)
sellth authored Aug 8, 2024
1 parent a07a1e7 commit 8ebfd20
Showing 4 changed files with 277 additions and 12 deletions.
1 change: 1 addition & 0 deletions config/settings/base.py
Original file line number Diff line number Diff line change
@@ -99,6 +99,7 @@
    'samplesheets.studyapps.cancer.apps.CancerConfig',
    # Samplesheets assay sub-apps
    'samplesheets.assayapps.dna_sequencing.apps.DnaSequencingConfig',
    'samplesheets.assayapps.generic.apps.GenericConfig',
    'samplesheets.assayapps.generic_raw.apps.GenericRawConfig',
    'samplesheets.assayapps.meta_ms.apps.MetaMsConfig',
    'samplesheets.assayapps.microarray.apps.MicroarrayConfig',
71 changes: 59 additions & 12 deletions docs_manual/source/metadata_advanced.rst
@@ -1,4 +1,5 @@
.. _metadata_advanced:
.. include:: <isonum.txt>

Advanced Metadata Topics
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -57,12 +58,12 @@ under ``SHEETS_IGV_OMIT_BAM`` (also affects CRAM files) and
Assay iRODS Data Linking
========================

Similar to study data linking, SODAR also displays iRODS links specific to
assays according to an **assay plugin**. The selected plugin affects the
Similar to study data linking, SODAR also displays iRODS links in the assay
section according to an **assay plugin**. The selected plugin affects the
following types of iRODS links:

- **Assay shortcuts** card above each assay table
- **Row-specific links** in the right hand column of each row
- **Assay shortcuts** card above each assay table.
- **Row-specific links** in the right hand column of each row.
- **Inline links** which are file names stored in the table itself, under e.g.
"data file" materials.

@@ -102,24 +103,31 @@ to true, the assay plugin used for the assay should implement the
SODAR currently supports the following assay plugins:

- **DNA Sequencing**
- **Generic Assay Plugin**
- **Generic Raw Data Plugin**
- **Metabolite Profiling / Mass Spectrometry**
- **Microarray**
- **Protein Expression Profiling / Mass Cytometry**
- **Protein Expression Profiling / Mass Spectrometry**

Common links as well as plugin specific links are detailed below.
General Concepts
----------------

Common Links
------------

Links to the following iRODS collections are provided for *all* assay
Links to the following iRODS collections are provided for all assay
configurations in the assay shortcuts card:

- ``ResultsReports``: Collection for assay specific result and report files
- ``ResultsReports``: Collection for assay specific result and report files.
- ``MiscFiles``: Miscellaneous files
- ``TrackHubs``: Track hubs for UCSC Genome Browser integration (displayed if
track hubs have been created)
track hubs have been created).

Assay plugins can create the following additional links to connect samplesheet
metadata to files stored in iRODS:

1. Additional assay-wide collections and shortcuts (e.g. ``RawData``).
2. Row-specific collections and shortcuts (i.e. ``RowPath``).
3. iRODS/WebDAV links generated from cell values within the sample sheet
   table (i.e. **inline links**).

DNA Sequencing Plugin
---------------------
@@ -130,6 +138,7 @@ DNA Sequencing Plugin
- Row-specific links
    * Each row links to the **last material name** in the row, not counting
      "data file" materials.
    * Creates collections in Landing Zones according to this ``RowPath``.
- Inline links
    * N/A
- Used with measurement type / technology type
@@ -139,6 +148,44 @@ DNA Sequencing Plugin
    * transcriptome profiling / nucleotide sequencing
    * panel sequencing / nucleotide sequencing

Generic Assay Plugin
--------------------

This plugin can be used with any assay, i.e. with any measurement type /
technology type configuration.
It enables the user to define row-specific and inline links to iRODS collections
via comments in the ``STUDY ASSAYS`` section of the ISA-Tab investigation file.

- Internal name: ``samplesheets_assay_generic``
- Row-specific links
    * Place one or multiple comments starting with ``SODAR Assay Row Path``.
    * Each comment should define one column name.
    * The comments are evaluated in alphabetical order.
    * Values within these columns are used to define the ``RowPath``.
    * For example:
        .. code-block::

            STUDY ASSAYS
            Comment[SODAR Assay Row Path 1] Pool ID
            Comment[SODAR Assay Row Path 2] Extract Name

        + Resulting row links:
            ``/sodarZone/projects/xxx/sample_data/study_yyy/assay_zzz/<pool_id>/<extract_name>/``
- Inline links
    * Comments define semicolon-separated lists of columns to be linked to collections.
    * *SODAR Assay Link Results* |rarr| ``ResultsReports``
    * *SODAR Assay Link MiscFiles* |rarr| ``MiscFiles``
    * *SODAR Assay Link Row* |rarr| ``RowPath``
    * For example:
        .. code-block::

            STUDY ASSAYS
            Comment[SODAR Assay Link Results] Report File;Derived Data File
            Comment[SODAR Assay Link MiscFiles] Protocol File;Antibody Panel
            Comment[SODAR Assay Link Row] Raw Data File

- Used with measurement type / technology type
    * N/A (is only used when the ``SODAR Assay Plugin`` comment is set)
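
The row path rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the plugin code itself: ``build_row_path``, the ``comments`` dict, the ``row_values`` dict and the ``/zone/assay_zzz`` path are hypothetical stand-ins for the assay comments, a rendered table row and the assay root path. It assumes comment names are sorted as plain strings and that empty cell values are skipped.

```python
ROW_PATH_PREFIX = 'SODAR Assay Row Path'


def build_row_path(comments, row_values, assay_path):
    """
    Hypothetical helper: build a row path from "SODAR Assay Row Path" comments.

    :param comments: Dict of assay comment name -> comment value
    :param row_values: Dict of column name -> cell value for one row
    :param assay_path: Root path of the assay (string)
    :return: String with the full path, or None if no value was found
    """
    # Column names in the order given by the (string-sorted) comment names
    columns = [
        value
        for name, value in sorted(comments.items())
        if name.startswith(ROW_PATH_PREFIX)
    ]
    # Column names are matched case-insensitively
    lookup = {k.lower(): v for k, v in row_values.items()}
    parts = [lookup.get(c.lower()) for c in columns]
    # Empty or missing values are skipped
    parts = [p for p in parts if p]
    if not parts:
        return None
    return assay_path + '/' + '/'.join(parts)


comments = {
    'SODAR Assay Row Path 2': 'Extract Name',
    'SODAR Assay Row Path 1': 'Pool ID',
    'SODAR Assay Link Row': 'Raw Data File',  # Not a row path comment
}
row_values = {'Pool ID': 'pool_a', 'Extract Name': 'extract_01'}
example_path = build_row_path(comments, row_values, '/zone/assay_zzz')
```

Because the ordering is alphabetical on the comment name, a comment named ``SODAR Assay Row Path 10`` would sort before ``SODAR Assay Row Path 2``. With more than nine path comments, zero-padded suffixes such as ``01`` and ``02`` would be one way to keep the intended order.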

Generic Raw Data Assay Plugin
-----------------------------

@@ -150,7 +197,7 @@ Generic Raw Data Assay Plugin
- Inline links
    * *Raw data files* are linked to ``RawData``
- Used with measurement type / technology type
    * N/A (can be used with the ``SODAR Assay Plugin`` comment override)
    * N/A (is only used when the ``SODAR Assay Plugin`` comment is set)

Metabolite Profiling / Mass Spectrometry Plugin
-----------------------------------------------
5 changes: 5 additions & 0 deletions samplesheets/assayapps/generic/apps.py
@@ -0,0 +1,5 @@
from django.apps import AppConfig


class GenericConfig(AppConfig):
    name = 'samplesheets.assayapps.generic'
212 changes: 212 additions & 0 deletions samplesheets/assayapps/generic/plugins.py
@@ -0,0 +1,212 @@
"""Assay app plugin for samplesheets"""

import re
from django.conf import settings

from altamisa.constants import table_headers as th
from samplesheets.plugins import SampleSheetAssayPluginPoint
from samplesheets.rendering import SIMPLE_LINK_TEMPLATE
from samplesheets.utils import get_top_header
from samplesheets.views import MISC_FILES_COLL, RESULTS_COLL


# Local constants
APP_NAME = 'samplesheets.assayapps.generic'
RESULTS_COMMENT = 'SODAR Assay Link Results'
MISC_FILES_COMMENT = 'SODAR Assay Link MiscFiles'
DATA_COMMENT_PREFIX = 'SODAR Assay Row Path'
DATA_LINK_COMMENT = 'SODAR Assay Link Row'


class SampleSheetAssayPlugin(SampleSheetAssayPluginPoint):
    """Plugin for generic data linking in sample sheets"""

    #: Name (used in code and as unique identifier)
    name = 'samplesheets_assay_generic'

    #: Title
    title = 'Generic Assay Plugin'

    #: App name for dynamic reference to app in e.g. caching
    app_name = APP_NAME

    #: Identifying assay fields (used to identify plugin by assay)
    # NOTE: This assay plugin is accessed by the "SODAR Assay Plugin" override
    assay_fields = []

    #: Description string
    description = 'Creates data links from comments in ISA investigation file'

    #: Template for assay addition (Assay object as "assay" in context)
    assay_template = None

    #: Required permission for accessing the plugin
    # TODO: TBD: Do we need this?
    permission = None

    #: Toggle displaying of row-based iRODS links in the assay table
    display_row_links = True

    @staticmethod
    def _link_from_comment(cell, header, top_header, target_cols, url):
        """
        Creates collection links for targeted columns.
        :param cell: Dict (obtained by iterating over a row)
        :param header: Column header
        :param top_header: Column top header
        :param target_cols: List of column names.
        :param url: Base URL for link target.
        """
        # Do nothing if not string or link
        if not isinstance(cell['value'], str) or re.search(
            '.+ <.*>', cell['value']
        ):
            return True
        # Special case for Material Names
        if (
            top_header['value']
            in th.DATA_FILE_HEADERS + th.MATERIAL_NAME_HEADERS
        ) and (header['value'] == 'Name'):
            cell['link'] = f"{url}/{cell['value']}"
            return True
        # Handle everything else
        if header['value'].lower() in target_cols:
            cell['value'] = SIMPLE_LINK_TEMPLATE.format(
                label=cell['value'],
                url=f"{url}/{cell['value']}",
            )
            return True

    @classmethod
    def _get_col_value(cls, target_col, row, table):
        """
        Return value of last matched column.
        :param target_col: Column name to look for
        :param row: List of dicts (a row returned by SampleSheetTableBuilder)
        :param table: Full table with headers (dict returned by
                      SampleSheetTableBuilder)
        :return: String with cell value of last matched column
        """
        # Returns last match of row
        value = None
        if target_col:
            for i in range(len(row)):
                header = table['field_header'][i]
                if header['value'].lower() == target_col.lower():
                    value = row[i]['value']
        return value

    def get_row_path(self, row, table, assay, assay_path):
        """
        Return iRODS path for an assay row in a sample sheet. If None,
        display default path. Used if display_row_links = True.
        :param row: List of dicts (a row returned by SampleSheetTableBuilder)
        :param table: Full table with headers (dict returned by
                      SampleSheetTableBuilder)
        :param assay: Assay object
        :param assay_path: Root path for assay
        :return: String with full iRODS path or None
        """
        # Extract comments starting with DATA_COMMENT_PREFIX; sorted
        data_columns = [
            value
            for name, value in sorted(assay.comments.items())
            if name.startswith(DATA_COMMENT_PREFIX)
        ]

        # Collect the column values for this row, skipping empty ones
        data_collections = []
        for column_name in data_columns:
            col_value = self._get_col_value(column_name, row, table)
            if col_value:
                data_collections.append(col_value)

        # Build iRODS path from the collected values, if any were found
        if data_collections:
            data_path = '/' + '/'.join(data_collections)
            return assay_path + data_path
        return None

    def update_row(self, row, table, assay, index):
        """
        Update render table row with e.g. links. Return the modified row.
        :param row: Original row (list of dicts)
        :param table: Full table (dict)
        :param assay: Assay object
        :param index: Row index (int)
        :return: List of dicts
        """
        if not settings.IRODS_WEBDAV_ENABLED or not assay:
            return row
        assay_path = self.get_assay_path(assay)
        if not assay_path:
            return row

        base_url = settings.IRODS_WEBDAV_URL + assay_path
        top_header = None
        th_colspan = 0

        results_cols = assay.comments.get(RESULTS_COMMENT)
        if results_cols:
            results_cols = results_cols.lower().split(';')
        misc_cols = assay.comments.get(MISC_FILES_COMMENT)
        if misc_cols:
            misc_cols = misc_cols.lower().split(';')
        data_cols = assay.comments.get(DATA_LINK_COMMENT)
        if data_cols:
            data_cols = data_cols.lower().split(';')
            if table['irods_paths'][index]:
                row_path = table['irods_paths'][index]['path']
            else:
                row_path = self.get_row_path(row, table, assay, assay_path)

        for i in range(len(row)):
            header = table['field_header'][i]
            if not top_header or i >= th_colspan:
                top_header = get_top_header(table, i)
                th_colspan += top_header['colspan']

            # TODO: Check if two comments reference the same column header?
            # Create Results links
            if results_cols:
                if self._link_from_comment(
                    row[i],
                    header,
                    top_header,
                    results_cols,
                    f'{base_url}/{RESULTS_COLL}',
                ):
                    continue
            # Create MiscFiles links
            if misc_cols:
                if self._link_from_comment(
                    row[i],
                    header,
                    top_header,
                    misc_cols,
                    f'{base_url}/{MISC_FILES_COLL}',
                ):
                    continue
            # Create DataCollection links
            if data_cols:
                self._link_from_comment(
                    row[i],
                    header,
                    top_header,
                    data_cols,
                    f'{settings.IRODS_WEBDAV_URL}{row_path}',
                )
        return row

    def update_cache(self, name=None, project=None, user=None):
        """
        Update cached data for this app, limitable to item ID and/or project.
        :param name: Item name to limit update to (string, optional)
        :param project: Project object to limit update to (optional)
        :param user: User object to denote user triggering the update (optional)
        """
        self._update_cache_rows(APP_NAME, name, project, user)
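
The order in which ``update_row`` checks the link comments (Results first, then MiscFiles, then Row) can be illustrated with a small standalone sketch. It is a simplification under stated assumptions, not the plugin's implementation: ``parse_columns`` and ``resolve_target`` are hypothetical helpers, the collection names are taken from the documentation above, and the special case for material and data file ``Name`` columns is left out.

```python
import re

# Same pattern the plugin uses to skip values that already look like links
ALREADY_LINKED = re.compile(r'.+ <.*>')

# Checked in this order: the first comment listing the column wins
TARGETS = [
    ('ResultsReports', 'SODAR Assay Link Results'),
    ('MiscFiles', 'SODAR Assay Link MiscFiles'),
    ('RowPath', 'SODAR Assay Link Row'),
]


def parse_columns(comment_value):
    """Split a semicolon-separated comment value into lowercased names."""
    return comment_value.lower().split(';') if comment_value else []


def resolve_target(column, value, comments):
    """
    Hypothetical helper: return the collection a cell would link to.

    :param column: Column header name (string)
    :param value: Cell value
    :param comments: Dict of assay comment name -> comment value
    :return: Collection name (string) or None if the cell is not linked
    """
    if not isinstance(value, str) or ALREADY_LINKED.search(value):
        return None
    for collection, comment_name in TARGETS:
        if column.lower() in parse_columns(comments.get(comment_name)):
            return collection
    return None


comments = {
    'SODAR Assay Link Results': 'Report File;Derived Data File',
    'SODAR Assay Link MiscFiles': 'Protocol File;Antibody Panel',
    'SODAR Assay Link Row': 'Raw Data File',
}
```

Column names are lowercased before comparison but not stripped, so a space after a semicolon in the comment value would become part of the column name and prevent a match.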
