Matching Algorithm

See the Swagger page for detailed usage of the Matching APIs. This document describes the different approaches to matching a Research Purpose against the Consent on a dataset. Both Purpose and Consent are structured objects, which allows for computational matching at scale.

Version 4

This version of the algorithm uses a custom set of business rules to match a research purpose and consented dataset. In determining a positive match between research purpose and consented dataset, we make sure that the consented dataset matches ALL conditions specified in the research purpose. The primary difference between this version and the previous version is the update to Non-Profit Use (NPU) business logic.

If my Research Purpose has...

Disease focused research (i.e. DS-X)
  What datasets should I see?
  • Any dataset tagged with GRU=true
  • Any dataset tagged with HMB=true
  • Any dataset tagged to this disease (DS-X) exactly or a parent disease of DS-X
  Logical Rationale
  • Approve if the dataset's Primary DUO terms are DS- or a subclass
  • Deny if the dataset's Primary DUO terms are NOT DS- or a subclass

Use of data is limited to health/medical/biomedical purposes, not including population origins or ancestry (i.e. HMB)
  What datasets should I see?
  • Any dataset tagged with GRU=true
  • Any dataset tagged with HMB=true
  Logical Rationale
  • Approve if the dataset's Primary DUO terms are HMB, GRU
  • Deny if the dataset's Primary DUO terms are DS-, POA

Study population origins or ancestry (i.e. POA)
  What datasets should I see?
  • Any dataset tagged with GRU=true
  • Any dataset tagged with POA=true
  Logical Rationale
  • Approve if the dataset's Primary DUO terms are GRU, POA
  • Deny if the dataset's Primary DUO terms are DS-, HMB

Methods development (i.e. MDS)
  What datasets should I see?
  • Any dataset tagged with GRU=true
  • Any dataset tagged with DS-X=true
  • Any dataset tagged with POA=true
  • Any dataset tagged with HMB=true
  Logical Rationale
  • Approve if the dataset's Primary DUO terms are GRU, DS-, HMB, POA

Non-profit use (i.e. NPU)
  What datasets should I see?
  • Any dataset tagged with NPU=true OR NPU=false. Datasets with NPU=false effectively have no restriction on NPU usage.
  Logical Rationale
  • Deny if the research purpose is NPU=false and the dataset's DUO term is NPU=true
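
The Version 4 rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DUOS implementation: the `Purpose` and `Dataset` classes, the `matches_v4` function, and the toy `PARENT_DISEASE` hierarchy are all hypothetical names introduced here, and each object is assumed to carry at most one disease term.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy hierarchy (child -> parent). It stands in for a real
# disease ontology lookup and is an assumption of this sketch.
PARENT_DISEASE = {
    "DS-melanoma": "DS-skin-cancer",
    "DS-skin-cancer": "DS-cancer",
}


def same_or_parent(dataset_disease: str, purpose_disease: str) -> bool:
    """True if the dataset's disease equals the purpose's disease or is an ancestor of it."""
    current: Optional[str] = purpose_disease
    while current is not None:
        if current == dataset_disease:
            return True
        current = PARENT_DISEASE.get(current)
    return False


@dataclass
class Dataset:
    gru: bool = False
    hmb: bool = False
    poa: bool = False
    npu: bool = False              # True: dataset is restricted to non-profit use
    disease: Optional[str] = None  # DS-X tag, if any


@dataclass
class Purpose:
    disease: Optional[str] = None  # DS-X, if the research is disease focused
    hmb: bool = False
    poa: bool = False
    mds: bool = False
    npu: bool = False              # True: the research is non-profit


def matches_v4(purpose: Purpose, dataset: Dataset) -> bool:
    """Approve only if every condition named in the purpose is satisfied."""
    checks = []
    if purpose.disease is not None:
        disease_ok = dataset.disease is not None and same_or_parent(
            dataset.disease, purpose.disease
        )
        checks.append(dataset.gru or dataset.hmb or disease_ok)
    if purpose.hmb:
        checks.append(dataset.gru or dataset.hmb)
    if purpose.poa:
        checks.append(dataset.gru or dataset.poa)
    if purpose.mds:
        checks.append(
            dataset.gru or dataset.hmb or dataset.poa or dataset.disease is not None
        )
    # NPU: deny only when the purpose is NPU=false and the dataset is NPU=true.
    checks.append(purpose.npu or not dataset.npu)
    return all(checks)
```

Under these assumptions, a melanoma purpose matches a dataset consented for the parent term `DS-cancer`, while a general cancer purpose does not match a dataset limited to melanoma.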

Abstain from Decision

Due to the variety of sensitive research areas, ethical reasons, and areas where categorization is not possible, the DUOS system will not render a decision in the following cases.

  • Other
  • Geographical Restrictions (i.e. GS-)
  • Public Moratorium/Embargo (i.e. MOR)
  • Genetic Studies Only (i.e. GSO)
  • Publication Required (i.e. PUB)
  • Collaboration Required (i.e. COL)
  • Ethics Approval Required (i.e. IRB)
  • Limitation to one gender
  • Restricted to a pediatric population (under the age of 18)
  • Illegal behaviors (violence, domestic abuse, prostitution, sexual victimization)
  • Alcohol or drug abuse, or abuse of other addictive products
  • Sexual preferences or sexually transmitted diseases
  • Any stigmatizing illnesses
  • Vulnerable populations as defined in 45 CFR 46 (children, prisoners, pregnant women, mentally disabled persons, or ["SIGNIFICANTLY"] economically or educationally disadvantaged persons)
  • Population Origins/Migration patterns
  • Psychological traits, including intelligence, attention, emotion
  • Ethnicity, race, or gender with genotypic or other phenotypic variables, for purposes beyond biomedical or health-related research, or in ways that are not easily related to Health
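
One way to picture the abstain behavior is as a third outcome that takes precedence over approve and deny. The sketch below is an assumption-laden illustration: the `Decision` enum, the `decide` function, and the idea that terms arrive as a set of string codes are all hypothetical, and only the coded entries from the list above are included (plus a made-up `OTHER` label).

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    ABSTAIN = "abstain"


# Codes taken from the list above. Entries without a code (for example
# "Limitation to one gender") would need labels of their own; "OTHER" is a
# hypothetical label used only for illustration.
ABSTAIN_CODES = frozenset({"GS", "MOR", "GSO", "PUB", "COL", "IRB", "OTHER"})


def decide(terms: Iterable[str], rules_match: bool) -> Decision:
    """Abstain if any abstain term is present; otherwise follow the rule result."""
    if ABSTAIN_CODES & set(terms):
        return Decision.ABSTAIN
    return Decision.APPROVE if rules_match else Decision.DENY
```

For example, `decide({"HMB", "MOR"}, True)` yields `Decision.ABSTAIN` even though the business rules matched, because a moratorium term is present.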

Version 3

    This version of the algorithm uses a custom set of business rules to match a research purpose and consented dataset. In determining a positive match between research purpose and consented dataset, we make sure that the consented dataset matches ALL conditions specified in the research purpose.

    If my Research Purpose has...

    Disease focused research (i.e. DS-X)
      What datasets should I see?
      • Any dataset tagged with GRU=true
      • Any dataset tagged with HMB=true
      • Any dataset tagged to this disease (DS-X) exactly or a parent disease of DS-X
      Logical Rationale
      • Approve if the dataset's Primary DUO terms are DS- or a subclass
      • Deny if the dataset's Primary DUO terms are NOT DS- or a subclass

    Use of data is limited to health/medical/biomedical purposes, not including population origins or ancestry (i.e. HMB)
      What datasets should I see?
      • Any dataset tagged with GRU=true
      • Any dataset tagged with HMB=true
      Logical Rationale
      • Approve if the dataset's Primary DUO terms are HMB, GRU
      • Deny if the dataset's Primary DUO terms are DS-, POA

    Study population origins or ancestry (i.e. POA)
      What datasets should I see?
      • Any dataset tagged with GRU=true
      • Any dataset tagged with POA=true
      Logical Rationale
      • Approve if the dataset's Primary DUO terms are GRU, POA
      • Deny if the dataset's Primary DUO terms are DS-, HMB

    Methods development (i.e. MDS)
      What datasets should I see?
      • Any dataset tagged with GRU=true
      • Any dataset tagged with DS-X=true
      • Any dataset tagged with POA=true
      • Any dataset tagged with HMB=true
      Logical Rationale
      • Approve if the dataset's Primary DUO terms are GRU, DS-, HMB, POA

    Commercial purpose/by a commercial entity
      What datasets should I see?
      • Any dataset where NPU and NCU are both false
      Logical Rationale
      • Deny if the dataset's Primary DUO terms are Non-profit use (NPU), Non-commercial use (NCU)
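
The commercial rule is the part of Version 3 that Version 4 replaces, so it may help to see the two side by side. Both functions below are hypothetical sketches written from the tables in this document; the function names and boolean parameters are assumptions, not identifiers from the DUOS code.

```python
def commercial_ok_v3(commercial: bool, npu: bool, ncu: bool) -> bool:
    """Version 3: a commercial purpose needs NPU and NCU both false on the dataset."""
    return not commercial or (not npu and not ncu)


def npu_ok_v4(purpose_npu: bool, dataset_npu: bool) -> bool:
    """Version 4: deny only when the purpose is NPU=false and the dataset is NPU=true."""
    return purpose_npu or not dataset_npu
```

In this reading, Version 3 keys the check on whether the purpose is commercial and consults two dataset terms, while Version 4 compares a single NPU flag on each side.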

    Abstain from Decision

    Due to the variety of sensitive research areas, ethical reasons, and areas where categorization is not possible, the DUOS system will not render a decision in the following cases.

  • Other
  • Geographical Restrictions (i.e. GS-)
  • Public Moratorium/Embargo (i.e. MOR)
  • Genetic Studies Only (i.e. GSO)
  • Publication Required (i.e. PUB)
  • Collaboration Required (i.e. COL)
  • Ethics Approval Required (i.e. IRB)
  • Limitation to one gender
  • Restricted to a pediatric population (under the age of 18)
  • Illegal behaviors (violence, domestic abuse, prostitution, sexual victimization)
  • Alcohol or drug abuse, or abuse of other addictive products
  • Sexual preferences or sexually transmitted diseases
  • Any stigmatizing illnesses
  • Vulnerable populations as defined in 45 CFR 46 (children, prisoners, pregnant women, mentally disabled persons, or ["SIGNIFICANTLY"] economically or educationally disadvantaged persons)
  • Population Origins/Migration patterns
  • Psychological traits, including intelligence, attention, emotion
  • Ethnicity, race, or gender with genotypic or other phenotypic variables, for purposes beyond biomedical or health-related research, or in ways that are not easily related to Health

Version 2

    This version of the algorithm uses a custom set of business rules to match a research purpose and consented dataset. In determining a positive match between research purpose and consented dataset, we make sure that the consented dataset matches ALL conditions specified in the research purpose.

    This was originally developed for FireCloud and is the basis for the Data Catalog search ruleset. This version makes use of Consent Codes as developed for the GA4GH as well as Disease Codes (DS-X) from the Human Disease Ontology.

    If my Research Purpose has...

    Disease focused research (i.e. DS-X)
      What datasets should I see?
      • Any dataset with GRU=true
      • Any dataset with HMB=true
      • Any dataset tagged to this disease (DS-X) exactly or a parent disease of DS-X
      Related DUL question
      • Data is available for future general research use
      • Future use is limited for health/medical/biomedical research
      • Future use is limited to research involving the following disease area(s) (DS-X)

    Methods development/Validation study
      What datasets should I see?
      • Any dataset with GRU=true
      • Any dataset where NMDS is false
      • Any dataset where NMDS is true AND DS-X match
      Related DUL question
      • Future use for methods research (analytic/software/technology development) outside the bounds of the other specified restrictions is prohibited (NMDS)

    Control Set
      What datasets should I see?
      • Any dataset where NCTRL is false and is (GRU or HMB)
      • Any DS-X match, if user specified a disease in the research purpose
      Related DUL question
      • Future use as a control set for diseases other than those specified is prohibited (NCTRL)
      • Future use is limited to research involving the following disease area(s) (DS-X)

    Aggregate analysis to understand variation in the general population
      What datasets should I see?
      • Any dataset where NAGR is false and is (GRU or HMB)
      Related DUL question
      • Future use of aggregate-level data for general research purposes is prohibited (NAGR)

    Study population origins or ancestry
      What datasets should I see?
      • Any dataset tagged with GRU
      Related DUL question
      • Future use is limited to research involving a specific population (POA)

    Commercial purpose/by a commercial entity
      What datasets should I see?
      • Any dataset where NPU and NCU are both false
      Related DUL question
      • Future commercial use is prohibited (NCU). Future use by for-profit entities is prohibited (NPU)

    Pediatric focused research
      What datasets should I see?
      • Any dataset tagged with RS-PD
      Related DUL question
      • Future use is limited to pediatric research (RS-PD)

    Gender focused research
      What datasets should I see?
      • Any dataset tagged with RS-G:F OR N/A when gender is F
      • Any dataset tagged with RS-G:M OR N/A when gender is M
      Related DUL question
      • Future use is limited to research involving a particular gender (RS-G)
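
A few of the Version 2 rows translate directly into boolean predicates. The sketch below covers only the methods, control set, aggregate, and gender rows, and it is an illustration rather than the FireCloud or Data Catalog code: `DatasetV2` and the four function names are hypothetical, and `disease_match` is assumed to be computed elsewhere (true when the purpose's DS-X matches the dataset's).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetV2:
    gru: bool = False
    hmb: bool = False
    nmds: bool = False           # methods research outside other restrictions prohibited
    nctrl: bool = False          # use as a control set for other diseases prohibited
    nagr: bool = False           # aggregate-level general research prohibited
    rs_g: Optional[str] = None   # "F", "M", or None when gender is N/A


def methods_ok(ds: DatasetV2, disease_match: bool) -> bool:
    """GRU, or NMDS false, or NMDS true together with a DS-X match."""
    return ds.gru or not ds.nmds or (ds.nmds and disease_match)


def control_ok(ds: DatasetV2, disease_match: bool) -> bool:
    """NCTRL false on a GRU/HMB dataset, or any DS-X match."""
    return (not ds.nctrl and (ds.gru or ds.hmb)) or disease_match


def aggregate_ok(ds: DatasetV2) -> bool:
    """NAGR false on a GRU/HMB dataset."""
    return not ds.nagr and (ds.gru or ds.hmb)


def gender_ok(ds: DatasetV2, gender: str) -> bool:
    """Dataset is tagged with the same gender, or has no gender restriction."""
    return ds.rs_g is None or ds.rs_g == gender
```

A full matcher would combine these per-row predicates with a logical AND over whichever conditions the research purpose specifies, in line with the "ALL conditions" requirement stated above.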

    Version 1

    Deprecated & Removed

    The original version of the algorithm uses an ontology tree to match a purpose and consent. First, we construct a composite ontology tree from:

    Next, we create ontology nodes for the consent, the purpose and add them to the composite tree. Using an OWL Reasoner, we determine if the purpose is a subclass of the consent, or not. A valid ontological subclass for a research purpose indicates a successful match between the purpose and the consent. See Use Restriction Grammar for how we create ontology nodes for a consent or research purpose.
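
The subclass test at the heart of this version can be illustrated without an OWL reasoner, using a plain graph walk over named classes. This is a deliberately simplified stand-in: a real reasoner also handles class expressions (intersections, unions, negations) built by the Use Restriction Grammar, which this sketch does not attempt. The `SUBCLASS_OF` map and the `is_subclass` function are hypothetical names, and the class labels are made up for the example.

```python
from typing import Dict, Set

# Hypothetical named-class hierarchy: each class maps to its direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "melanoma": {"skin cancer"},
    "skin cancer": {"cancer"},
    "cancer": {"disease"},
}


def is_subclass(child: str, ancestor: str, edges: Dict[str, Set[str]] = SUBCLASS_OF) -> bool:
    """True if `child` equals `ancestor` or reaches it by following superclass edges."""
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False
```

With this toy hierarchy, a purpose node of `melanoma` is a subclass of a consent node of `cancer`, which corresponds to a successful match; the reverse direction is not.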