Patient Individualized Progression Slopes (PIPS)

Pipeline for Design of Patient-Specific Disease Progression Phenotypes

Project: Women-Led Hackathon
- Author(s): Hampton Leonard, Ruth Chia, Sara Bandres-Ciga, Monica Diez-Fairen, and Mary B. Makarious
- PI: --
- Date Written: 08.05.2019
- Date Last Updated: 10.05.2019
- Collaborators: Hirotaka Iwaki

Description + Objective

Motivation: Most models do not accept longitudinal data in a straightforward manner, and many require changes to the underlying code. To the best of our knowledge, no existing model or code produces a patient-specific proxy (snapshot) phenotype from longitudinal data that can be plugged directly into different types of association analyses. The overall contribution of genetic factors to the severity and progression of diseases has not been well studied. Such a phenotype would help stratify etiological subtypes of disease and provide new insights for future clinical trials.

Overview: The pipeline takes deeply phenotyped data with more than two longitudinal time points per patient, together with genotype data. Each time point is a check-up. First, linear mixed modelling (LMM) combines the multiple time points for each patient into a single, patient-specific representation of disease progression. The pipeline is intended to be generalizable: it will take in a dataset, preprocess it, scale it, determine the number of covariates, plot AUC, and output a clinically relevant outcome for your data. The intention is to apply this approach to a wide range of diseases and eventually to compare different modelling techniques to get the best outcome. The project aims to make it easier to find traits significantly associated with progression in a disease-specific manner, while giving each patient a personalized snapshot of expected progression based on their progression so far. That snapshot can then be used in downstream analyses such as GWAS and machine learning (ML) projects.
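
As a rough illustration of the LMM step, the sketch below fits a random-intercept, random-slope model with `lme4::lmer` to simulated data and extracts one slope per patient. The simulated data, the model formula, and the use of `coef()` to obtain patient-level slopes are assumptions made for illustration; the column names follow the PPMI example under Data Input Requirements, and this is not necessarily the model implemented in `PIP_v2.R`.

```R
library(lme4)

set.seed(42)

# Simulated long-format data (hypothetical): 30 patients, 5 visits each
n_patients <- 30
n_visits   <- 5
sim <- data.frame(
  ID     = rep(seq_len(n_patients), each = n_visits),
  TSTART = rep(seq(0, 4, length.out = n_visits), times = n_patients),
  FEMALE = rep(rbinom(n_patients, 1, 0.5), each = n_visits)
)
true_slope <- rnorm(n_patients, mean = 2, sd = 1)   # patient-specific progression rate
true_icpt  <- rnorm(n_patients, mean = 20, sd = 3)  # patient-specific baseline
sim$UPDRS3 <- true_icpt[sim$ID] + true_slope[sim$ID] * sim$TSTART +
  1.5 * sim$FEMALE + rnorm(nrow(sim), sd = 1)

# Random intercept and random slope over time for each patient
fit <- lmer(UPDRS3 ~ TSTART + FEMALE + (1 + TSTART | ID), data = sim)

# coef() returns the fixed effect plus the patient-level deviation,
# which gives one progression slope per patient
per_patient <- coef(fit)$ID
slopes <- data.frame(ID = rownames(per_patient), SLOPE = per_patient[, "TSTART"])
head(slopes)
```

The resulting `slopes` table has one row per patient, which is the shape a downstream association analysis would expect for a single derived phenotype.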

Objectives:

Take in multiple time points associated with a disease
Preprocess, clean, and scale the data, and generate principal components for the given covariates
Output a single representation of disease progression
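
The cleaning, scaling, and principal component steps listed above could look roughly like the following base R sketch. The covariate table, its values, and the choice of which columns count as continuous are all hypothetical; the pipeline's actual preprocessing may differ.

```R
# Hypothetical covariate table: one row per patient
covs <- data.frame(
  ID        = 1:8,
  YEARSEDUC = c(16, 12, 18, 14, NA, 20, 12, 16),
  AAO       = c(63, 58, 70, 66, 61, 55, 72, 68),
  FEMALE    = c(0, 1, 1, 0, 0, 1, 0, 1)
)

# Clean: drop rows with any missing value
covs_clean <- covs[complete.cases(covs), ]

# Scale: transform continuous covariates to mean 0 and standard deviation 1
continuous <- c("YEARSEDUC", "AAO")
covs_clean[continuous] <- lapply(
  covs_clean[continuous],
  function(x) as.numeric(scale(x))
)

# Principal components of the already-scaled covariates
pca <- prcomp(covs_clean[continuous], center = FALSE, scale. = FALSE)
covs_clean$PC1 <- pca$x[, "PC1"]
covs_clean$PC2 <- pca$x[, "PC2"]
```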

Goals:

1. Pipeline to model progression phenotypes
2. Testing phenotypes
   - To test the association between genetic variants and the phenotype-derived feature of disease on a genome-wide scale
3. Testing in different diseases to validate pipeline
4. Apply different models to compare performance of phenotype derivation as proxy for longitudinal clinical data

Roles + Daily Updates

Roles:

Lead: Hampton Leonard
Sys Admin(s): Mary M. and Monica D.
Writer(s): Ruth C., Sara BC, Mary M.

Google Slides Presentation: Google Slides Link

Proposed Workflow

Dataset Breakdown:

441 PPMI samples: clinical/phenotype + genotype data (to be used in Goal 1+2)
ABC Health data: clinical/phenotype + genotype (to be used in Goal 3+2)

Requirements

Environment Requirements

```R
sessionInfo()

R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] hydroGOF_0.3-10   zoo_1.8-5         rlist_0.4.6.1     knitr_1.21        lmerTest_3.1-0    data.table_1.12.0 RNOmni_0.7.1      lme4_1.1-21       Matrix_1.2-15    
[10] forcats_0.4.0     stringr_1.4.0     dplyr_0.8.0.1     purrr_0.3.0       readr_1.3.1       tidyr_0.8.2       tibble_2.0.1      ggplot2_3.1.0     tidyverse_1.2.1  

```

Data Input Requirements

Deep longitudinal phenotype data
- Accepted format: one column each for the patient ID, the time variable, and the phenotype of interest, followed by the covariates (including PCs, if available)
- Avoid missing data in the input file
- Formatting should match the example below:
```
# Format
ID	TIME_SERIES	PHENOTYPE_OF_INTEREST	COV1	COV2	COV3	etc.

# PPMI Example
ID	TSTART	UPDRS3	FEMALE	YEARSEDUC	AAO	DOPA	AGONIST
100	61	12	0	16	63	0	0
```
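
A file in this layout can be read and checked before modelling. The sketch below is a minimal example that assumes the file is tab-delimited and that the first three columns are always ID, time, and phenotype. Only the first data row comes from the PPMI example; the other rows are invented for illustration.

```R
# Write a tiny tab-delimited example so the sketch is self-contained.
# Row 1 is the PPMI example row; the remaining rows are hypothetical.
input_path <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tTSTART\tUPDRS3\tFEMALE\tYEARSEDUC\tAAO\tDOPA\tAGONIST",
  "100\t61\t12\t0\t16\t63\t0\t0",
  "100\t62\t15\t0\t16\t63\t1\t0",
  "101\t58\t20\t1\t12\t57\t0\t0"
), input_path)

pheno <- read.delim(input_path, stringsAsFactors = FALSE)

# Assumed layout: ID, time, phenotype, then covariates
id_col    <- names(pheno)[1]
time_col  <- names(pheno)[2]
pheno_col <- names(pheno)[3]
cov_cols  <- names(pheno)[-(1:3)]

# Fail early if any cell is missing
if (anyNA(pheno)) stop("Input contains missing values")

# Number of visits (rows) per patient
visits <- table(pheno[[id_col]])
```

The `visits` table makes it easy to spot patients with too few time points before fitting a longitudinal model.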
Covariate file list
- Format required: one covariate name per line in a .txt file (this is case sensitive!)
  - Formatting should match the example below:
```
# Format
COV1
COV2
COV3
PC1
PC2
PC3

# PPMI Example
DOPA
AGONIST
FEMALE
```
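
One way to use the covariate list is to check each name against the input columns and then build the fixed-effects part of a model formula. The sketch below is an assumption about how the two files fit together, not a description of `PIP_v2.R`; `input_cols` is copied from the PPMI example header, and the temporary file stands in for a real covariate file.

```R
# Write the PPMI covariate list to a temporary file (stand-in for a real file)
cov_path <- tempfile(fileext = ".txt")
writeLines(c("DOPA", "AGONIST", "FEMALE"), cov_path)

covariates <- trimws(readLines(cov_path))
covariates <- covariates[nzchar(covariates)]   # drop blank lines

# Column names of the phenotype file, taken from the PPMI example header
input_cols <- c("ID", "TSTART", "UPDRS3", "FEMALE",
                "YEARSEDUC", "AAO", "DOPA", "AGONIST")

# Matching is case sensitive: "dopa" would not match "DOPA"
missing_covs <- setdiff(covariates, input_cols)
if (length(missing_covs) > 0) {
  stop("Covariates not found in input: ", paste(missing_covs, collapse = ", "))
}

# Fixed-effects formula: phenotype ~ time + listed covariates
model_formula <- reformulate(c("TSTART", covariates), response = "UPDRS3")
print(model_formula)
```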
