Metagenomics and single-cell sequencing have enabled, for the first time, glimpses into the vast metabolic potential of Earth’s collective biological systems. Yet, for the most part, we cannot accurately predict or identify the products of most biosynthetic pathways. Most of what we know of microbial biochemistry is based on characterization of a few model microorganisms, and these findings have been extended through sequence correlations to the rest of sequence space. Unfortunately, these extrapolations have questionable validity for the vast majority of environmental microbes, so fundamentally different approaches are required for directly linking novel sequences to their biochemical functions.
Our vision is to systematically explore dark-biochemistry, using state-of-the-art workflows that integrate large-scale DNA synthesis with metabolomics, high-performance computing, and chemoinformatics. Bioinformatics mining of the over 500 billion unique genes catalogued by the DOE Joint Genome Institute can be used to prioritize high-novelty candidate biosynthesis clusters. Through synthetic biology approaches candidate clusters can be refactored and expressed in model organisms for characterization of the resulting biochemical activities and products with mass spectrometry. When integrated with novel chemoinformatic algorithms, this creates a closed-loop cycle of design, build, test, and learn for systematically mapping biochemical space.
For more documentation and a tutorial on how to analyze MAGI results, you can visit the MAGI website.
We are currently developing MAGI 2.0, which can be found in workflow_2. Where MAGI 1.0 links genes to metabolites based on similarity between chemical structures, known reactions, and similarity between genes, MAGI 2.0 has an additional feature. Based on reaction patterns, MAGI 2.0 calculates whether a metabolite can be used by an enzyme, which adds further evidence to gene-metabolite links. Below, you will find instructions to install both MAGI 1.0 and MAGI 2.0.
- Integrates a file of metabolites with a file of gene sequences
- A database of all publicly available reactions
- Compounds in those reactions
- API for accessing this information programmatically
We have created a Docker image that should make running MAGI locally a breeze! Please follow the steps in the docs. This Docker image is not suited for MAGI 2.0.
Installing MAGI takes three steps, followed by running it and an optional configuration step:
- Clone the repository and set up local environment and paths
- Install BLAST to the appropriate repository directory
- Test MAGI to make sure it works correctly
- Run MAGI locally
- (Optional) If interfacing with the magi website, you need to adjust some additional paths
These will take approximately 10 minutes to install.
If you don't have Anaconda already, install Anaconda or Miniconda.
The following will:
- Set up your local settings files
- Adjust a couple paths in the .py files.
- Create a Conda environment for running MAGI
For MAGI 1.0:
$ git clone https://github.com/biorack/magi.git
$ cd magi
$ python setup.py
$ conda env create -f magi_env.yml
$ source activate magi
For MAGI 2.0:
$ git clone https://github.com/biorack/magi.git
$ cd magi
$ python setup_magi2.py
$ conda env create -f magi_2_env.yml
$ source activate magi_2
The Windows installation for MAGI 1.0 is a little different. MAGI 2.0 has not been tested on Windows.
$ git clone https://github.com/biorack/magi.git
$ cd magi
$ python setup_windows.py
$ conda env create -f magi_env.yml
$ conda activate magi
$ cd tests/full_workflow_test/
Two NCBI BLAST binaries are required to run MAGI.
You may download the BLAST binaries appropriate for your machine here, and simply copy the blastp and makeblastdb binaries into workflow/blastbin.
For Windows, copy blastp.exe, makeblastdb.exe, and nghttp2.dll (if it is in the .tar.gz archive that you downloaded from NCBI) into the same directory.
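A quick way to confirm that both binaries landed in the right place is a small check like the one below. This is only a sketch and not part of MAGI: the function name `missing_blast_binaries` is hypothetical, and the only facts taken from the instructions above are the binary names and the workflow/blastbin directory.

```python
import os
from pathlib import Path

# Binary names taken from the installation instructions above.
REQUIRED_BINARIES = ("blastp", "makeblastdb")


def missing_blast_binaries(blastbin_dir, windows=False):
    """Return the required BLAST binaries that are not files in blastbin_dir."""
    suffix = ".exe" if windows else ""
    directory = Path(blastbin_dir)
    return [
        name + suffix
        for name in REQUIRED_BINARIES
        if not (directory / (name + suffix)).is_file()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    missing = missing_blast_binaries("workflow/blastbin", windows=(os.name == "nt"))
    if missing:
        print("Missing BLAST binaries:", ", ".join(missing))
    else:
        print("Both BLAST binaries were found.")
```

The check only looks for the files; it does not verify that they are executable or that they match your platform.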
To confirm everything was set up correctly, run the following test. You will see some warnings; this is normal. The test should take a few minutes. This test only works for MAGI 1.0. The test for MAGI 2.0 is still under development, but you can probably use MAGI 2.0 without running a test.
$ cd tests/full_workflow_test/
$ ./run_full_workflow_test.sh
If you use Git Bash, you can follow the Linux & MacOS instructions. However, you may need to load conda in Git Bash by typing:
. /c/ProgramData/Anaconda3/etc/profile.d/conda.sh
conda activate base
conda activate magi
If you use the Anaconda Prompt:
$ python ../../workflow/magi_workflow.py --fasta ./s_coelicolor_genes_fasta_smallset.faa --compounds ./s_coelicolor_pactolus_data_smallset.csv --output ./test_output_files --cpu_count 4 --mute
The easiest way to run MAGI locally is to copy the script run_magi.sh (or run_magi2.sh for MAGI 2.0) to a directory and edit it to add the path to the MAGI directory and the paths to your FASTA and compounds files. Then run the script from the command line. Note that some parts of this workflow are still under construction. For further details, use the --help option from the command line or read the README in the Workflow folder.
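If you prefer to launch the workflow from Python rather than from a shell script, the call can be assembled as in the sketch below. The flags are the ones that appear in the test command in this README; the helper name `build_magi_command` and the example file names are hypothetical, and whether run_magi.sh uses exactly these flags is an assumption.

```python
import subprocess
import sys


def build_magi_command(workflow_script, fasta, compounds, output,
                       cpu_count=4, mute=True):
    """Assemble the argument list for one magi_workflow.py run."""
    command = [
        sys.executable, str(workflow_script),
        "--fasta", str(fasta),
        "--compounds", str(compounds),
        "--output", str(output),
        "--cpu_count", str(cpu_count),
    ]
    if mute:
        command.append("--mute")
    return command


def run_magi(*args, **kwargs):
    """Run the workflow and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_magi_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Placeholder paths; replace them with your own files before calling run_magi.
    print(build_magi_command("workflow/magi_workflow.py",
                             "my_genes.faa", "my_compounds.csv", "magi_output"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces. Run it inside the activated MAGI Conda environment so that `sys.executable` points at the right interpreter.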
If you are interfacing with the magi_web repository, you need to manually change a few things in magi_job/; otherwise, ignore this section.
- change local settings import path in magi_job/utils.py
- set absolute path to workflow/magi_workflow.py in job_data() in magi_job/utils.py
For MAGI 1.0:
- Python 2.7
- pandas
- numpy
- rdkit
- molVS
- networkx
- pytables
- requests (only if you are using scripts in magi_job/)
For MAGI 2.0:
- Python 3.6 or higher
- pandas
- numpy
- rdkit
- molVS
- pytables (or tables)
- requests (only if you are using scripts in magi_job/)
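To see whether an activated environment actually provides these packages, a check along the following lines may help. It is a sketch for a Python 3 environment only, since `importlib.util.find_spec` does not exist in Python 2.7. The helper name is hypothetical; the import names `molvs` (for molVS) and `tables` (for pytables) differ from the package names in the list.

```python
from importlib.util import find_spec

# Import names for the Python 3 package list above.
# molVS is imported as "molvs" and pytables as "tables".
PYTHON3_MODULES = ["pandas", "numpy", "rdkit", "molvs", "tables", "requests"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PYTHON3_MODULES)
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All listed modules can be found.")
```

`find_spec` only locates a module without importing it, so the check is fast but does not catch packages that are installed and broken.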
After a successful setup, local_settings/ should contain (at least) 3 files:
local_settings.py
user_settings.py
__init__.py
local_settings.py should just have one line in it describing the name of the user_settings.py file:
SETTINGS_FILE = 'user_settings'
user_settings.py should have the following paths and variables defined:
repo_location = '' # path to repo location
blastbin = '' # path to BLAST binary
refseq_path = '' # path to reaction reference sequence library
refseq_db = '' # path to BLAST database for reference sequence library
mrs_reaction_path = '' # path to metabolite-reaction-refseq database
compounds_df = '' # path to compounds database
mst_path = '' # path to chemical similarity network graph
chemnet_pickle = '' # path to chemical similarity network descriptions
# The next 3 lines are only required if you are interfacing with magi_web
magiwebsuperuser = '' # admin username for magi_web
magiwebsuperuserpass = '' # admin password for magi_web
magiweburl = '' # URL to magi web (e.g. https://magi.nersc.gov)
When switching between machines or databases, you may have multiple user_settings.py files. They can be named whatever you want as long as the SETTINGS_FILE variable in local_settings.py names the one to use.
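Because every path in a fresh settings file starts out as an empty string, it is easy to leave one unfilled. The sketch below reads a settings file as text and reports which of the variables listed above are missing or still empty. It is not part of MAGI: the function name is hypothetical, it needs Python 3.8 or later (for `ast.Constant`), and it is meant to be run on its own, outside the MAGI 1.0 environment.

```python
import ast

# Variable names taken from the user_settings.py listing above
# (the magi_web entries are optional and therefore left out).
REQUIRED_SETTINGS = (
    "repo_location", "blastbin", "refseq_path", "refseq_db",
    "mrs_reaction_path", "compounds_df", "mst_path", "chemnet_pickle",
)


def unset_settings(source, required=REQUIRED_SETTINGS):
    """Return required names that are absent from source or assigned an empty value."""
    values = {}
    for node in ast.parse(source).body:
        # Only simple assignments of literal values are considered.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    values[target.id] = node.value.value
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    example = "repo_location = '/path/to/magi'\nblastbin = ''\n"
    print(unset_settings(example, required=("repo_location", "blastbin")))
```

Parsing the file instead of importing it means the check works without any of MAGI's dependencies installed. It does not test whether the paths exist.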
After a successful setup of MAGI 2.0, local_settings/ should contain (at least) 3 files:
local_settings_magi2.py
magi_2_user_settings.py
__init__.py
local_settings_magi2.py should just have one line in it describing the name of the magi_2_user_settings.py file:
SETTINGS_FILE = 'magi_2_user_settings'
magi_2_user_settings.py should have the following paths and variables defined:
repo_location = "" # Location where MAGI is stored locally
blastbin = "" # Location where NCBI BLAST tools are stored. Note that this is the same for MAGI1.0 and MAGI2.0
magi_database = "" # Location where MAGI database is stored
refseq_path = "" # Database with UniProt reference sequences of proteins that have a Rhea reaction
refseq_db = "" # Database with UniProt reference sequences of proteins that have a Rhea reaction
When switching between machines or databases, you may have multiple user_settings.py files that can be named whatever you want as long as the variable in local_settings.py is defined correctly.
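The one-line selector file makes it possible to switch settings without touching any other code. How MAGI itself resolves SETTINGS_FILE is not shown in this README, so the following is only one possible way to follow the indirection, with a hypothetical function name and a throwaway demo package standing in for local_settings/.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_user_settings(package="local_settings", selector="local_settings_magi2"):
    """Import the settings module named by SETTINGS_FILE in the selector module."""
    selector_module = importlib.import_module(f"{package}.{selector}")
    return importlib.import_module(f"{package}.{selector_module.SETTINGS_FILE}")


if __name__ == "__main__":
    # Build a temporary package that mimics the file layout described above.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_settings"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "local_settings_magi2.py").write_text(
            "SETTINGS_FILE = 'magi_2_user_settings'\n")
        (pkg / "magi_2_user_settings.py").write_text(
            "repo_location = '/path/to/magi'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        settings = load_user_settings(package="demo_settings")
        print(settings.repo_location)
```

To switch machines or databases under this scheme, you would only change the string assigned to SETTINGS_FILE; the loader and the rest of the code stay the same.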
The development of MAGI was made possible by: