diff --git a/README.md b/README.md index 5a89e02..699b9cc 100644 --- a/README.md +++ b/README.md @@ -21,6 +21,10 @@ __MIOM__ (Mixed Integer Optimization for Metabolism) is a python library for cre MIOM offers a high-level API that leverages the power of modern Mixed Integer Optimization (MIO) solvers to easily define steady-state metabolic optimization problems, from simple Flux Balance Analysis (FBA) simulations, to more complex problems, such as sparse FBA or context-specific reconstruction algorithms, and solve them the __required level of optimality__. It supports many free and commercial solvers thanks to the integration with the [PICOS](https://picos-api.gitlab.io/picos/) and the [Python-MIP](https://www.python-mip.com/). It is also compatible and complementary to [cobrapy](https://opencobra.github.io/cobrapy/). +Here is a simple example of the differences of implementing from scratch a simple Flux Balance Analysis (FBA), which is typically implemented as a Linear Program (LP) problem, and the Sparse FBA problem which is an Integer Programmming (IP) problem that solves the same FBA problem but minimizing the number of active reactions: + +> NOTE: This library is functional but still in a very early stage. API is still not stable and might be changed in future versions. + ```python import miom @@ -49,8 +53,6 @@ V, X = (model print("Number of active reactions:", sum(abs(V) > 1e-8)) ``` -> NOTE: This library is functional but still in a very early stage. API is still not stable and might be changed in future versions. - ## Installation ### Minimal install @@ -74,100 +76,168 @@ CPLEX is also supported, but requires a valid license. To install MIOM with CPLE ## Quick start -The first step is to import a GEM (a genome-scale model) using the `miom.mio.load_gem` method. The method accepts a local file or an URL. SBML files and Matlab files are supported through `cobra` and `scipy` packages. Please make sure that these packages are correctly installed before trying to import models in these formats (see [Full install](#full-install)). +### Importing a Genome-Scale Metabolic Network (GEMs) -Once these packages are available, you can import a model using: +First step is to import a Genome-Scale Metabolic Network. The method `load_gem()` can be used to import a network from a local file or a URL. This method requires the [cobrapy](https://cobrapy.readthedocs.io/en/latest/) and [scipy](https://www.scipy.org/) packages to import networks in the SBML, YAML, JSON, and Matlab formats (see [Full install](#full-install)). Here is an example of importing the [BiGG Recon3D](http://bigg.ucsd.edu/models/Recon3D) network: ```python import miom -network = miom.mio.load_gem("https://github.com/SysBioChalmers/Human-GEM/raw/main/model/Human-GEM.mat") -print("Number of reactions in the network", network.num_reactions) +network = miom.load_gem('https://github.com/SBRG/bigg_models_data/raw/master/models/Recon3D.mat') +print("Number of reactions in the network:", network.num_reactions) ``` -MIOM includes its own lightweight and portable format (.miom) for loading and storing metabolic networks, which does not require any additional dependency. This models uses `numpy` compressed structured arrays to store the basic information of the metabolic networks required for performing simulations. This format is even smaller and more portable than the Matlab files. A repository of converted models is available at https://github.com/pablormier/miom-gems. - -To make things even easier, the method `load_gem` can import any model from this repository using the relative path, for example: +By default, MIOM uses its own format (.miom) which is a lightweight, compressed and portable format based on numpy structured arrays to store only the essential information required to perform common tasks. To improve reproducibility of experiments, a repository of already converted models is available at [pablormier/miom-gems](https://github.com/pablormier/miom-gems/). Any model from this repository can be imported using shorthand links in the following forma: `@relative/path/to/model`. For example, in order to import the [Human-GEM v1.9.0](https://github.com/pablormier/miom-gems/tree/main/gems/SysBioChalmers/Human-Gem/v1.9.0), you only need to run: ```python -human1 = miom.mio.load_gem("@SysBioChalmers/Human-GEM.miom") -recon3d = miom.mio.load_gem("@BiGG/Recon3D.miom") +network = miom.load_gem('@SysBioChalmers/Human-Gem/v1.9.0') ``` -Here is an example of how to load a metabolic network and maximize the flux through a target reaction using FBA, and then how to modify the original problem to implement the sparse FBA problem adding only a few lines to the original problem: +This is a very convenient way of importing models and sharing **reproducible experiments**, making sure that other users are going to test the same models. -```python -import miom +### Managing Metabolic Networks -network = miom.mio.load_gem("@mus_musculus_iMM1865.miom") -target_rxn = "BIOMASS_reaction" +MIOM is not intended to provide functions for creating and managing metabolic networks, since there are already other libraries for this purpose (e.g. [cobrapy](https://opencobra.github.io/cobrapy/)). If you rely on `cobrapy` for model manipulation, you can still use MIOM for the optimization after you prepare your metabolic network, as MIOM is nicely integrated with COBRA: -# Create the optimization problem with miom and solve -model = (miom - .load(network) +```python +from cobra.test import create_test_model +network = create_test_model("textbook") +medium = network.medium +medium["EX_o2_e"] = 0.0 +network.medium = medium + +# Use MIOM for optimization +flux = (miom + .load(miom.mio.cobra_to_miom(network)) .steady_state() - .set_rxn_objective(target_rxn) - .solve(verbosity=1)) + .set_rxn_objective('Biomass_Ecoli_core') + .solve() + .get_fluxes('Biomass_Ecoli_core')) +``` -print("Optimal flux:", model.get_fluxes(target_rxn), "mmol/(h·gDW)") +Metabolic Networks in MIOM are represented by a `miom.mio.MiomNetwork` class. This class encodes the network into three matrices: `R` (reactions), `S` (stoichiometry), and `M` (metabolites). The `R` matrix is a numpy structured array that contains the list of the reactions defined in the network, including the reaction ID, the reaction name, the reaction bounds, the subsystem and the Gene-Protein-Rules: -# Show reactions with non-zero flux -V, X = model.get_values() -print("Number of reactions active reactions:", sum(abs(V) > 1e-8)) +```python +>>> miom.load_gem('@BiGG/Recon3D.miom').R[:5] + +array([('10FTHF5GLUtl', '5-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', ''), + ('10FTHF5GLUtm', '5-Glutamyl-10Fthf Transport, Mitochondrial', 0., 1000., 'Transport, mitochondrial', ''), + ('10FTHF6GLUtl', '6-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', ''), + ('10FTHF6GLUtm', '6-Glutamyl-10Fthf Transport, Mitochondrial', 0., 1000., 'Transport, mitochondrial', ''), + ('10FTHF7GLUtl', '7-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', '')], + dtype=[('id', 'O'), ('name', 'O'), ('lb', ' Note: To better understand the meaning of each step, please read the documentation of the [BaseModel class](https://metexplore.github.io/miom/references/miom/#miom.miom.BaseModel), and the complete example in [examples/sparse_fba.py](https://metexplore.github.io/miom/examples/sparse_fba). - +Numpy structured arrays make it easy to modify properties of the network. For example, in order to change the lower bound of all reactions in the network at the same time, you only need to do: ```python -V, X = (model - # Set the MIP Gap tolerance to 5%, using the default solver - # (COIN-OR CBC included with the Python-MIP lib) - .setup(opt_tol=0.05, verbosity=1) - # Set the fluxes to the optimal value found - .set_fluxes_for('BIOMASS_reaction') - # Transform to a subset selection problem - # adding a negative weight to all reactions - # (to remove them from the final solution) - .subset_selection(-1) - # Solve the MIO problem - .solve() - # Get continuos vars (fluxes) and binary vars - .get_values()) +network.R['lb'] = -1000 +``` -# Show reactions with non-zero flux -print("Number of active reactions:", sum(abs(V) > 1e-8)) +Structured arrays also support indexing: + +```python +import numpy as np +# List all reactions with a negative lower bound +network.R[network.R['lb'] < 0] +# Get the indexes of the reactions with a negative lower bound +idxs = np.flatnonzero(network.R['lb'] < 0) ``` +### Constraint-based optimization problems + +MIOM can be seen as a high-level API for creating constraint-based optimization problems by applying successive composable operations that add new constraints to the problem. The basic operation is the `steady_state()` method, which adds the system of equations `S * V = 0` to the optimization model, where `S` is the stoichiometric matrix and `V` is the vector of fluxes: + +```python +model = (miom + .load('@BiGG/Recon3D.miom') + .steady_state()) ``` -Number of active reactions: 404 + +After the equations + +```python +flux = (model + .set_rxn_objective('biomass_reaction') + .set_flux_bounds('biomass_reaction', max_flux=1.0) + .solve() + .get_fluxes('biomass_reaction')) ``` -Solving this problem with default COIN-OR CBC solver returns a solution with 404 active reactions (much less than the 2549 reactions obtained with FBA, and less than the 433 reactions returned by the CappedL1 approximation in the [sparseFBA](https://opencobra.github.io/cobratoolbox/stable/modules/analysis/sparseFBA/index.html) implementation in Matlab), with a relative gap between the lower and upper objective bound below 5% (as indicated in the setup method): +More complex constraints can be added by using the `add_constraint()` method. This method accepts an affine expression involving the variables in the model. Here is an example of how to perform FBA again but with a constraint that ensures that the sum through all non-reversible reactions is less than or equal to 10: +```python +import numpy as np + +# Load the Recon3D model from the MIOM repository +network = miom.load_gem('@BiGG/Recon3D.miom') +# The objective is to maximize the flux through the biomass reaction +model = miom.load(network).steady_state().set_rxn_objective('biomass_reaction') +# Find reactions that are not reversible (i.e. can have only positive fluxes) +non_reversible = np.flatnonzero(network.R['lb'] >= 0) +# Add the constraint that the sum through all non-reversible reactions is less than or equal to 10 +# (note that both PICOS and Python-MIP are symbolic, and expressions involving the variables are allowed) +constraint = sum(model.variables.fluxvars[i] for i in non_reversible) <= 10 +# Add to the model and solve +model.add_constraint(constraint).solve(verbosity=1) +``` ``` -Cbc0011I Exiting as integer gap of 122.04538 less than 1e-10 or 5% -Cbc0001I Search completed - best objective -10208, took 0 iterations and 0 nodes (28.34 seconds) -Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost -Total time (CPU seconds): 60.79 (Wallclock seconds): 60.79 +Starting solution of the Linear programming problem using Dual Simplex + +Coin0506I Presolve 2242 (-3594) rows, 6144 (-4456) columns and 33659 (-12128) elements +Clp0014I Perturbing problem by 0.001% of 1.9797539 - largest nonzero change 1.4130091e-06 ( 7.1372964e-05%) - largest zero change 1.412981e-06 +Clp0006I 0 Obj 0.0015868455 Primal inf 3303572.7 (1175) Dual inf 1.9797525 (1) +Clp0000I Optimal - objective value 9.062223 +Coin0511I After Postsolve, objective 9.062223, infeasibilities - dual 0 (0), primal 0 (0) +Clp0032I Optimal objective 9.062223036 - 1675 iterations time 0.142, Presolve 0.07 ``` -The concise API provided by MIOM makes everything explicit: the sparse FBA problem can be implemented as a best subset selection problem of reactions (minimize the number of reactions with non-zero flux) subject to the steady-state constraint and the optimality constraint of the flux for the target reaction (in this case the `BIOMASS_reaction`). Using this formulation, you can take advantage of modern solvers like CPLEX, GUROBI, MOSEK, COIN-OR CBC (among others) to obtain an optimal or an approximate solution, controlled by the `opt_tol` parameter. +### Mixed Integer Optimization + +Many constraint-based metabolic problems consist of selecting a subset of reactions from a generic metabolic network, subject to some biological-based constraints. For example, Sparse FBA consists of selecting the minimum number of active reactions that lead to a desired optimal flux. Context-specific network reconstruction methods such as iMAT, Fastcore, mCADRE, MBA, etc., have also the same purpose: select a subset of reactions from a network based on some experimental data and some objective function. All these problems are instances of a more general problem, known as **best subset selection problem**, which can be modelled using *Mixed Integer Optimization (MIO)*. However, instances of this problem are usually hard to solve, and some implementations like Fastcore or MBA use different heuristics or relaxations to find sub-optimal solutions in short time. + +Unfortunately, there are a few problems with these type of approximations. First, we don't know how good is the solution obtained, and in practice many studies that use those methods assume that the solution they obtained is the best one. Second, even though most of these techniques solve the same type of problem, they look very different or hard to understand for practitioners. Third, many of these implementations do not exploit the potential of the modern MIO solvers which are nowadays able to solve very large problems in a short time, and to adjust the desired level of optimality, which gives the user an idea of how good is the solution obtained. + +MIOM incorporates specific methods to easily model and solve such problems. Just by calling the method `subset_selection()`, which takes as input a weight for all reactions or a list of weights for each reaction, it transforms the current problem into a *best subset selection* problem, in which the objective function is the sum of the weights of the reactions in the solution (where positive weighted reactions contribute to the objective function if they are selected, and negative weighted reactions contribute to the objective function if they are not selected). -To use other solvers, you only need to provide the specific solver to the `miom` method, for example: +This simple method makes it possible to model a wide variety of complex constraint-based optimization problems. See for example how easy it is to implement the exact version of the weighted Fastcore with MIOM: ```python -model = (miom - .load('@BiGG/Recon3D.miom', solver=miom.Solvers.GLPK) +import miom +import numpy as np + +# Use the flux-consistent subnetwork (fcm) of the Human1 GEM model +# NOTE: Fastcore requires that reactions in the core are flux consistent, +# otherwise the problem would be infeasible. +m = miom.load_gem('@homo_sapiens_human1_fcm.miom') +# Select reactions from the cholesterol metabolism as the core reactions to keep +core_rxn = m.find_reactions_from_pathway("Cholesterol metabolism") +# Assign a negative weight for reactions not in the core +weights = -1 * np.ones(m.num_reactions) +weights[core_rxn == 1] = 1 + +fmc = (miom + .load(m) + .setup(opt_tol=0.05, verbosity=1) .steady_state() - .set_rxn_objective('biomass_reaction') - .solve(verbosity=1)) + .subset_selection(weights) + .keep(core_rxn == 1) + .solve() + .select_subnetwork( + mode=miom.ExtractionMode.ABSOLUTE_FLUX_VALUE, + comparator=miom.Comparator.GREATER_OR_EQUAL, + value=1e-8 + ) + .network +) +print(fmc.num_reactions) ``` ## Advantages diff --git a/docs/index.md b/docs/index.md index c38cd19..065962c 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,40 +6,47 @@ # Introduction -__MIOM__ (Mixed Integer Optimization for Metabolism) is a python library for creating and solving complex optimization problems using genome-scale metabolic networks, in just a few lines. - -MIOM offers a high-level API that leverages the power of modern Mixed Integer Optimization (MIO) solvers to easily define steady-state metabolic optimization problems, from simple Flux Balance Analysis (FBA) simulations, to more complex problems, such as [sparse FBA](https://metexplore.github.io/miom/examples/sparse_fba/) or context-specific reconstruction algorithms (see for example [iMAT](https://metexplore.github.io/miom/examples/imat/) or [Fastcore](https://metexplore.github.io/miom/examples/fastcore/)), and solve them the __required level of optimality__. It supports many free and commercial solvers thanks to the integration with the [PICOS](https://picos-api.gitlab.io/picos/) and the [Python-MIP](https://www.python-mip.com/). It is also compatible and complementary to [cobrapy](https://opencobra.github.io/cobrapy/). - - !!! warning This library is functional but still in a very early stage. API is still not stable and might be changed in future versions. -```python -import miom - -# Download the Recon3D metabolic network and find the maximum flux value -# through the biomass_reaction -model = (miom - .load('@BiGG/Recon3D.miom') - .steady_state() - .set_rxn_objective('biomass_reaction') - .solve(verbosity=1)) - -print("Optimal flux:", model.get_fluxes('biomass_reaction')) -print("Number of active reactions:", sum(abs(model.get_fluxes()) > 1e-8)) +__MIOM__ (Mixed Integer Optimization for Metabolism) is a python library for creating and solving complex optimization problems using genome-scale metabolic networks, in just a few lines. -# Transform the previous FBA optimization problem into a Sparse FBA problem (MIP). -# Solving this problem returns the minimum number of active reactions preserving -# the optimal flux through the biomass_reaction -V, X = (model - .setup(opt_tol=0.05) - .set_fluxes_for('biomass_reaction') - .subset_selection(-1) - .solve(verbosity=1) - .get_values()) +MIOM offers a high-level API that leverages the power of modern Mixed Integer Optimization (MIO) solvers to easily define steady-state metabolic optimization problems, from simple Flux Balance Analysis (FBA) simulations, to more complex problems, such as sparse FBA or context-specific reconstruction algorithms, and solve them the __required level of optimality__. It supports many free and commercial solvers thanks to the integration with the [PICOS](https://picos-api.gitlab.io/picos/) and the [Python-MIP](https://www.python-mip.com/). It is also compatible and complementary to [cobrapy](https://opencobra.github.io/cobrapy/). + +Here is a simple example of the differences of implementing from scratch a simple Flux Balance Analysis (FBA), which is typically implemented as a Linear Program (LP) problem, and the Sparse FBA problem which is an Integer Programmming (IP) problem that solves the same FBA problem but minimizing the number of active reactions: + +=== "FBA (LP)" + + ``` python + import miom + model = (miom + .load('@BiGG/Recon3D.miom') + .steady_state() + .set_rxn_objective('biomass_reaction') + .solve(verbosity=1)) + print("Optimal flux:", model.get_fluxes('biomass_reaction')) + print("Number of active reactions:", sum(abs(model.get_fluxes()) > 1e-8)) + ``` + +=== "Sparse FBA (MIP)" + + ``` python + import miom + model = (miom + .load('@BiGG/Recon3D.miom') + .setup(opt_tol=0.05, verbosity=1) + .steady_state() + .set_rxn_objective('biomass_reaction') + .solve() + .set_fluxes_for('biomass_reaction') + .subset_selection(-1) + .solve()) + print("Optimal flux:", model.get_fluxes('biomass_reaction')) + print("Number of active reactions:", sum(abs(model.get_fluxes()) > 1e-8)) + ``` -print("Number of active reactions:", sum(abs(V) > 1e-8)) -``` +!!! note + To better understand the meaning of each step, please read the documentation of the [BaseModel class](https://metexplore.github.io/miom/references/miom/#miom.miom.BaseModel), and the complete example in [examples/sparse_fba.py](https://metexplore.github.io/miom/examples/sparse_fba). ## Installation @@ -65,103 +72,182 @@ CPLEX is also supported, but requires a valid license. To install MIOM with CPLE ## Quick start -The first step is to import a GEM (a genome-scale model) using the `miom.mio.load_gem` method. The method accepts a local file or an URL. SBML files and Matlab files are supported through `cobra` and `scipy` packages. Please make sure that these packages are correctly installed before trying to import models in these formats (see [Full install](#full-install)). +### Importing a Genome-Scale Metabolic Network (GEMs) -Once these packages are available, you can import a model using: +First step is to import a Genome-Scale Metabolic Network. The method `load_gem()` can be used to import a network from a local file or a URL. This method requires the [cobrapy](https://cobrapy.readthedocs.io/en/latest/) and [scipy](https://www.scipy.org/) packages to import networks in the SBML, YAML, JSON, and Matlab formats (see [Full install](#full-install)). Here is an example of importing the [BiGG Recon3D](http://bigg.ucsd.edu/models/Recon3D) network: ```python import miom -network = miom.mio.load_gem("https://github.com/SysBioChalmers/Human-GEM/raw/main/model/Human-GEM.mat") -print("Number of reactions in the network", network.num_reactions) +network = miom.load_gem('https://github.com/SBRG/bigg_models_data/raw/master/models/Recon3D.mat') +print("Number of reactions in the network:", network.num_reactions) ``` -MIOM includes its own lightweight and portable format (.miom) for loading and storing metabolic networks, which does not require any additional dependency. This models uses `numpy` compressed structured arrays to store the basic information of the metabolic networks required for performing simulations. This format is even smaller and more portable than the Matlab files. A repository of converted models is available at https://github.com/pablormier/miom-gems. - -To make things even easier, the method `load_gem` can import any model from this repository using the relative path, for example: +By default, MIOM uses its own format (.miom) which is a lightweight, compressed and portable format based on numpy structured arrays to store only the essential information required to perform common tasks. To improve reproducibility of experiments, a repository of already converted models is available at [pablormier/miom-gems](https://github.com/pablormier/miom-gems/). Any model from this repository can be imported using shorthand links in the following forma: `@relative/path/to/model`. For example, in order to import the [Human-GEM v1.9.0](https://github.com/pablormier/miom-gems/tree/main/gems/SysBioChalmers/Human-Gem/v1.9.0), you only need to run: ```python -human1 = miom.mio.load_gem("@SysBioChalmers/Human-GEM/v1.9.0") -recon3d = miom.mio.load_gem("@BiGG/Recon3D.miom") +network = miom.load_gem('@SysBioChalmers/Human-Gem/v1.9.0') ``` -Here is an example of how to load a metabolic network and maximize the flux through a target reaction using FBA, and then how to modify the original problem to implement the sparse FBA problem adding only a few lines to the original problem: +This is a very convenient way of importing models and sharing **reproducible experiments**, making sure that other users are going to test the same models. -```python -import miom +### Managing Metabolic Networks -network = miom.mio.load_gem("@mus_musculus_iMM1865.miom") -target_rxn = "BIOMASS_reaction" +MIOM is not intended to provide functions for creating and managing metabolic networks, since there are already other libraries for this purpose (e.g. [cobrapy](https://opencobra.github.io/cobrapy/)). If you rely on `cobrapy` for model manipulation, you can still use MIOM for the optimization after you prepare your metabolic network, as MIOM is nicely integrated with COBRA: -# Create the optimization problem with miom and solve -model = (miom - .load(network) +```python +from cobra.test import create_test_model +network = create_test_model("textbook") +medium = network.medium +medium["EX_o2_e"] = 0.0 +network.medium = medium + +# Use MIOM for optimization +flux = (miom + .load(miom.mio.cobra_to_miom(network)) .steady_state() - .set_rxn_objective(target_rxn) - .solve(verbosity=1)) + .set_rxn_objective('Biomass_Ecoli_core') + .solve() + .get_fluxes('Biomass_Ecoli_core')) +``` -print("Optimal flux:", model.get_fluxes(target_rxn), "mmol/(h·gDW)") +Metabolic Networks in MIOM are represented by a `miom.mio.MiomNetwork` class. This class encodes the network into three matrices: `R` (reactions), `S` (stoichiometry), and `M` (metabolites). The `R` matrix is a numpy structured array that contains the list of the reactions defined in the network, including the reaction ID, the reaction name, the reaction bounds, the subsystem and the Gene-Protein-Rules: -# Show reactions with non-zero flux -V, X = model.get_values() -print("Number of reactions active reactions:", sum(abs(V) > 1e-8)) +```python +>>> miom.load_gem('@BiGG/Recon3D.miom').R[:5] + +array([('10FTHF5GLUtl', '5-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', ''), + ('10FTHF5GLUtm', '5-Glutamyl-10Fthf Transport, Mitochondrial', 0., 1000., 'Transport, mitochondrial', ''), + ('10FTHF6GLUtl', '6-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', ''), + ('10FTHF6GLUtm', '6-Glutamyl-10Fthf Transport, Mitochondrial', 0., 1000., 'Transport, mitochondrial', ''), + ('10FTHF7GLUtl', '7-Glutamyl-10Fthf Transport, Lysosomal', 0., 1000., 'Transport, lysosomal', '')], + dtype=[('id', 'O'), ('name', 'O'), ('lb', ' 1e-8)) +```python +import numpy as np +# List all reactions with a negative lower bound +network.R[network.R['lb'] < 0] +# Get the indexes of the reactions with a negative lower bound +idxs = np.flatnonzero(network.R['lb'] < 0) ``` +### Constraint-based optimization problems + +MIOM can be seen as a high-level API for creating constraint-based optimization problems by applying successive composable operations that add new constraints to the problem. The basic operation is the `steady_state()` method, which adds the system of equations `S * V = 0` to the optimization model, where `S` is the stoichiometric matrix and `V` is the vector of fluxes: + +```python +model = (miom + .load('@BiGG/Recon3D.miom') + .steady_state()) ``` -Number of active reactions: 404 + +Now the optimization model contains the system of equations for the steady state condition of the fluxes. Now, it can be extended by adding new constraints or changing properties of the model. For example, in order to find the maximum flux of the `biomass_reaction` (limiting the max production to 10) can be done as follows: + +```python +flux = (model + .set_rxn_objective('biomass_reaction') + .set_flux_bounds('biomass_reaction', max_flux=10.0) + .solve() + .get_fluxes('biomass_reaction')) ``` -Solving this problem with default COIN-OR CBC solver returns a solution with 404 active reactions (much less than the 2549 reactions obtained with FBA, and less than the 433 reactions returned by the CappedL1 approximation in the [sparseFBA](https://opencobra.github.io/cobratoolbox/stable/modules/analysis/sparseFBA/index.html) implementation in Matlab), with a relative gap between the lower and upper objective bound below 5% (as indicated in the setup method): +More complex constraints can be added by using the `add_constraint()` method. This method accepts an affine expression involving the variables in the model. Here is an example of how to perform FBA again but with a constraint that ensures that the sum through all non-reversible reactions is less than or equal to 10: + +```python +import numpy as np + +# Load the Recon3D model from the MIOM repository +network = miom.load_gem('@BiGG/Recon3D.miom') + +# The objective is to maximize the flux through the biomass reaction +model = miom.load(network).steady_state().set_rxn_objective('biomass_reaction') + +# Find reactions that are not reversible (i.e. can have only positive fluxes) +non_reversible = np.flatnonzero(network.R['lb'] >= 0) +# Add the constraint that the sum through all non-reversible reactions is less than or equal to 10 +# (note that both PICOS and Python-MIP are symbolic, and expressions involving the variables are allowed) +constraint = sum(model.variables.fluxvars[i] for i in non_reversible) <= 10 + +# Add to the model and solve +model.add_constraint(constraint).solve(verbosity=1) +``` ``` -Cbc0011I Exiting as integer gap of 122.04538 less than 1e-10 or 5% -Cbc0001I Search completed - best objective -10208, took 0 iterations and 0 nodes (28.34 seconds) -Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost -Total time (CPU seconds): 60.79 (Wallclock seconds): 60.79 +Starting solution of the Linear programming problem using Dual Simplex + +Coin0506I Presolve 2242 (-3594) rows, 6144 (-4456) columns and 33659 (-12128) elements +Clp0014I Perturbing problem by 0.001% of 1.9797539 - largest nonzero change 1.4130091e-06 ( 7.1372964e-05%) - largest zero change 1.412981e-06 +Clp0006I 0 Obj 0.0015868455 Primal inf 3303572.7 (1175) Dual inf 1.9797525 (1) +Clp0000I Optimal - objective value 9.062223 +Coin0511I After Postsolve, objective 9.062223, infeasibilities - dual 0 (0), primal 0 (0) +Clp0032I Optimal objective 9.062223036 - 1675 iterations time 0.142, Presolve 0.07 ``` -The concise API provided by MIOM makes everything explicit: the sparse FBA problem can be implemented as a best subset selection problem of reactions (minimize the number of reactions with non-zero flux) subject to the steady-state constraint and the optimality constraint of the flux for the target reaction (in this case the `BIOMASS_reaction`). Using this formulation, you can take advantage of modern solvers like CPLEX, GUROBI, MOSEK, COIN-OR CBC (among others) to obtain an optimal or an approximate solution, controlled by the `opt_tol` parameter. +By default, an optimization model is mutable. This means that every time a method that modifies the model is invoked, the original model is modified. For example, the `set_flux_bounds()` method modifies the bounds of the reactions in the model. If you want to create a copy of the model, you can use the `copy()` method to create a full copy of the model. + + +### Mixed Integer Optimization -To use other solvers, you only need to provide the specific solver to the `miom` method, for example: +Many constraint-based metabolic problems consist of selecting a subset of reactions from a generic metabolic network, subject to some biological-based constraints. For example, Sparse FBA consists of selecting the minimum number of active reactions that lead to a desired optimal flux. Context-specific network reconstruction methods such as iMAT, Fastcore, mCADRE, MBA, etc., have also the same purpose: select a subset of reactions from a network based on some experimental data and some objective function. All these problems are instances of a more general problem, known as **best subset selection problem**, which can be modelled using *Mixed Integer Optimization (MIO)*. However, instances of this problem are usually hard to solve, and some implementations like Fastcore or MBA use different heuristics or relaxations to find sub-optimal solutions in short time. + +Unfortunately, there are a few problems with these type of approximations. First, we don't know how good is the solution obtained, and in practice many studies that use those methods assume that the solution they obtained is the best one. Second, even though most of these techniques solve the same type of problem, they look very different or hard to understand for practitioners. Third, many of these implementations do not exploit the potential of the modern MIO solvers which are nowadays able to solve very large problems in a short time, and to adjust the desired level of optimality, which gives the user an idea of how good is the solution obtained. + +MIOM incorporates specific methods to easily model and solve such problems. Just by calling the method `subset_selection()`, which takes as input a weight for all reactions or a list of weights for each reaction, it transforms the current problem into a *best subset selection* problem, in which the objective function is the sum of the weights of the reactions in the solution (where positive weighted reactions contribute to the objective function if they are selected, and negative weighted reactions contribute to the objective function if they are not selected). + +This simple method makes it possible to model a wide variety of complex constraint-based optimization problems. See for example how easy it is to implement the exact version of the weighted Fastcore with MIOM: ```python -model = (miom - .load('@BiGG/Recon3D.miom', solver=miom.Solvers.GLPK) +import miom +import numpy as np + +# Use the flux-consistent subnetwork (fcm) of the Human1 GEM model +# NOTE: Fastcore requires that reactions in the core are flux consistent, +# otherwise the problem would be infeasible. +m = miom.load_gem('@homo_sapiens_human1_fcm.miom') +# Select reactions from the cholesterol metabolism as the core reactions to keep +core_rxn = m.find_reactions_from_pathway("Cholesterol metabolism") +# Assign a negative weight for reactions not in the core +weights = -1 * np.ones(m.num_reactions) +weights[core_rxn == 1] = 1 + +fmc = (miom + .load(m) + .setup(opt_tol=0.05, verbosity=1) .steady_state() - .set_rxn_objective('biomass_reaction') - .solve(verbosity=1)) + .subset_selection(weights) + .keep(core_rxn == 1) + .solve() + .select_subnetwork( + mode=miom.ExtractionMode.ABSOLUTE_FLUX_VALUE, + comparator=miom.Comparator.GREATER_OR_EQUAL, + value=1e-8 + ) + .network +) +print(fmc.num_reactions) ``` + + + + + ## Advantages * __It's easy to use:__ MIOM uses the [PICOS](https://picos-api.gitlab.io/picos/) and the [Python-MIP](https://www.python-mip.com/) libraries, which means you can use any solver supported by those libraries. diff --git a/examples/exact_fastcore_example.py b/examples/exact_fastcore_example.py index bab3938..bc5659e 100644 --- a/examples/exact_fastcore_example.py +++ b/examples/exact_fastcore_example.py @@ -2,7 +2,7 @@ import numpy as np # Load a model -m = miom.mio.load_gem('@homo_sapiens_human1_fcm.miom') +m = miom.load_gem('@homo_sapiens_human1_fcm.miom') print("Reactions in the consistent iHuman GEM:", m.num_reactions) core_rxn = m.find_reactions_from_pathway("Cholesterol metabolism") print("Num. of core reactions:", sum(core_rxn)) diff --git a/examples/fba_example.py b/examples/fba_example.py index bfa96f1..b760f24 100644 --- a/examples/fba_example.py +++ b/examples/fba_example.py @@ -1,7 +1,7 @@ import miom # Load a model -m = miom.mio.load_gem('https://github.com/pablormier/miom-gems/raw/main/gems/mus_musculus_iMM1865.miom') +m = miom.load_gem('@mus_musculus_iMM1865.miom') target = 'BIOMASS_reaction' opt_flux = (miom.load(network=m, solver=miom.Solvers.GLPK) diff --git a/examples/imat_example.py b/examples/imat_example.py index 48e0c77..528f893 100644 --- a/examples/imat_example.py +++ b/examples/imat_example.py @@ -1,6 +1,4 @@ -from miom import miom, Solvers -from miom.mio import load_gem -from miom.miom import Comparator, ExtractionMode +import miom # Example implementation of iMAT using MIOM. # Note that this implementation supports also custom weights for the reactions @@ -9,7 +7,7 @@ # function is exactly the same as the original iMAT. # Use the iHuman-GEM model -m = load_gem('https://github.com/pablormier/miom-gems/raw/main/gems/homo_sapiens_human1.miom') +m = miom.load_gem('@homo_sapiens_human1.miom') # Add all the reactions from the Cholesterol pathway to the highly expressed set RH = m.find_reactions_from_pathway("Cholesterol metabolism") @@ -18,14 +16,14 @@ w = RH + RL print("RH:", sum(RH), "RL:", sum(abs(RL))) -m = (miom(m, solver=Solvers.COIN_OR_CBC) +m = (miom(m, solver=miom.Solvers.COIN_OR_CBC) .setup(int_tol=1e-8, opt_tol=0.01, verbosity=1) .steady_state() .subset_selection(w) .solve(max_seconds=30) .select_subnetwork( - mode=ExtractionMode.ABSOLUTE_FLUX_VALUE, - comparator=Comparator.GREATER_OR_EQUAL, + mode=miom.ExtractionMode.ABSOLUTE_FLUX_VALUE, + comparator=miom.Comparator.GREATER_OR_EQUAL, value=1e-8 ) .network) diff --git a/examples/introduction_example.py b/examples/introduction_example.py index ca32be7..3d6e10c 100644 --- a/examples/introduction_example.py +++ b/examples/introduction_example.py @@ -1,13 +1,13 @@ -from miom import miom, Solvers -from miom.mio import load_gem +import miom # Load a genome-scale metabolic network. You can load SMBL or Matlab metabolic networks # as well using the same method, but it requires to have the cobratoolbox python library # installed. -network = load_gem("https://github.com/pablormier/miom-gems/raw/main/gems/mus_musculus_iMM1865.miom") +network = miom.load_gem("@mus_musculus_iMM1865.miom") target_rxn = "BIOMASS_reaction" # Create the optimization problem with miom and solve -model = (miom(network) +model = (miom + .load(network) .steady_state() .set_rxn_objective(target_rxn) .solve(verbosity=1)) diff --git a/examples/sparse_fba_example.py b/examples/sparse_fba_example.py index efcb30f..430f54a 100644 --- a/examples/sparse_fba_example.py +++ b/examples/sparse_fba_example.py @@ -1,7 +1,7 @@ import miom # Load a model -m = miom.mio.load_gem('@mus_musculus_iMM1865.miom') +m = miom.load_gem('@mus_musculus_iMM1865.miom') print("Num. of reactions in the network:", m.num_reactions) # Solve with Gurobi, CPLEX or CBC (other MIP solvers struggle with mediudm/large networks) diff --git a/miom/__init__.py b/miom/__init__.py index 205ea87..9048982 100644 --- a/miom/__init__.py +++ b/miom/__init__.py @@ -5,5 +5,7 @@ ExtractionMode ) -__all__ = ["load", "Solvers", "Comparator", "ExtractionMode"] -__version__ = "0.9.0-beta.2" \ No newline at end of file +from miom.mio import load_gem + +__all__ = ["load", "load_gem", "Solvers", "Comparator", "ExtractionMode"] +__version__ = "0.9.0-beta.3" \ No newline at end of file diff --git a/miom/__main__.py b/miom/__main__.py index 50003e0..f2281af 100644 --- a/miom/__main__.py +++ b/miom/__main__.py @@ -17,6 +17,7 @@ def convert_list_gems(input_files, output=None): m = miom.mio.load_gem(input_file) print(f"Loaded network with {m.num_reactions} reactions (in-memory size: {m.object_size:.2f} MB)") # Concatenate folder and output file + print(os.path.abspath(output)) if output is not None: if isinstance(output, list) and len(output) == len(input_files): output_file = output[i] @@ -29,7 +30,7 @@ def convert_list_gems(input_files, output=None): os.makedirs(abspath) except OSError as e: print(f"Cannot create {abspath} folder: {e}") - elif os.path.isdir(output): + elif os.path.isdir(output) or os.path.isdir(os.path.abspath(output)): output_file = os.path.join(output, output_file) else: raise ValueError("The provided output is not a folder or a list of destinations for each file") diff --git a/miom/mio.py b/miom/mio.py index 0848213..51751fe 100644 --- a/miom/mio.py +++ b/miom/mio.py @@ -149,9 +149,9 @@ def load_gem(model_or_path): if ext == '.miom' or ext == '.xz' or ext == '.npz': return _load_compressed_model(file) else: - return _cobra_to_miom(_read_cobra_model(file)) + return cobra_to_miom(_read_cobra_model(file)) else: - return _cobra_to_miom(model_or_path) + return cobra_to_miom(model_or_path) def _read_cobra_model(filepath): @@ -191,7 +191,7 @@ def export_gem(miom_network, path_to_exported_file): f_out.write(compressed) -def _cobra_to_miom(model): +def cobra_to_miom(model): try: from cobra.util.array import create_stoichiometric_matrix except ImportError as e: @@ -219,11 +219,12 @@ def _cobra_to_miom(model): subsystems.append(rxn.subsystem.tolist()) else: subsystems.append(rxn.subsystem) - rxn_data = [(rxn.id, rxn.lower_bound, rxn.upper_bound, subsystem, rxn.gene_reaction_rule) + rxn_data = [(rxn.id, rxn.name, rxn.lower_bound, rxn.upper_bound, subsystem, rxn.gene_reaction_rule) for rxn, subsystem in zip(model.reactions, subsystems)] met_data = [(met.id, met.name, met.formula) for met in model.metabolites] R = np.array(rxn_data, dtype=[ ('id', 'object'), + ('name', 'object'), ('lb', 'float'), ('ub', 'float'), ('subsystem', 'object'), diff --git a/miom/miom.py b/miom/miom.py index 4abf983..be66c80 100644 --- a/miom/miom.py +++ b/miom/miom.py @@ -304,10 +304,10 @@ def indicator_values(self): return None -def _partial(fn): +def _composable(fn): """Annotation for methods that return the instance itself to enable chaining. - If a method `my_method` is annotated with @_partial, a method called `_my_method` + If a method `my_method` is annotated with @_composable, a method called `_my_method` is expected to be provided by a subclass. Parent method `my_method` is called first and the result is passed to the child method `_my_method`. @@ -326,7 +326,7 @@ def wrapper(self, *args, **kwargs): # Find subclass implementation fname = '_' + fn.__name__ if not hasattr(self, fname): - raise ValueError(f'Method "{fn.__name__}()" is marked as @_partial ' + raise ValueError(f'Method "{fn.__name__}()" is marked as @_composable ' f'but the expected implementation "{fname}()" was not provided ' f'by {self.__class__.__name__}') func = getattr(self, fname) @@ -403,7 +403,7 @@ def initialize_problem(self): def get_solver_status(self): pass - @_partial + @_composable def setup(self, **kwargs): """Provide the options for the solver. @@ -444,7 +444,7 @@ def setup(self, **kwargs): self._options["_min_eps"] = np.sqrt(2*self._options["int_tol"]) return self._options - @_partial + @_composable def steady_state(self): """Add the required constraints for finding steady-state fluxes @@ -456,7 +456,7 @@ def steady_state(self): """ pass - @_partial + @_composable def keep(self, reactions): """Force the inclusion of a list of reactions in the solution. @@ -501,7 +501,7 @@ def keep(self, reactions): if r.index in valid_rxn_idx] return dict(idxs=idxs, valid_rxn_idx=valid_rxn_idx) - @_partial + @_composable def subset_selection(self, rxn_weights, direction='max', eps=1e-2): """Transform the current model into a subset selection problem. @@ -561,7 +561,7 @@ def subset_selection(self, rxn_weights, direction='max', eps=1e-2): warnings.warn("Indicator variables were already assigned") return False - @_partial + @_composable def set_flux_bounds(self, rxn_id, min_flux=None, max_flux=None): """Change the flux bounds of a reaction. @@ -576,7 +576,7 @@ def set_flux_bounds(self, rxn_id, min_flux=None, max_flux=None): i, _ = self.network.find_reaction(rxn_id) return i - @_partial + @_composable def add_constraints(self, constraints): """Add a list of constraint to the model @@ -591,7 +591,7 @@ def add_constraints(self, constraints): self.add_constraint(c) return len(constraints) > 0 - @_partial + @_composable def add_constraint(self, constraint): """Add a specific constraint to the model. The constraint should use existing variables already included in the model. @@ -601,7 +601,7 @@ def add_constraint(self, constraint): """ pass - @_partial + @_composable def set_objective(self, cost_vector, variables, direction='max'): """Set the optmization objective of the model. @@ -660,7 +660,7 @@ def set_fluxes_for(self, reactions, tolerance=1e-6): self.add_constraint(self.variables.fluxvars[i] <= ub) return self - @_partial + @_composable def reset(self): """Resets the original problem (removes all modifications) @@ -730,7 +730,7 @@ def obtain_subnetwork( S_sub = S_sub[act_met, :] return MiomNetwork(S_sub, R_sub, M_sub) - @_partial + @_composable def select_subnetwork( self, mode=ExtractionMode.ABSOLUTE_FLUX_VALUE, @@ -792,7 +792,7 @@ def get_fluxes(self, reactions=None): else: raise ValueError("reactions should be an iterable of strings or a single string") - @_partial + @_composable def solve(self, verbosity=None, max_seconds=None): """Solve the current model and assign the values to the variables of the model. @@ -803,7 +803,7 @@ def solve(self, verbosity=None, max_seconds=None): """ self._last_start_time = perf_counter() - @_partial + @_composable def copy(self): pass @@ -912,7 +912,13 @@ def _select_subnetwork(self, **kwargs): return PicosModel(previous_step_model=self, miom_network=m) def _copy(self, **kwargs): - return PicosModel(previous_step_model=self) + m = PicosModel(previous_step_model=self) + # Create a copy of the internal problem + m.problem = self.problem.copy() + # TODO: Assign the new variables + m.variables._indicator_vars = None + m.variables._flux_vars = None + raise NotImplementedError("Will be added in future versions") def get_solver_status(self): return { @@ -1051,7 +1057,10 @@ def _select_subnetwork(self, **kwargs): return PythonMipModel(previous_step_model=self, miom_network=kwargs["_parent_result"]) def _copy(self, **kwargs): - return PythonMipModel(previous_step_model=self) + m = PythonMipModel(previous_step_model=self) + m.problem = self.problem.copy() + raise NotImplementedError("Will be added in future versions") + def get_solver_status(self): #solver_status['elapsed_seconds'] = self.problem.search_progress_log.log[-1:][0][0] diff --git a/mkdocs.yml b/mkdocs.yml index 3cf3e3d..e08c85a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,6 +32,8 @@ markdown_extensions: - pymdownx.highlight - pymdownx.superfences - pymdownx.details + - pymdownx.tabbed + - pymdownx.inlinehilite - pymdownx.arithmatex: generic: true extra: diff --git a/pyproject.toml b/pyproject.toml index f4c39ab..f417873 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "miom" -version = "0.9.0-beta.2" +version = "0.9.0-beta.3" description = "Mixed Integer Optimization for Metabolism" authors = ["Pablo R. Mier "] keywords = ["optimization", "LP", "MIP", "metabolism", "metabolic-networks"] diff --git a/tests/test_miom.py b/tests/test_miom.py index 1eee3e7..dd04bf1 100644 --- a/tests/test_miom.py +++ b/tests/test_miom.py @@ -173,8 +173,5 @@ def test_keep_rxn(model): ) assert abs(V[i]) > 1e-8 -def test_copy(model): - c = model.copy() - assert model != c and isinstance(c, BaseModel) \ No newline at end of file