High-level tools to copy an entire tracking server or a collection of MLflow objects (runs, experiments and registered models). Full object referential integrity is maintained, as are the original MLflow object names.

Three types of bulk tools:
- All - all MLflow objects of the tracking server.
- Registered models - models, their versions' runs and those runs' experiments.
- Experiments.

Notes:
- Original source model and experiment names are preserved.
- Leverages the Single tools as basic building blocks.
| MLflow Object | Documentation | Code | Description |
|---|---|---|---|
| All | export-all | code | Exports all MLflow objects (registered models, experiments and runs) to a directory. |
| | import-all | Uses import-models | Imports MLflow objects from a directory. |
| Model | export-models | code | Exports several (or all) registered models and their versions' backing runs along with the runs' experiments to a directory. |
| | import-models | code | Imports registered models from a directory. |
| Experiment | export-experiments | code | Exports several (or all) experiments to a directory. |
| | import-experiments | code | Imports experiments from a directory. |
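For orientation, a minimal round trip copies everything from one tracking server to another. This is a sketch: it assumes the package's console scripts are on your PATH, and both tracking URIs are placeholders.

```
# Export every object from the source tracking server (placeholder URI).
export MLFLOW_TRACKING_URI=http://source-host:5000
export-all --output-dir /tmp/export

# Import everything into the destination tracking server (placeholder URI).
export MLFLOW_TRACKING_URI=http://dest-host:5000
import-all --input-dir /tmp/export
```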
## export-all

Exports all MLflow objects of the tracking server (Databricks workspace) - all models, experiments and runs. If you are exporting from Databricks, notebooks can be exported in several different formats.

Source: export_all.py.
```
export-all --help

Options:
  --output-dir TEXT             Output directory.  [required]
  --export-source-tags BOOLEAN  Export source run information (RunInfo,
                                MLflow system tags starting with 'mlflow'
                                and metadata) under the
                                'mlflow_export_import' tag prefix. See
                                README.md for more details.
                                [default: False]
  --notebook-formats TEXT       Databricks notebook formats. Values are
                                SOURCE, HTML, JUPYTER or DBC (comma
                                separated).
  --use-threads BOOLEAN         Process export/import in parallel using
                                threads.  [default: False]
```
Example:

```
export-all --output-dir out
```
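A fuller invocation combining the flags documented above; the chosen notebook formats and use of threads are illustrative, not required:

```
# Export all objects, saving Databricks notebooks as source files and DBC
# archives, and using threads to parallelize the export.
export-all \
  --output-dir out \
  --notebook-formats SOURCE,DBC \
  --use-threads True
```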
## import-all

Imports all exported MLflow objects.
Since the exported output directory has the same structure for both export-all and export-models, this script simply calls import-models.

```
import-all --input-dir out
```
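If the destination is a Databricks workspace, MLflow's standard Databricks environment variables can supply the connection; the host and token below are placeholders:

```
# Point MLflow at a Databricks workspace (placeholder credentials).
export MLFLOW_TRACKING_URI=databricks
export DATABRICKS_HOST=https://myworkspace.cloud.databricks.com
export DATABRICKS_TOKEN=MY_TOKEN

import-all --input-dir out
```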
## Registered Models

Tools that copy registered models and their versions' runs along with the runs' experiments.

When exporting a registered model, the following associated objects are exported:
- All the latest versions of the model.
- The run associated with each version.
- The experiment that each run belongs to.

Scripts:
- export-models - exports registered models and their versions' backing runs along with the experiments that the runs belong to.
- import-models - imports models and their versions' runs and experiments from the above exported directory.
### Output directory structure of models export

```
+-manifest.json
|
+-experiments/
| +-manifest.json
| +-1/
| | +-manifest.json
| | +-5bd3b8a44faf4803989544af5cb4d66e/
| | | +-run.json
| | | +-artifacts/
| | | | +-sklearn-model/
| | +-4273c31c45744ec385f3654c63c31360/
| | | +-run.json
| | | +-artifacts/
| | | . . .
|
+-models/
| +-manifest.json
| +-sklearn_iris/
| | +-model.json
| +-4273c31c45744ec385f3654c63c31360/
| | +-run.json
```
For further directory structure details, see the Single tool sections for experiments and models further below.
### export-models

Exports registered models and their versions' backing runs along with the runs' experiments.

The --export-all-runs option is of particular significance.
It controls whether all runs of an experiment are exported or only those associated with a registered model version.
Since an experiment typically contains many runs that are not linked to any registered model version, this option can make a substantial difference in export time (see the last example below).

Source: export_models.py.
```
export-models --help

Options:
  --output-dir TEXT             Output directory.  [required]
  --models TEXT                 Registered model names (comma delimited).
                                For example, 'model1,model2'. 'all' will
                                export all models.  [required]
  --export-source-tags BOOLEAN  Export source run information (RunInfo,
                                MLflow system tags starting with 'mlflow'
                                and metadata) under the
                                'mlflow_export_import' tag prefix. See
                                README_single.md for more details.
                                [default: False]
  --notebook-formats TEXT       Databricks notebook formats. Values are
                                SOURCE, HTML, JUPYTER or DBC (comma
                                separated).
  --stages TEXT                 Stages to export (comma separated).
                                Default is all stages. Values are
                                Production, Staging, Archived and None.
  --export-all-runs BOOLEAN     Export all runs of experiment or just runs
                                associated with registered model versions.
                                [default: False]
  --use-threads BOOLEAN         Process export/import in parallel using
                                threads.  [default: False]
```
Export all registered models:

```
export-models --output-dir out --models all
```

Export specified models:

```
export-models \
  --output-dir out \
  --models sklearn-wine,sklearn-iris
```

Export models whose names begin with a prefix:

```
export-models \
  --output-dir out \
  --models "sklearn*"
```
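Tying this back to the --export-all-runs discussion above, the sketch below exports only Production and Staging model versions but pulls in every run of the backing experiments. The model names are placeholders; all flags are from the help text above.

```
# Export two models, restricted to their Production/Staging versions,
# including all runs of their experiments (not just version-backing runs).
export-models \
  --output-dir out \
  --models sklearn-wine,sklearn-iris \
  --stages Production,Staging \
  --export-all-runs True
```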
### import-models

Source: import_models.py.
```
import-models --help

Options:
  --input-dir TEXT           Input directory.  [required]
  --delete-model BOOLEAN     First delete the model if it exists and all
                             its versions.  [default: False]
  --verbose BOOLEAN          Verbose.  [default: False]
  --use-src-user-id BOOLEAN  Set the destination user ID to the source
                             user ID. Source user ID is ignored when
                             importing into Databricks since setting it
                             is not allowed.  [default: False]
  --use-threads BOOLEAN      Process the export/import in parallel using
                             threads.  [default: False]
```
```
import-models --input-dir out
```
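If the destination already contains models with the same names, you may want a clean slate first. A sketch using only the flags documented above; note that --delete-model is destructive:

```
# Delete any same-named destination models first, then import in parallel.
import-models \
  --input-dir out \
  --delete-model True \
  --use-threads True
```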
## Experiments

Export/import experiments to a directory.

### Output directory structure of experiments export

```
+-manifest.json
+-1/
| +-manifest.json
| +-5bd3b8a44faf4803989544af5cb4d66e/
| | +-run.json
| | +-artifacts/
| | | +-sklearn-model/
| +-4273c31c45744ec385f3654c63c31360/
| | +-run.json
| | +-artifacts/
| |   +- . . .
```
### export-experiments

Exports several (or all) experiments to a directory.
```
export-experiments --help

Options:
  --experiments TEXT       Experiment names or IDs (comma delimited).
                           For example, 'sklearn_wine,sklearn_iris' or
                           '1,2'. 'all' will export all experiments.
                           [required]
  --output-dir TEXT        Output directory.  [required]
  --notebook-formats TEXT  Databricks notebook formats. Values are
                           SOURCE, HTML, JUPYTER or DBC (comma
                           separated).
  --use-threads BOOLEAN    Process export/import in parallel using
                           threads.  [default: False]
```
Export experiments by experiment ID:

```
export-experiments \
  --experiments 2,3 --output-dir out
```

Export experiments by experiment name:

```
export-experiments \
  --experiments sklearn,sparkml --output-dir out
```

Export all experiments:

```
export-experiments \
  --experiments all --output-dir out
```
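When the source is a Databricks workspace, run notebooks can be captured as well. A sketch using the documented flags; the experiment names are placeholders:

```
# Export two experiments, saving their run notebooks as source files
# and DBC archives.
export-experiments \
  --experiments sklearn,sparkml \
  --output-dir out \
  --notebook-formats SOURCE,DBC
```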
Sample output:

```
Exporting experiment 'Default' (ID 0) to 'out/0'
Exporting experiment 'sklearn' (ID 1) to 'out/1'
Exporting experiment 'keras_mnist' (ID 2) to 'out/2'
. . .
249 experiments exported
1770/1770 runs successfully exported
Duration: 1.6 seconds
```
The root output directory contains an experiments.json file and a subdirectory for each experiment (named for the experiment ID).
Each experiment subdirectory in turn contains its own experiment.json file and a subdirectory for each run. Each run directory contains a run.json file with the run's metadata, plus its artifact directories.
In the example below we have two experiments - 1 and 7. Experiment 1 (sklearn) has two runs (f4eaa7ddbb7c41148fe03c530d9b486f and 5f80bb7cd0fc40038e0e17abe22b304c) whereas experiment 7 (sparkml) has one run (ffb7f72a8dfb46edb4b11aed21de444b).
```
+-experiments.json
+-1/
| +-experiment.json
| +-f4eaa7ddbb7c41148fe03c530d9b486f/
| | +-run.json
| | +-artifacts/
| | | +-sklearn-model/
| | | +-onnx-model/
| +-5f80bb7cd0fc40038e0e17abe22b304c/
| | +-run.json
| | +-artifacts/
| | | +-sklearn-model/
| | | +-onnx-model/
+-7/
| +-experiment.json
| +-ffb7f72a8dfb46edb4b11aed21de444b/
| | +-run.json
| | +-artifacts/
| | | +-spark-model/
| | | +-mleap-model/
```
### Sample experiments.json

```
{
  "system": {
    "package_version": "1.1.2",
    "script": "export_experiments.py"
  },
  "info": {
    "duration": 0.2,
    "experiments": 3,
    "total_runs": 2,
    "ok_runs": 2,
    "failed_runs": 0
  },
  "mlflow": {
    "experiments": [
      {
        "id": "1",
        "name": "sklearn",
        "ok_runs": 1,
        "failed_runs": 0,
        "duration": 0.1
      },
      {
        "id": "2",
        "name": "sparkml",
        "ok_runs": 1,
        "failed_runs": 0,
        "duration": 0.1
      },
      . . .
```
### Sample experiment.json

```
{
  "system": {
    "package_version": "1.1.2"
  },
  "info": {
    "num_total_runs": 1,
    "num_ok_runs": 1,
    "num_failed_runs": 0,
    "failed_runs": []
  },
  "mlflow": {
    "experiment": {
      "experiment_id": "1",
      "name": "sklearn_wine",
      "artifact_location": "/Users/andre.mesarovic/work/mlflow_server/local_mlrun/mlruns/1",
      "lifecycle_stage": "active",
      "tags": {
        "experiment_created": "2022-12-15 02:17:43",
        "version_mlflow": "2.0.1"
      },
      "creation_time": 1671070664091,
      "last_update_time": 1671070664091
    },
    "runs": [
      "a83cebbccbca41299360c695c5ea72f3"
    ]
  }
}
```
### import-experiments

Imports experiments from a directory. Reads the manifest file to import the experiments and their runs.

The experiment will be created if it does not exist in the destination tracking server. If the experiment already exists, the source runs will be added to it.
```
import-experiments --help

Options:
  --input-dir TEXT           Input path - directory  [required]
  --use-src-user-id BOOLEAN  Set the destination user ID to the source
                             user ID. Source user ID is ignored when
                             importing into Databricks since setting it
                             is not allowed.
  --use-threads BOOLEAN      Process export/import in parallel using
                             threads.  [default: False]
```
```
import-experiments --input-dir out
```
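A sketch that also carries over source user IDs, which, per the help text above, is ignored when the destination is Databricks:

```
# Set the destination user ID to the source user ID where supported.
import-experiments \
  --input-dir out \
  --use-src-user-id True
```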