5. Evaluation

Overview

leADS can be evaluated using a pre-trained model (see Training). A pre-trained model ("leADS.pkl"), trained on the Enzyme Commission (EC) number indices with embedding (biocyc21_Xe.pkl) and the pathway indices (biocyc21_y.pkl), is available in the Download files section of this wiki.

Note: Make sure to put the source code leADS/ (see Installing leADS) into the leADS_materials/ directory, as explained in the Download files section. Additionally, create log/ and result/ folders in the same leADS_materials/ directory (if you have not already created them during pathway prediction). The final structure should look like this:

leADS_materials/
	├── objectset/
        │       └── ...
	├── model/
        │       └── ...
	├── dataset/
        │       └── ...
	├── result/
        │       └── ...
	└── leADS/
                └── ...
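
The log/ and result/ folders mentioned in the note above can also be created with a short script. The following is a minimal sketch and not part of leADS: the function name ensure_run_dirs is hypothetical, and the root path is assumed to be the location of your own leADS_materials/ directory.

```python
from pathlib import Path


def ensure_run_dirs(root):
    """Create log/ and result/ under the given leADS_materials/ path.

    `root` is an assumption: pass the location of your own
    leADS_materials/ directory.
    """
    root = Path(root)
    created = []
    for name in ("log", "result"):
        folder = root / name
        # exist_ok=True leaves folders untouched if they were already
        # created earlier (for example during pathway prediction).
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling ensure_run_dirs with the path to leADS_materials/ returns the two folder paths and is safe to run more than once.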

For all experiments, open a terminal (on Linux and macOS) or an Anaconda command prompt (on Windows), navigate to the src/ folder in the leADS/ directory, and then run the commands as shown in the Examples section.

To display leADS' running options, use: python main.py --help. The help output should be self-explanatory.

Input:

Two matrix files, [DATANAME]_X*.pkl and [DATANAME]_y.pkl, must be provided to evaluate a leADS model.

Note: Data files such as "[DATANAME]_Xe.pkl", "[DATANAME]_Xa.pkl", or "[DATANAME]_X.pkl" can be used for evaluation, provided leADS was trained using files of the corresponding type.
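
Before starting an evaluation, it can help to confirm that both input files for a dataset are actually present. The sketch below is an illustration only and not part of leADS: the helper name find_eval_inputs is hypothetical, and it only checks file names against the [DATANAME]_X*.pkl and [DATANAME]_y.pkl patterns without opening the files.

```python
from pathlib import Path


def find_eval_inputs(dataset_dir, dataname):
    """Return (sorted X file names, y file name or None) for one dataset.

    Only the naming patterns from this page are checked; the file
    contents are not inspected.
    """
    dataset_dir = Path(dataset_dir)
    # Matches [DATANAME]_X.pkl, [DATANAME]_Xe.pkl, [DATANAME]_Xa.pkl, ...
    x_files = sorted(p.name for p in dataset_dir.glob(f"{dataname}_X*.pkl"))
    y_file = dataset_dir / f"{dataname}_y.pkl"
    return x_files, (y_file.name if y_file.is_file() else None)
```

An empty list or a None in the result means that the corresponding file is missing from the dataset directory.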

Command:

The basic command is represented below. Do not use this to run the evaluation step. This command is only a representation of all the flags used. See the Examples section below on how to run Evaluation.

python main.py \
--evaluate \
--pred-labels \
--soft-voting \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--file-name "[save file name]" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2

Argument descriptions:

The table below summarizes all the command-line arguments that are specific to this framework:

Argument name	Description	Value
--evaluate	Whether to evaluate the performance of leADS on the input dataset	False
--pred-labels	Whether to predict labels for the input data	False
--soft-voting	Whether to predict labels based on the calibrated sums of the predicted probabilities from an ensemble	False
--X-name	The name of the input matrix file (X) to be provided for evaluation	[DATANAME]_Xe.pkl
--y-name	The name of the pathway indices file (y) to be provided for evaluation	[DATANAME]_y.pkl
--file-name	The name of the input preprocessed files, or of the saved output files (without extension)	[input (or save) file name]
--dspath	The path to the datasets	Outside source code
--rspath	The path to store results	Outside source code
--batch	Batch size	50
--num-jobs	The number of parallel workers	2
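
The --soft-voting row in the table above refers to combining the predicted probabilities of an ensemble. As a rough illustration of the general idea, the sketch below averages per-label probabilities over ensemble members and then applies a threshold. This is generic soft voting under stated assumptions, not the leADS implementation: the calibration step that leADS applies is not reproduced, and the function name soft_vote and the 0.5 threshold are hypothetical.

```python
def soft_vote(member_probs, threshold=0.5):
    """Generic soft voting for one sample.

    member_probs: one list of per-label probabilities per ensemble member.
    Returns a 0/1 decision per label. The threshold is an assumption.
    """
    n_members = len(member_probs)
    n_labels = len(member_probs[0])
    # Average the probability of each label across all ensemble members.
    mean_probs = [
        sum(member[j] for member in member_probs) / n_members
        for j in range(n_labels)
    ]
    # A label is predicted when its averaged probability reaches the threshold.
    return [int(p >= threshold) for p in mean_probs]
```

For instance, three members that assign probabilities 0.9, 0.7 and 0.8 to a label yield an average of about 0.8, so that label is predicted at a threshold of 0.5.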

Output:

The output file generated after running the command is:

File	Description
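[DATANAME]_scores.txt	A text file containing model performance scores for all samples used. In the examples below, the file name starts with the value passed to --file-name (e.g. leADS_golden_scores.txt)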

Examples

Example 1:

To evaluate the performance of leADS on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:

Note: The --dsname flag must be set to the name of the dataset, which is "golden" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "leADS_golden" --model-name "leADS" --num-jobs 2

After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:

leADS_materials/
	├── objectset/
        │       └── ...
	├── model/
        │       ├── leADS.pkl
        │       └── ...
	├── dataset/
        │       └── ...
	├── result/
        │       ├── leADS_golden_scores.txt
        │       └── ...
	└── leADS/
                └── ...

Example 2:

To evaluate the performance of leADS on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:

Note: The --dsname flag must be set to the name of the dataset, which is "cami" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "leADS_cami" --model-name "leADS" --num-jobs 2
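
The two example commands differ only in the dataset name, so they can be assembled programmatically. The sketch below is a convenience wrapper and not part of leADS: the function name build_eval_command is hypothetical, it uses only the flags shown in the examples, and the "[model name]_[dataset name]" pattern for --file-name is inferred from those examples rather than required by leADS.

```python
import shlex


def build_eval_command(dsname, model_name="leADS", num_jobs=2):
    """Assemble the evaluation command from the examples as an argument list."""
    return [
        "python", "main.py",
        "--evaluate", "--pred-labels", "--soft-voting",
        "--X-name", f"{dsname}_Xe.pkl",
        "--y-name", f"{dsname}_y.pkl",
        "--dsname", dsname,
        # Naming convention inferred from the examples (leADS_golden, leADS_cami).
        "--file-name", f"{model_name}_{dsname}",
        "--model-name", model_name,
        "--num-jobs", str(num_jobs),
    ]


# Print one shell-ready command line per dataset.
for name in ("golden", "cami"):
    print(shlex.join(build_eval_command(name)))
```

The returned list can also be passed to subprocess.run with cwd set to the src/ folder, which avoids shell quoting differences between Linux, macOS, and Windows.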