5. Evaluation

Overview

leADS can be evaluated using a pre-trained model (see Training). A pre-trained model ("leADS.pkl"), trained on Enzyme Commission (EC) number indices with embeddings ("biocyc21_Xe.pkl") and pathway indices ("biocyc21_y.pkl"), is made available to users in the Download files section of this wiki.

Note: Make sure to put the leADS source code (see Installing leADS) into the leADS_materials/ directory, as explained in the Download files section. Additionally, create log/ and result/ folders (if you have not already created them during pathway prediction) in the same leADS_materials/ directory; a small sketch for creating them follows the tree below. The final structure should look like this:

leADS_materials/
├── objectset/
│   └── ...
├── model/
│   └── ...
├── dataset/
│   └── ...
├── result/
│   └── ...
└── leADS/
    └── ...
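
The log/ and result/ folders can also be created programmatically. Below is a minimal Python sketch, assuming leADS_materials/ is the current working directory (the folder names mirror the tree above):

from pathlib import Path

# Create the auxiliary folders leADS writes to (skip any that already exist).
# Assumes the current working directory is leADS_materials/.
for folder in ("log", "result"):
    Path(folder).mkdir(exist_ok=True)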

For all experiments, open a terminal (on Linux and macOS) or an Anaconda command prompt (on Windows), navigate to the src folder in the leADS directory, and run the commands as shown in the Examples section.

To display leADS' running options, use: python main.py --help. The help output is self-explanatory.

Input:

Two matrix files, [DATANAME]_X*.pkl and [DATANAME]_y.pkl, must be provided to evaluate leADS.

Note: Data files such as "[DATANAME]_Xe.pkl", "[DATANAME]_Xa.pkl", and "[DATANAME]_X.pkl" can be used for evaluation, provided that leADS was trained using the corresponding files.
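
Before running the evaluation, it can help to confirm that the two matrix files load and that their sample counts agree. Below is a minimal Python sketch; the dataset/ location and the golden file names are taken from the examples below, and it assumes each pickle holds a matrix-like object with a .shape attribute:

import os
import pickle

DSPATH = "dataset"  # path given via --dspath (assumption for this sketch)
X_NAME, Y_NAME = "golden_Xe.pkl", "golden_y.pkl"

with open(os.path.join(DSPATH, X_NAME), "rb") as fin:
    X = pickle.load(fin)
with open(os.path.join(DSPATH, Y_NAME), "rb") as fin:
    y = pickle.load(fin)

# Both matrices should describe the same set of samples (rows).
print("X:", X.shape, "y:", y.shape)
assert X.shape[0] == y.shape[0], "sample counts in X and y do not match"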

Command:

The basic command is shown below. Do not use it to run the evaluation step; it is only a template listing all the relevant flags. See the Examples section below for runnable commands.

--evaluate \
--pred-labels \
--soft-voting \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--file-name "[save file name]" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2

Argument descriptions:

The table below summarizes all the command-line arguments that are specific to this framework:

| Argument name | Description | Default value |
|---|---|---|
| --evaluate | Evaluate the performance of leADS on the input dataset | False |
| --pred-labels | Predict labels for the input | False |
| --soft-voting | Boolean variable indicating whether to predict labels based on the calibrated sums of the predicted probabilities from an ensemble | False |
| --X-name | The input features file name to be provided for evaluation | [DATANAME]_Xe.pkl |
| --y-name | The input labels file name to be provided for evaluation | [DATANAME]_y.pkl |
| --file-name | The name of the input preprocessed files (without extension) | [input (or save) file name] |
| --dspath | The path to the datasets | Outside source code |
| --rspath | The path to store results | Outside source code |
| --batch | Batch size | 50 |
| --num-jobs | The number of parallel workers | 2 |

Output:

The output file generated after running the command is:

| File | Description |
|---|---|
| [FILE-NAME]_scores.txt | A text file containing model performance scores for all samples used, where [FILE-NAME] is the value passed to --file-name |
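
The scores file is plain text, so it can be inspected directly; the exact metrics reported depend on the evaluation settings. Below is a minimal Python sketch, using the result/ location (--rspath) and the leADS_golden file name from Example 1 below:

import os

RSPATH = "result"  # path given via --rspath (assumption for this sketch)
# The file name follows the [--file-name]_scores.txt pattern from Example 1.
with open(os.path.join(RSPATH, "leADS_golden_scores.txt")) as fin:
    print(fin.read())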

Examples

Example 1:

To evaluate the performance of leADS on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:

Note: The flag --dsname must include the name of the dataset, which is "golden" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "leADS_golden" --model-name "leADS" --num-jobs 2

After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The resulting directory tree will look like this:

leADS_materials/
├── objectset/
│   └── ...
├── model/
│   ├── leADS.pkl
│   └── ...
├── dataset/
│   └── ...
├── result/
│   ├── leADS_golden_scores.txt
│   └── ...
└── leADS/
    └── ...

Example 2:

To evaluate the performance of leADS on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:

Note: The flag --dsname must include the name of the dataset, which is "cami" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "leADS_cami" --model-name "leADS" --num-jobs 2

After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The resulting directory tree will look like this:

leADS_materials/
├── objectset/
│   └── ...
├── model/
│   ├── leADS.pkl
│   └── ...
├── dataset/
│   └── ...
├── result/
│   ├── leADS_cami_scores.txt
│   └── ...
└── leADS/
    └── ...
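
Since the two examples differ only in the dataset name, they can also be driven from a short script. Below is a minimal Python sketch using the standard subprocess module; it assumes it is run from the leADS/src directory and simply replays the flags from Examples 1 and 2:

import subprocess

# Evaluate each example dataset in turn; flags mirror Examples 1 and 2.
for ds in ("golden", "cami"):
    cmd = [
        "python", "main.py", "--evaluate", "--pred-labels", "--soft-voting",
        "--X-name", f"{ds}_Xe.pkl", "--y-name", f"{ds}_y.pkl",
        "--dsname", ds, "--file-name", f"leADS_{ds}",
        "--model-name", "leADS", "--num-jobs", "2",
    ]
    subprocess.run(cmd, check=True)  # raise if an evaluation run fails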