5. Evaluation
leADS can be evaluated using a pre-trained model (see Training). A pre-trained model ("leADS.pkl"), trained on Enzyme Commission (EC) number indices with embedding (biocyc21_Xe.pkl) and pathway indices (biocyc21_y.pkl), is available in the Download files section of this wiki.
Note: Make sure to put the source code leADS/ (Installing leADS) into the leADS_materials/ directory as explained in the Download files section. Additionally, create log/ and result/ folders (if you have not already created them during pathway prediction) in the same leADS_materials/ directory. The final structure should look like this:
leADS_materials/
├── objectset/
│ └── ...
├── model/
│ └── ...
├── dataset/
│ └── ...
├── result/
│ └── ...
└── leADS/
    └── ...
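The folder setup described above can also be scripted. The following is a minimal sketch, not part of leADS: the helper name `ensure_output_dirs` is hypothetical, and it assumes you pass the path to your own leADS_materials/ directory.

```python
from pathlib import Path


def ensure_output_dirs(materials_dir):
    """Create log/ and result/ inside materials_dir if they are missing.

    Hypothetical helper for illustration; leADS does not ship this function.
    """
    root = Path(materials_dir)
    folders = []
    for name in ("log", "result"):
        folder = root / name
        # exist_ok=True leaves an existing folder and its contents untouched
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

For example, `ensure_output_dirs("leADS_materials")` run from the parent directory would create both folders, and calling it again later is harmless.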
For all experiments, using a terminal (on Linux and macOS) or an Anaconda command prompt (on Windows), navigate to the src/ folder in the leADS/ directory and then run the commands as shown in the Examples section.
To display leADS' running options, use: python main.py --help. The help output should be self-explanatory.
Two matrix files, [DATANAME]_X*.pkl and [DATANAME]_y.pkl, must be provided to evaluate a leADS model.
Note: Any of the data files "[DATANAME]_Xe.pkl", "[DATANAME]_Xa.pkl", or "[DATANAME]_X.pkl" can be used for evaluation, provided leADS was trained on the corresponding type of file.
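Before running an evaluation, it can help to confirm that the two matrix files describe the same number of samples. The sketch below is an assumption-laden illustration, not leADS code: `count_rows` and `check_pair` are hypothetical names, and it assumes each .pkl file holds a single matrix-like object (one with a `shape` attribute, or a plain sequence). If your files store something else, such as a tuple, adapt the loading step. Unpickling a SciPy sparse matrix also requires SciPy to be installed.

```python
import pickle
from pathlib import Path


def count_rows(obj):
    # NumPy arrays and SciPy sparse matrices expose .shape; plain lists do not
    shape = getattr(obj, "shape", None)
    return shape[0] if shape is not None else len(obj)


def check_pair(dataset_dir, x_name, y_name):
    """Return the shared sample count, or raise ValueError if X and y disagree.

    Hypothetical helper: assumes one matrix-like object per pickle file.
    """
    counts = []
    for name in (x_name, y_name):
        with open(Path(dataset_dir) / name, "rb") as handle:
            # Only unpickle files that come from a source you trust
            counts.append(count_rows(pickle.load(handle)))
    if counts[0] != counts[1]:
        raise ValueError(
            f"{x_name} has {counts[0]} rows but {y_name} has {counts[1]}"
        )
    return counts[0]
```

A call such as `check_pair("dataset", "golden_Xe.pkl", "golden_y.pkl")` would return the number of samples when the files line up under these assumptions.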
The basic command is shown below. Do not run it as is for the evaluation step; it only illustrates all the flags used. See the Examples section below for how to run evaluation.
python main.py \
--evaluate \
--pred-labels \
--soft-voting \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--file-name "[save file name]" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2
The table below summarizes all the command-line arguments that are specific to this framework:
Argument name | Description | Value |
---|---|---|
--evaluate | To evaluate the performance of leADS on the input dataset | False |
--pred-labels | Whether to predict labels for the input data | False |
--soft-voting | Boolean variable indicating whether to predict labels based on the calibrated sums of the predicted probabilities from an ensemble | False |
--X-name | The name of the input feature matrix file (EC number indices) to be provided for evaluation | [DATANAME]_Xe.pkl |
--y-name | The name of the input label matrix file (pathway indices) to be provided for evaluation | [DATANAME]_y.pkl |
--file-name | The name of the input preprocessed files, or of the files to save (without extension) | [input (or save) file name] |
--dspath | The path to the datasets | Outside source code |
--rspath | The path to store results | Outside source code |
--batch | Batch size | 50 |
--num-jobs | The number of parallel workers | 2 |
The output file generated after running the command is:
File | Description |
---|---|
[save file name]_scores.txt | A text file containing model performance scores for all samples used; the prefix is the value passed to --file-name (e.g. leADS_golden_scores.txt) |
To evaluate the performance of leADS on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:
Note: The --dsname flag must specify the name of the dataset, which is "golden" in this case.
python main.py --evaluate --pred-labels --soft-voting --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "leADS_golden" --model-name "leADS" --num-jobs 2
After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:
leADS_materials/
├── objectset/
│ └── ...
├── model/
│ ├── leADS.pkl
│ └── ...
├── dataset/
│ └── ...
├── result/
│ ├── leADS_golden_scores.txt
│ └── ...
└── leADS/
    └── ...
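Once the run finishes, the scores file can be read back for inspection. This is a minimal sketch that assumes the output follows the naming seen in the tree above (the --file-name value followed by `_scores.txt`); the helper `read_scores` is hypothetical, and the internal layout of the scores file is not assumed, so the text is returned unparsed.

```python
from pathlib import Path


def read_scores(result_dir, file_name):
    """Return the raw text of <file_name>_scores.txt found in result_dir.

    Hypothetical helper; the file naming is inferred from the example output.
    """
    path = Path(result_dir) / f"{file_name}_scores.txt"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; check the --file-name value and the result path"
        )
    return path.read_text()
```

For instance, `print(read_scores("leADS_materials/result", "leADS_golden"))` would display the scores written by the command above.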
To evaluate the performance of leADS on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:
Note: The --dsname flag must specify the name of the dataset, which is "cami" in this case.
python main.py --evaluate --pred-labels --soft-voting --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "leADS_cami" --model-name "leADS" --num-jobs 2
After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:
leADS_materials/
├── objectset/
│ └── ...
├── model/
│ ├── leADS.pkl
│ └── ...
├── dataset/
│ └── ...
├── result/
│ ├── leADS_cami_scores.txt
│ └── ...
└── leADS/
    └── ...
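The two examples above differ only in the dataset name, so the command can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_eval_command` and `run_evaluation` are hypothetical helpers, the flags are copied from the golden and cami examples, and it assumes your dataset files follow the same `<name>_Xe.pkl` and `<name>_y.pkl` naming.

```python
import subprocess
import sys


def build_eval_command(dsname, model_name="leADS", num_jobs=2):
    """Mirror the golden/cami example commands for another dataset name."""
    return [
        # sys.executable runs main.py with the interpreter executing this script
        sys.executable, "main.py",
        "--evaluate", "--pred-labels", "--soft-voting",
        "--X-name", f"{dsname}_Xe.pkl",
        "--y-name", f"{dsname}_y.pkl",
        "--dsname", dsname,
        "--file-name", f"{model_name}_{dsname}",
        "--model-name", model_name,
        "--num-jobs", str(num_jobs),
    ]


def run_evaluation(dsname, src_dir):
    # src_dir is assumed to be the leADS/src/ folder that contains main.py;
    # check=True raises CalledProcessError if main.py exits with an error
    return subprocess.run(build_eval_command(dsname), cwd=src_dir, check=True)
```

Passing the arguments as a list avoids shell quoting issues, which is convenient on Windows as well as on Linux and macOS.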