Scripts that run against Watson Assistant for:
- `KFOLD`: k-fold cross validation on the training set,
- `BLIND`: evaluating a blind test, and
- `TEST`: testing Watson Assistant against a list of utterances.
For k-fold cross validation or a blind test, the tool outputs a precision curve, per-intent precision and recall rates, and a confusion matrix.
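In k-fold cross validation, the training set is split into `fold_num` folds. Each fold is held out once as test data while a model (here, a temporary workspace the tool creates) is trained on the remaining folds. The sketch below shows one common way to do the split, assigning each intent's examples round-robin across folds so every fold sees most intents. It is an illustration only: whether `run.py` splits data this way is not stated here, and the function name `stratified_kfold` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, intent)


def stratified_kfold(
    examples: List[Example], fold_num: int
) -> List[Tuple[List[Example], List[Example]]]:
    """Return fold_num (train, test) pairs; every example is tested exactly once."""
    if fold_num < 2:
        raise ValueError("fold_num must be at least 2")
    # Group examples by intent so each intent is spread across the folds.
    by_intent: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_intent[ex[1]].append(ex)
    folds: List[List[Example]] = [[] for _ in range(fold_num)]
    i = 0
    for intent in sorted(by_intent):
        for ex in by_intent[intent]:
            folds[i % fold_num].append(ex)  # round-robin keeps fold sizes balanced
            i += 1
    splits = []
    for k in range(fold_num):
        test = folds[k]
        # Train on every fold except the held-out one.
        train = [ex for j, fold in enumerate(folds) if j != k for ex in fold]
        splits.append((train, test))
    return splits
```

For example, six utterances split with `fold_num=3` give three splits, each training on 4 utterances and testing on the other 2.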
- Easy to set up with a single configuration file.
- Saves its state if the Assistant service goes down in the middle of processing.
- Can resume from where it stopped, using modularized scripts.
- Python 3.6.4+
  - Mac users: you may need to initialize Python's SSL certificate store by running `Install Certificates.command` found in `/Applications/Python`. See more here.
- Git client
Pre-work: `cd` into a projects folder where you will clone this GitHub repo. After cloning, `cd` into the `WA-Testing-Tool` folder.
- Install code:
  `git clone https://github.com/cognitive-catalyst/WA-Testing-Tool.git`
- Install dependencies:
  `pip3 install --upgrade -r requirements.txt`
- Set up parameters in a configuration file (e.g. `config.ini`). Use `config.ini.sample` to bootstrap your configuration.
  a. In your terminal, copy the sample config file into a new one: `cp config.ini.sample config.ini`
  b. Open `config.ini` in your favorite text editor, then edit and save the following information with your actual credentials: API key, `url`, and `workspace_id` (Watson Assistant v1) or `environment_id` (Watson Assistant v2).
  c. Set the mode and the mode-specific parameters.
- Run the process:
  `python3 run.py -c config.ini`
  or `python3 run.py -c <path to your config file>`
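Before running, it can help to sanity-check the edited `config.ini`. The sketch below uses Python's standard `configparser` and checks only the parameters named in the steps above. Several things are assumptions, not taken from the tool: that the mode is stored under a key called `mode`, that its values are the lower-case `kfold`/`blind`/`test`, and that keys may live in any section (compare against `config.ini.sample` for the real layout). `check_config` is a hypothetical helper, not part of the tool.

```python
import configparser
from typing import Dict, List

# Assumption: mode values mirror the KFOLD/BLIND/TEST modes, in lower case.
KNOWN_MODES = {"kfold", "blind", "test"}


def flatten(config: configparser.ConfigParser) -> Dict[str, str]:
    """Merge DEFAULT and every section into one dict, since we do not
    assume which section holds which key."""
    merged: Dict[str, str] = dict(config.defaults())
    for section in config.sections():
        merged.update(config[section])
    return merged


def check_config(text: str) -> List[str]:
    """Return a list of problems found in the config text; empty means none found."""
    # interpolation=None so values containing '%' (e.g. API keys) are read verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    values = flatten(config)
    problems: List[str] = []
    if not values.get("url"):
        problems.append("missing url")
    has_v1 = bool(values.get("workspace_id"))
    has_v2 = bool(values.get("environment_id"))
    if not (has_v1 or has_v2):
        problems.append("need workspace_id (v1) or environment_id (v2)")
    mode = values.get("mode", "").strip().lower()
    if mode not in KNOWN_MODES:
        problems.append(f"unrecognized mode: {mode!r}")
    # Watson Assistant v2 configurations do not support k-folds mode.
    if mode == "kfold" and has_v2 and not has_v1:
        problems.append("kfold mode is not supported with Watson Assistant v2")
    return problems
```

A typical call would read the file first, e.g. `check_config(open("config.ini", encoding="utf-8").read())`, and print any problems before invoking `run.py`.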
If you have already installed this utility, use these steps to get the latest code:
- Upgrade dependencies: `pip3 install --upgrade -r requirements.txt`
- Update to the latest code level: `git pull`
- `config.ini`: configuration file for `run.py`. Its format differs for each mode. Review the examples below to explore the possible modes and how each is configured.
- `test_input_file.csv`: test set for blind tests and standard tests.
For a blind test, with the golden intent used for comparison:
utterance | golden intent |
---|---|
utterance 0 | intent 0 |
utterance 1 | intent 0 |
utterance 2 | intent 1 |
For a standard test, the input must have only one column, or an error will be thrown:
utterance |
---|
utterance 0 |
utterance 1 |
utterance 2 |
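The two layouts above differ only in column count, so a quick check can tell which kind of test a file is set up for before a run. This is a minimal sketch assuming the file is comma-separated with no header row (utterances containing commas must be quoted, which the `csv` module handles); `detect_test_type` is a hypothetical helper, not part of the tool.

```python
import csv
from typing import Iterable


def detect_test_type(lines: Iterable[str]) -> str:
    """Return 'blind' for two-column rows (utterance, golden intent),
    'test' for one-column rows (utterance only); raise otherwise."""
    widths = set()
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        widths.add(len(row))
    if widths == {2}:
        return "blind"
    if widths == {1}:
        return "test"
    # Mixed or unexpected widths would make either mode fail.
    raise ValueError(f"unexpected column counts: {sorted(widths)}")
```

To check a real file, open it with `newline=""` (as the `csv` docs recommend) and pass the file object, e.g. `detect_test_type(open("test_input_file.csv", newline="", encoding="utf-8"))`.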
There are a variety of ways to use this tool. Primarily, you will execute a k-folds, blind, or standard test. Examples:
- Run standard test without ground truth
- Generate precision/recall for classification test
- Generate confusion matrix for classification test
- Compare two different blind test results
- Generate description for intents
- Generate long-tail classification results
- Run syntax validation patterns on a workspace
- Extract and analyze Watson Assistant log data
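Per-intent precision, recall, and the confusion matrix all come from comparing each utterance's golden intent with the intent the assistant returned. The sketch below uses the standard definitions (precision = correct predictions of an intent / all predictions of that intent; recall = correct predictions of an intent / all utterances whose golden intent it is) on a hypothetical list of `(golden, predicted)` pairs. It does not read the tool's own output files, whose column names are not shown here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (golden intent, predicted intent)


def confusion_matrix(pairs: List[Pair]) -> Dict[str, Counter]:
    """Map each golden intent to a Counter of the intents predicted for it."""
    matrix: Dict[str, Counter] = {}
    for golden, predicted in pairs:
        matrix.setdefault(golden, Counter())[predicted] += 1
    return matrix


def precision_recall(pairs: List[Pair]) -> Dict[str, Tuple[float, float]]:
    """Return {intent: (precision, recall)} for every intent seen in the pairs."""
    true_pos: Counter = Counter()
    pred_count: Counter = Counter()
    gold_count: Counter = Counter()
    for golden, predicted in pairs:
        gold_count[golden] += 1
        pred_count[predicted] += 1
        if golden == predicted:
            true_pos[golden] += 1
    result: Dict[str, Tuple[float, float]] = {}
    for intent in sorted(set(gold_count) | set(pred_count)):
        # Guard against intents that were never predicted / never golden.
        precision = true_pos[intent] / pred_count[intent] if pred_count[intent] else 0.0
        recall = true_pos[intent] / gold_count[intent] if gold_count[intent] else 0.0
        result[intent] = (precision, recall)
    return result
```

With pairs `greet→greet`, `greet→goodbye`, `goodbye→goodbye`, `goodbye→goodbye`, `greet` gets precision 1.0 and recall 0.5, while `goodbye` gets precision 2/3 and recall 1.0.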
Long-form resources are available in article and video form:
Title | Article | Video |
---|---|---|
Testing a Chatbot with k-folds Cross Validation | https://medium.com/ibm-watson/testing-a-chatbot-with-k-folds-cross-validation-68dab111a6b | https://www.youtube.com/watch?v=FrhK68WyOK4 |
Analyze chatbot classifier performance from logs | https://medium.com/ibm-watson/analyze-chatbot-classifier-performance-from-logs-e9cf2c7ca8fd | https://www.youtube.com/watch?v=yd89DKyf6hc |
Improve a chatbot classifier with production data | https://medium.com/ibm-watson/improve-a-chatbot-classifier-with-production-data-22a437f419b4 | https://www.youtube.com/watch?v=ftFIQtHiQY8 |
Watson Assistant is commonly paired with IBM Speech services to build voice-driven Conversational AI solutions. Check out these tools to assess and tune your speech models!
- STT-WER-Python: Utilities for testing IBM Speech to Text
- TTS-Python: Utilities for testing IBM Text to Speech
This tool can also be used to test a trained Natural Language Understanding (NLU) classifier. The configuration is similar to testing Watson Assistant, except:
- Use the NLU URL in the `url` parameter (e.g. `https://api.us-south.natural-language-understanding.watson.cloud.ibm.com`).
- Specify the `<model_id>` in the `workspace_id` parameter in the configuration.
- Since the NLU classifier does not support downloading training data, the original training data must be provided when running in `kfold` mode (using the `train_input_file` parameter).
- Due to different coverage among service plans, users may need to adjust `max_test_rate` accordingly to avoid network connection errors.
- Users on Lite plans can only create 5 workspaces. They should set `fold_num=3` in their k-fold configuration file.
- If execution is interrupted, the tool may not be able to clean up the workspaces it creates. In this case, you will need to delete the extra workspaces manually.
- The Workspace ID is not the Skill ID. In the Watson Assistant user interface, the Workspace ID can be found on the Skills tab by clicking the three dots (top-right of the skill) and choosing View API Details.
- `SSL: [CERTIFICATE_VERIFY_FAILED]` on Mac means you may need to initialize Python's SSL certificate store by running `Install Certificates.command` found in `/Applications/Python`. See more here.
- "This utility used to work and now it doesn't." Upgrade to the latest dependencies with `pip3 install --upgrade -r requirements.txt` and the latest code with `git pull`.
- If you get a Python module loading error, confirm that you are using matching pip and Python versions, i.e. `pip3` and `python3`, or `pip` and `python`.
- Watson Assistant v2 configuration does not support k-folds mode. Watson Assistant v2 is tested "in-place" rather than by creating temporary skills. Actions users may prefer the Dialog Skill Analysis notebooks, which have additional capabilities for analyzing Dialog or Action skills.
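The `max_test_rate` note is about capping how fast requests are sent so the service plan does not reject them. The idea can be sketched as a simple sequential throttle that spaces call start times at least `1 / max_rate_per_sec` seconds apart. This is a generic rate-limiting illustration, not the tool's implementation: treating the rate as calls per second is an assumption, and `throttled_map` and `max_rate_per_sec` are hypothetical names. The clock and sleep functions are injectable so the timing can be checked without waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    max_rate_per_sec: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call func on each item in order, starting at most max_rate_per_sec calls per second."""
    if max_rate_per_sec <= 0:
        raise ValueError("max_rate_per_sec must be positive")
    interval = 1.0 / max_rate_per_sec
    results: List[R] = []
    next_start = clock()
    for item in items:
        wait = next_start - clock()
        if wait > 0:
            sleep(wait)  # too early: pause until the next slot opens
        start = clock()
        results.append(func(item))
        next_start = start + interval  # earliest start time for the next call
    return results
```

Lowering the rate lengthens the gap between calls; a real client sending requests concurrently would need a shared limiter instead of this sequential loop.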