Testing Standards
All models and tools within the IDAES code base are expected to have accompanying tests, which check for both coding errors and final results.
All IDAES tests are written using pytest (see the pytest documentation). pytest automatically identifies test modules, classes and methods, thus the following guidelines should be followed when creating tests:
- Test modules should be contained in a folder named `tests`, generally as a sub-folder of the code being tested.
- Test files must start with `test_`, and conversely non-test files should avoid having `test` in their names.
- Test files must contain one or more methods which start with `test_`.
- Tests can also be organized into classes, which must start with `Test`.
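As a minimal sketch of these conventions (the file and test names below are purely illustrative), a test module might look like:

# tests/test_example.py - a hypothetical test module
import pytest


def test_build():
    # test functions must start with "test_"
    assert True


class TestExample:
    # test classes must start with "Test"
    def test_method(self):
        assert True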
IDAES tests are divided into three categories, which are used for organizing tests into different levels of rigor and complexity (and thus execution time). Lower level tests are expected to run frequently, and thus need to keep execution time to a minimum to avoid delays, whilst higher level tests are run less frequently and can thus take longer to complete. The three categories used by IDAES are:
- `unit`: Unit tests are used to test basic functionality and execution of individual parts of the code. Within IDAES, these tests are generally used to test model construction, but not model solutions. These tests should be fast to run (less than 2 seconds), and should not require the use of a solver. Unit tests should ideally cover all lines of code within a model with the exception of those requiring a solver to complete (initialization and final solves).
- `component`: Component tests are used to test model solutions for single example cases in order to check that a model can be solved. These tests obviously require the use of a solver, but should still be relatively quick to execute (ideally less than 10 seconds).
- `integration`: The final level of tests are integration tests, which are used for longer duration verification and validation tests. These tests are used to confirm model accuracy and robustness over a wide range of conditions, and as such can take longer to execute. `integration` tests are also used to execute all examples in the IDAES Examples repository to ensure that any changes to the core codebase do not break the examples.
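As a sketch (the test names below are illustrative), the category of each test is indicated with the corresponding pytest mark:

import pytest


@pytest.mark.unit
def test_build():
    # fast construction check, no solver required
    ...


@pytest.mark.component
def test_solve():
    # solves a single, simple test case; requires a solver
    ...


@pytest.mark.integration
def test_verification():
    # longer-running verification/validation over many conditions
    ...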
As a general rule, any tool or module should have a set of `unit` and `component` tests that exercise and solve all possible options/combinations for a single (generally simple) test case to confirm that the code works as expected and can be solved. Each model or tool should also have a more extensive suite of `integration` tests which test and verify the accuracy and robustness of the model/tool over as wide a range of conditions as possible.
Developers should run the `unit` and `component` tests at a minimum before creating a Pull Request in order to avoid test failures on the PR. A PR must pass all tests (including `integration` tests) before it will be merged, thus it is best to identify these early. It is also a good idea to run the `integration` tests unless you are certain your changes will not affect the examples. The IDAES tests can be run from the command line using the following command:
>>> pytest
Pytest can be used to run tests in a specific path or file using:
>>> pytest PATH
Pytest can also be directed to run tests with specific marks:
>>> pytest -m unit # Only run tests with the "unit" mark
>>> pytest -m "not integration" # Run all tests that do not have the "integration" mark
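Path and mark selections can also be combined; for example:
>>> pytest PATH -m "unit or component" # Run only the unit and component tests under PATH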
IDAES also uses automated testing via GitHub Actions to run the test suite on all Pull Requests to the main repository, as well as to run regularly scheduled tests of the current code.
A starting point for determining the quality of testing in a repository is to look at the percentage of the lines of code in the project that are executed during test runs, and which lines are missed. A number of automated tools are available to do this, and IDAES uses CodeCov for this purpose, which is integrated into the Continuous Integration environment and reported for all Pull Requests. CodeCov provides a number of different reports which can be used to examine test coverage across the project and identify areas of low coverage for improvement.
A test coverage of 80% is generally accepted as a good goal to aim for, and IDAES treats this as a minimum standard. In order to meet this target, the following requirements are placed on Pull Requests to ensure test coverage is constantly increasing:
- A Pull Request may not cause the overall test coverage to decrease, and
- The test coverage on the changed code (diff coverage) must be equal to or greater than that of the overall test coverage.
However, there are limitations to relying on test coverage as the sole metric for testing, as coverage only checks which lines of code were executed during testing and does not consider the quality of the tests. Further, it is often necessary to test the same line of code multiple times for different user inputs (e.g. testing for cases where a user passes a `str` when a `float` was expected), which is not easily quantified. This is especially true for process models, where the same model needs to be tested under a wide range of conditions to be accepted as validated. Thus, code coverage should only be considered as an initial, lower bound on testing.
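For example (the helper function and tests below are hypothetical), line coverage alone cannot distinguish whether a function has been tested for both valid and invalid inputs:

import pytest


def set_flow(value):
    # hypothetical helper that expects a numeric flow rate
    if not isinstance(value, (int, float)):
        raise TypeError("flow rate must be a number")
    return float(value)


@pytest.mark.unit
def test_set_flow():
    assert set_flow(10) == 10.0


@pytest.mark.unit
def test_set_flow_wrong_type():
    # exercises the same function with a different input to check the error handling
    with pytest.raises(TypeError, match="must be a number"):
        set_flow("ten")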
Unit tests are the hardest tests to write, as they primarily focus on testing aspects of the current model or tool only; that is, they do not test integration with other models or tools. This is especially challenging for testing of Unit Models, as they inherently depend upon property packages and control volumes to provide critical information and infrastructure. To help with this, IDAES provides some testing utilities (Testing Utility Functions) that include minimal property and reaction packages for testing purposes, which allows testing to focus on the Unit Model alone.
Unit tests should:
- Only focus on code in the module being tested. Code in other necessary modules should be tested separately in tests for those modules; i.e. every module should have its own set of unit tests, and tests for multiple modules should not be combined.
- Not involve a solver, and thus cannot test model initialization or results (this is the purpose of `component` tests).
- Aim to cover as much of the code as possible (subject to limitations on testing of initialization routines).
- Confirm the model is constructed as expected; i.e. test for the presence, type and shape (indexing sets) of expected model components.
- Test all possible `if/else` branches, and ideally all combinations of `if/else` branches.
- Test for all `Exceptions` that can be raised by the code.
Unit tests should not include tests of unit consistency, as these are computationally expensive; unit consistency is tested as part of the `component` tests.
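A minimal sketch of a unit test in this spirit is shown below, using a plain Pyomo model as a stand-in for an IDAES model (the model and component names are illustrative):

import pytest
from pyomo.environ import ConcreteModel, Constraint, Var


def build_model():
    # stand-in for the model construction code under test
    m = ConcreteModel()
    m.x = Var([1, 2], initialize=1.0)
    m.balance = Constraint(expr=m.x[1] + m.x[2] == 2)
    return m


@pytest.mark.unit
def test_build():
    m = build_model()
    # check presence, type and shape (indexing sets) of expected components
    assert isinstance(m.x, Var)
    assert len(m.x) == 2
    assert isinstance(m.balance, Constraint)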
Component tests are used to test the integration of models and tools for a set of well-defined test cases for which it is known that the model can be solved. By necessity, this often involves integration of multiple models (e.g. Unit Models with property packages). Component tests are also the point where initialization routines and solutions can be tested, and thus can (but do not need to) involve the use of a solver.
Component tests should:
- Aim to cover all lines of code not covered by `unit` tests; i.e. all initialization routines and other code that requires a solver.
- Test for consistent units of measurement. Asserting unit consistency is expensive, which is why these checks are not included in `unit` tests.
- Include a test case for all possible model configurations (i.e. for every `if/else` option available during model construction).
- Test model results/outputs against expected values to a sufficient tolerance (generally 1-2 orders of magnitude greater than solver tolerance).
- Test values for as many key variables as possible (not just inputs and outputs).
- Always confirm solver convergence before checking results in tests that involve a solver.
- Confirm conservation of mass and energy (and momentum if applicable).
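A minimal sketch of a component test along these lines is shown below, again using a simple Pyomo model in place of an IDAES flowsheet (the model, the expected values, and the assumption that IPOPT is available are all illustrative):

import pytest
from pyomo.environ import ConcreteModel, Constraint, Objective, SolverFactory, Var, value
from pyomo.opt import SolverStatus, TerminationCondition
from pyomo.util.check_units import assert_units_consistent


@pytest.mark.component
def test_solve():
    m = ConcreteModel()
    m.x = Var(initialize=1.0)
    m.y = Var(initialize=1.0)
    m.balance = Constraint(expr=m.x + m.y == 10)
    m.obj = Objective(expr=(m.x - 4) ** 2 + (m.y - 6) ** 2)

    # unit consistency checks belong in component tests
    assert_units_consistent(m)

    results = SolverFactory("ipopt").solve(m)

    # always confirm convergence before checking results
    assert results.solver.termination_condition == TerminationCondition.optimal
    assert results.solver.status == SolverStatus.ok

    # check results to a tolerance looser than the solver tolerance
    assert value(m.x) == pytest.approx(4.0, abs=1e-5)
    assert value(m.y) == pytest.approx(6.0, abs=1e-5)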
Integration tests should:
- Test model performance over as wide a range of conditions and configurations as possible.
- Compare model results/outputs with literature data, including source information (i.e. references).
- Not use commercial modeling tools as sources of testing data. Many of these have licensing clauses that prohibit their use for benchmarking and testing of other tools.
- Test results/outputs to the accuracy of the literature data. More accurate data is always preferred where possible.
- Always confirm solver convergence before testing results.
- Include tests for model robustness and reliability (TBD).
Writing tests is something of an art form, and it takes a while to learn what should be tested and how best to achieve this. Below are some suggestions compiled from the experience of the IDAES developers.
- All tests should include a `pytest.mark` to indicate the type of test.
- Tests should be written that execute all branches in conditional statements, and should check to make sure the correct branch was taken.
- Any `Exceptions` raised by the code should be tested. You can use `pytest.raises(ExceptionType, match=str)` to check that the correct type of `Exception` was raised and that the message matches the expected string.
- When testing model solutions, always begin by checking that the solver returned an optimal solution:
from pyomo.opt import SolverStatus, TerminationCondition

results = solver.solve(model)
assert results.solver.termination_condition == TerminationCondition.optimal
assert results.solver.status == SolverStatus.ok
- When testing model solutions, check for as many key variables as possible. Include intermediate variables as well to help narrow down any failures.
- Also keep in mind solver tolerances when testing results. The default solver tolerance is `1e-6`, so you should try to test to a slightly looser tolerance (we often use `1e-5`; see the sketch after this list). Be aware that IPOPT's tolerance is both absolute and relative.
- Tests should always contain an `assert` statement (or equivalent). It is easy to write a test that executes the code, but unless you add checks for specific behaviors, all the test will tell you is whether there are any `Exceptions` during execution.
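As a small illustration of the tolerance advice above (the numbers are purely illustrative), `pytest.approx` accepts both relative and absolute tolerances:

import pytest

# hypothetical solved value that is within solver tolerance of the target
x_solved = 100.000001

# relative tolerance: the allowed error scales with the magnitude of the target
assert x_solved == pytest.approx(100.0, rel=1e-5)

# absolute tolerance: the allowed error is fixed regardless of magnitude
assert x_solved == pytest.approx(100.0, abs=1e-5)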