Consider splitting/refactoring base_test.py #100

Open
axelboc opened this issue Aug 29, 2024 · 3 comments

Comments

@axelboc (Contributor) commented Aug 29, 2024

Very low priority, but I find that base_test.py is becoming quite long and is, generally speaking, difficult to read. I feel like we could make things clearer at least by:

  • having multiple "base" test files -- maybe at least one per endpoint (though /data/ could probably use multiple files)
  • splitting the HDF5 file creation logic out of the "base" test files.

Moreover, there are a lot of things to test:

  • API framework (Flask, FastAPI, Tornado)
  • Endpoint (/data/, /meta/, etc.)
  • Query param (dtype=safe, format=bin, etc.)
  • HDF5 entity (group, dataset, datatype, soft link, external link)
  • HDF5 type (int, float, bool, enum, compound, vlen, etc.)
  • HDF5 shape (empty, scalar, 1D, 2D+) & chunks
  • HDF5 attributes
  • HDF5 filters, virtual datasets, etc.
  • Errors (file/entity not found, etc.)

That's a lot of features and a lot of permutations if we really want to be thorough. It might be worth creating a few big HDF5 files like I did in H5Web (sample.h5) to make sure that we test /data/ and /meta/ with as many permutations of dtype, shape, format, etc., as we can.
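For illustration, here is a rough sketch of such a consolidated file built with h5py. The dataset names and the particular cases are made up for this example, not taken from H5Web's actual sample.h5:

```python
import h5py
import numpy as np


def create_sample_file(path="sample.h5"):
    """Build a single HDF5 file covering several dtype/shape permutations."""
    with h5py.File(path, "w") as h5file:
        # Scalars and basic numeric types
        h5file["scalar_int"] = np.int32(42)
        h5file["scalar_float"] = np.float64(3.14)
        # 1D and 2D+ shapes, plus an empty dataset
        h5file["oneD_bool"] = np.array([True, False, True])
        h5file["twoD_float32"] = np.arange(12, dtype=np.float32).reshape(3, 4)
        h5file["empty_float"] = h5py.Empty("f8")
        # Attributes, groups and links
        h5file["scalar_int"].attrs["units"] = "m"
        group = h5file.create_group("group")
        group["nested"] = np.arange(5)
        h5file["soft_link"] = h5py.SoftLink("/group/nested")
```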

@loichuder (Member)

I do agree that there is room for improvement. Notably, I always find it a bit misleading that the real tests are in base_test while most test_... files are in fact fixtures.

And I always need to take five minutes to understand how the fixtures are fed into the base tests.

Perhaps leveraging pytest's parametrize could allow us to make things clearer.
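As a minimal sketch of the idea (the server fixture, query strings and dataset paths below are placeholders, not the actual test suite API):

```python
import pytest


@pytest.mark.parametrize("format_arg", ["json", "bin", "npy", "csv"])
@pytest.mark.parametrize(
    "dataset_path", ["/scalar_int", "/oneD_bool", "/twoD_float32"]
)
def test_data_endpoint(server, format_arg, dataset_path):
    # One test function expands into one test case per (format, dataset) pair
    response = server.get(
        f"/data/?file=sample.h5&path={dataset_path}&format={format_arg}"
    )
    assert response.status_code == 200
```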

> splitting the HDF5 file creation logic out of the "base" test files.

Not sure about this one. I like that I can check what HDF5 file is tested with one glance at the test. Writing tests against big HDF5 files requires external knowledge of these files.

@axelboc (Contributor, Author) commented Sep 2, 2024

> Notably, I always find it a bit misleading that the real tests are in base_test while most test_... files are in fact fixtures.

Haha true.

> Not sure about this one. I like that I can check what HDF5 file is tested with one glance at the test. Writing tests against big HDF5 files requires external knowledge of these files.

It's just a lot of repetition: we initialise a dataset path and a numpy value, create a file, create the dataset, access it, decode it, and assert that the returned value matches the initial value exactly. If we had one big dictionary of initial values, we could just loop through it to make sure h5grove supports them all and reads them all correctly. This could even be a single test, to be honest: a smoke test.
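A rough sketch of that idea; the decode_response helper and the server/sample_file fixtures are hypothetical placeholders:

```python
import numpy as np

# Expected values keyed by dataset path in the shared test file
REFERENCE_VALUES = {
    "/scalar_int": np.int32(42),
    "/oneD_float": np.linspace(0, 1, 10),
    "/twoD_bool": np.zeros((2, 3), dtype=bool),
}


def test_data_smoke(server, sample_file):
    for path, expected in REFERENCE_VALUES.items():
        response = server.get(f"/data/?file={sample_file}&path={path}")
        retrieved = decode_response(response)  # hypothetical decoding helper
        np.testing.assert_array_equal(retrieved, expected)
```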

@axelboc (Contributor, Author) commented Sep 2, 2024

This smoke test would work particularly well for the /data/ endpoint with format=bin.

Generally speaking, I would prefer to have one test per format, with each test looping through multiple datasets, rather than one test per dataset with each test looping through multiple formats (which is what we have currently).
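Sketched out, reusing the hypothetical REFERENCE_VALUES dictionary from the previous comment (the response API is again a placeholder), a format=bin test could look roughly like this:

```python
import numpy as np


def test_data_bin_format(server, sample_file):
    # One test for format=bin, looping over every reference dataset
    for path, expected in REFERENCE_VALUES.items():
        response = server.get(
            f"/data/?file={sample_file}&path={path}&format=bin"
        )
        retrieved = np.frombuffer(response.content, dtype=expected.dtype)
        np.testing.assert_array_equal(retrieved.reshape(expected.shape), expected)
```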
