Consider splitting/refactoring base_test.py #100

Open
axelboc opened this issue Aug 29, 2024 · 3 comments

Comments

@axelboc (Contributor) commented Aug 29, 2024

Very low priority, but I find that base_test.py is becoming quite long and is, generally speaking, difficult to read. I feel like we could make things clearer at least by:

  • having multiple "base" test files -- maybe at least one per endpoint (though /data/ could probably use multiple files)
  • splitting the HDF5 file creation logic out of the "base" test files.

Moreover, there are a lot of things to test:

  • API framework (Flask, FastAPI, Tornado)
  • Endpoint (/data/, /meta/, etc.)
  • Query param (dtype=safe, format=bin, etc.)
  • HDF5 entity (group, dataset, datatype, soft link, external link)
  • HDF5 type (int, float, bool, enum, compound, vlen, etc.)
  • HDF5 shape (empty, scalar, 1D, 2D+) & chunks
  • HDF5 attributes
  • HDF5 filters, virtual datasets, etc.
  • Errors (file/entity not found, etc.)

That's a lot of features and a lot of permutations if we really want to be thorough. It might be worth creating a few big HDF5 files like I did in H5Web (sample.h5) to make sure that we test /data/ and /meta/ with as many permutations of dtype, shape, format, etc., as we can.
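For illustration, here is a rough sketch of such a consolidated file built with h5py. The dataset names and the particular cases are made up for this example, not taken from H5Web's actual sample.h5:

```python
import h5py
import numpy as np


def create_sample_file(path="sample.h5"):
    """Build a single HDF5 file covering several dtype/shape permutations."""
    with h5py.File(path, "w") as h5file:
        # Scalars and basic numeric types
        h5file["scalar_int"] = np.int32(42)
        h5file["scalar_float"] = np.float64(3.14)
        # 1D and 2D+ shapes, plus an empty dataset
        h5file["oneD_bool"] = np.array([True, False, True])
        h5file["twoD_float32"] = np.arange(12, dtype=np.float32).reshape(3, 4)
        h5file["empty_float"] = h5py.Empty("f8")
        # Attributes, groups and links
        h5file["scalar_int"].attrs["units"] = "m"
        group = h5file.create_group("group")
        group["nested"] = np.arange(5)
        h5file["soft_link"] = h5py.SoftLink("/group/nested")
```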

@loichuder (Member)

I do agree that there is room for improvement. Notably, I always find it a bit misleading that the real tests are in base_test while most test_... files are in fact fixtures.

And I always need to take five minutes to understand how the fixtures are fed into the base tests.

Perhaps leveraging pytest's parametrize could allow us to make things clearer.
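As a minimal sketch of the idea (the server fixture, query strings and dataset paths below are placeholders, not the actual test suite API):

```python
import pytest


@pytest.mark.parametrize("format_arg", ["json", "bin", "npy", "csv"])
@pytest.mark.parametrize(
    "dataset_path", ["/scalar_int", "/oneD_bool", "/twoD_float32"]
)
def test_data_endpoint(server, format_arg, dataset_path):
    # One test function expands into one test case per (format, dataset) pair
    response = server.get(
        f"/data/?file=sample.h5&path={dataset_path}&format={format_arg}"
    )
    assert response.status_code == 200
```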

> splitting the HDF5 file creation logic out of the "base" test files.

Not sure about this one. I like that I can check what HDF5 file is tested with one glance at the test. Writing tests against big HDF5 files requires external knowledge of these files.

@axelboc (Contributor, Author) commented Sep 2, 2024

> Notably, I always find it a bit misleading that the real tests are in base_test while most test_... files are in fact fixtures.

Haha true.

> Not sure about this one. I like that I can check what HDF5 file is tested with one glance at the test. Writing tests against big HDF5 files requires external knowledge of these files.

It's just a lot of repetition: we initialise a dataset path and a numpy value, create a file, create the dataset, access it, decode it, and assert that the returned value matches the initial value exactly. If we had one big dictionary of initial values, we could just loop through it to make sure h5grove supports them all and reads them all correctly. This could even be a single test, to be honest: a smoke test.
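A rough sketch of that idea; the decode_response helper and the server/sample_file fixtures are hypothetical placeholders:

```python
import numpy as np

# Expected values keyed by dataset path in the shared test file
REFERENCE_VALUES = {
    "/scalar_int": np.int32(42),
    "/oneD_float": np.linspace(0, 1, 10),
    "/twoD_bool": np.zeros((2, 3), dtype=bool),
}


def test_data_smoke(server, sample_file):
    for path, expected in REFERENCE_VALUES.items():
        response = server.get(f"/data/?file={sample_file}&path={path}")
        retrieved = decode_response(response)  # hypothetical decoding helper
        np.testing.assert_array_equal(retrieved, expected)
```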

@axelboc (Contributor, Author) commented Sep 2, 2024

This smoke test would work particularly well for the /data/ endpoint with format=bin.

Generally speaking, I would prefer to have one test per format, with each test looping through multiple datasets, rather than one test per dataset with each test looping through multiple formats (which is what we have currently).
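Sketched out, reusing the hypothetical REFERENCE_VALUES dictionary from the previous comment (the response API is again a placeholder), a format=bin test could look roughly like this:

```python
import numpy as np


def test_data_bin_format(server, sample_file):
    # One test for format=bin, looping over every reference dataset
    for path, expected in REFERENCE_VALUES.items():
        response = server.get(
            f"/data/?file={sample_file}&path={path}&format=bin"
        )
        retrieved = np.frombuffer(response.content, dtype=expected.dtype)
        np.testing.assert_array_equal(retrieved.reshape(expected.shape), expected)
```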
