
64-bit integers not shown correctly in h5web display #1679

Open
gapost opened this issue Jun 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments


gapost commented Jun 20, 2024

Describe the bug

64-bit signed and unsigned integers are not always displayed correctly in the h5web display window.

To Reproduce

  1. Go to https://h5web.panosc.eu/h5grove?file=sample.h5
  2. Select the int64_scalar dataset
  3. Select Display view on the right panel
  4. Value shown is: -9223372036854776000

Expected behaviour

The value returned by h5dump -d int64_scalar sample.h5 is: -9223372036854775808

Something similar happens with uint64:
uint64_scalar value as shown in the h5web window: 18446744073709552000
while h5dump reports: 18446744073709551615

Context

  • OS: Linux
  • Browser: Firefox
  • Version: 127.0
  • H5Web context: h5web
@gapost gapost added the bug Something isn't working label Jun 20, 2024

axelboc commented Jun 20, 2024

Indeed, big integer (int64/uint64) values are not currently supported. They get converted to primitive JS numbers (float64) to avoid errors, which is why you're seeing inaccurate values.
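
For illustration (a minimal snippet, not code from the h5web codebase), the values you're seeing are exactly what the bigint → float64 conversion produces:

```ts
// Above Number.MAX_SAFE_INTEGER (2**53 - 1), float64 can no longer represent
// every integer, so converted bigints print as the nearest "short" decimal:
const int64Min = -(2n ** 63n);    // -9223372036854775808n
const uint64Max = 2n ** 64n - 1n; //  18446744073709551615n

console.log(Number(int64Min));  // -9223372036854776000
console.log(Number(uint64Max)); //  18446744073709552000
```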

We would definitely like to improve support for big integers, though, so we'll use this issue to report any progress. Thanks!

@axelboc axelboc added enhancement New feature or request and removed bug Something isn't working labels Jun 20, 2024

axelboc commented Sep 6, 2024

Scope

The scope of this issue is to make bigints (as in primitive JS bigint values) first-class citizens in @h5web/app so that they can be displayed without any loss of precision in the Scalar, Matrix and Compound visualizations.

The scope does not include maintaining bigint precision in the WebGL-based visualizations (Heatmap, Line, RGB and the NeXus visualizations), or showing exact bigint values in their tooltips.

Challenges

Typing

bigint and number do not play well together in JS: most operators require both operands to be either numbers or bigints, never a mix. For instance, 1 + 1n throws a TypeError; instead, one has to cast one operand or the other: 1 + Number(1n) (potential loss of precision) or BigInt(1) + 1n.

This means that if, in some parts of the codebase, we end up with the union type number | bigint, it can get ugly very fast. We have to avoid this, which means we also have to avoid (number | bigint)[] and Int8Array | Uint8Array | ... | BigInt64Array | BigUint64Array.
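
As a contrived example of that friction (not actual h5web code), even a simple reduction over a mixed array forces an explicit branch and cast at every step:

```ts
// Each arithmetic step must branch on the runtime type and pick a cast,
// each with a drawback: BigInt() throws on non-integer numbers, while
// Number() on the bigint side would silently lose precision.
function sum(values: (number | bigint)[]): bigint {
  return values.reduce<bigint>(
    (acc, v) => acc + (typeof v === "bigint" ? v : BigInt(v)),
    0n,
  );
}

sum([1, 2n, 3]); // 6n
```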

This mostly comes down to the Value<Dataset>, ArrayValue<DType> and Primitive<DType> conditional types. When a dataset's value is sure to be provided as primitive big integers, these types must resolve to a type that extends bigint | bigint[] | BigInt64Array | BigUint64Array (and never number | AnyOtherTypedArray).

Provider support

Not all providers will be able to provide all (u)int-64 values as primitive bigints or bigint typed arrays, at least not in the short term. For instance, h5grove currently sends compound dataset values as JSON, so bigints end up as primitive numbers (i.e. float64) after JSON.parse().
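
For example (hypothetical payload shape), the precision is already gone by the time the response is parsed:

```ts
// JSON has no bigint type: numbers are parsed straight to float64,
// so an int-64 value loses precision before h5web ever sees it.
const payload = '{"int64_field": -9223372036854775808}'; // hypothetical response body
console.log(JSON.parse(payload).int64_field); // -9223372036854776000
```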

This is a similar problem to booleans, which some providers send as primitive booleans while others send as numbers (0/1). If we can't distinguish the two cases, we have no choice but to type values as number | boolean, which adds typing complexity throughout the codebase.

Working solution

I have a working solution that consists of introducing a new DType interface called BigIntegerType, distinct from NumericType. If a provider can, and plans to, return bigint primitives (or bigint typed arrays), it must create a BigIntegerType object instead of a NumericType object.

It then becomes possible to add conditions in ArrayValue and Primitive to type bigint values exclusively (see the sketch after this list):

  • T extends BigIntegerType ? BigInt64Array | BigUint64Array : ...
  • T extends BigIntegerType ? bigint : ...
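
A pared-down sketch of what this looks like (the real definitions in the codebase have more cases and different details):

```ts
// Hypothetical, simplified versions of the types discussed above.
interface NumericType {
  class: "Integer" | "Unsigned integer" | "Float";
  size: number;
}

// Deliberately NOT a NumericType, so bigint never leaks into number unions.
interface BigIntegerType {
  class: "Big integer";
  size: 64;
  signed: boolean;
}

type DType = NumericType | BigIntegerType; // ...plus string, boolean, compound, etc.

type Primitive<T extends DType> = T extends BigIntegerType ? bigint : number;

type ArrayValue<T extends DType> = T extends BigIntegerType
  ? BigInt64Array | BigUint64Array
  : number[] | Int8Array | Uint8Array | Float32Array | Float64Array; // etc.
```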

Obviously this is just the tip of the iceberg; the full solution required making changes to 46 files.

The problem is that it feels like I'm fighting with our own types a bit. It also gets super confusing in terms of naming throughout the codebase, since for it to work, BigIntegerType must not be considered a NumericType, and the NumArray union (number[] | TypedArray) must not include bigint or bigint typed arrays, which seems silly. Finally, this solution is not easily scalable to similar issues (like with booleans).

Generic solution

The piece of the puzzle I unlocked by working on BigIntegerType is the idea that providers can decide which DType object to create (or what to put into it) based on which primitive values and typed arrays they are able/planning to return.

I'm thinking that we can expand the Dataset and/or DType interface so providers can say explicitly what kind of values they are planning on returning for this dataset. This would effectively decouple the HDF5 type of the dataset from the JavaScript type of the provided value.

For instance, h5grove would be able to announce that it would return a BigInt64Array for an int-64 array dataset, but primitive number values for a compound dataset with an int-64 field. The Value<Dataset> type would then resolve only to BigInt64Array in the first case (instead of bigint[] | BigInt64Array | BigUint64Array in my working solution above) and, assuming a second string field, to (number | string)[] for the compound dataset.
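
One hedged sketch of what such a declaration could look like (all names here are illustrative, not the actual h5web or h5grove API):

```ts
// The provider declares, per dataset, the JavaScript representation it will
// actually return, decoupling it from the dataset's HDF5 type.
type ProvidedAs =
  | "number"          // primitive float64 (e.g. int-64 fields inside JSON compounds)
  | "bigint"
  | "BigInt64Array"
  | "BigUint64Array"
  | "string";

interface DatasetValueDecl {
  path: string;
  hdf5Type: string;       // e.g. "H5T_STD_I64LE"
  providedAs: ProvidedAs; // what Value<Dataset> should resolve to
}

// A provider like h5grove could then declare, for instance:
const int64Array: DatasetValueDecl = {
  path: "/int64_1d",            // hypothetical dataset path
  hdf5Type: "H5T_STD_I64LE",
  providedAs: "BigInt64Array",  // full precision preserved
};

const compoundWithInt64: DatasetValueDecl = {
  path: "/compound",            // hypothetical dataset path
  hdf5Type: "H5T_COMPOUND",
  providedAs: "number",         // JSON transport: int-64 field arrives as float64
};
```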

I really like this solution because:

  • It has the potential to make our current types stricter — e.g. Int8Array specifically instead of a vague number[] | TypedArray.
  • It's generic and should help improve typing and reduce workarounds for other similar use cases, like booleans or the dtype=safe feature in h5grove (i.e. Float32Array for a float-16 dataset and Float64Array for a float-128 dataset).
  • It lets providers evolve independently from one another, and allows new providers to not be perfectly compatible and optimised out of the box.

Now let's see if I can make it work.
