[python] TileDB-SOMA-Py should map None, pd.NA, and math.nan in string columns to null values #2861

Open
johnkerl opened this issue Aug 8, 2024 · 0 comments
Labels: bug, python-api

@johnkerl (Member) commented Aug 8, 2024

Split out from #2858.

Here is a repro script:
https://gist.github.com/johnkerl/de55bd74b146b19a7915c5aee9914752

Here is a readback script:
https://gist.github.com/johnkerl/20e0ad08701f5913f90be706ecd99b01

Notes:

  • The input values are ["", "B cell", "T cell", None, pd.NA, math.nan]
  • Currently all of these are written as strings, so the missing-value markers end up stored as the literal strings "None", "<NA>", and "nan"
  • I believe (pending feedback) that we should map None, pd.NA, and math.nan in string data to core nulls in TileDB storage
  • For clarity: this issue refers strictly to string columns, not enumeration-of-string columns

Note: this is for pandas data, i.e. ingestion via tiledbsoma.io.from_anndata/tiledbsoma.io.from_h5ad. For Arrow nulls, i.e. data written via tiledbsoma.Experiment.add_new_dataframe, please see #2858, where I'll create a separate issue to track that case.
