
Read after write dev #77

Closed

Conversation

@grg2rsr (Collaborator) commented Sep 21, 2024

Not everything is implemented yet and documentation is missing, but I am putting it up now to keep things moving. Works in conjunction with convert_brainwide_map_processed_only_local_testing.py

@CodyCBakerPhD CodyCBakerPhD linked an issue Sep 23, 2024 that may be closed by this pull request
@@ -56,18 +56,20 @@
# sessions_to_run = list(set(brain_wide_sessions) - set(already_written_processed_sessions))

nwbfile_path = base_path / "nwbfiles" / session / f"{session}.nwb"
nwbfile_path.parent.mkdir(exist_ok=True)
os.makedirs(nwbfile_path.parent, exist_ok=True)
Member:
Suggested change:
-os.makedirs(nwbfile_path.parent, exist_ok=True)
+nwbfile_path.parent.mkdir(exist_ok=True)

Why change this? Pathlib is generally more convenient

Member:

Can add parents=True if that is necessary
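To illustrate the point about `parents=True`: a bare `mkdir(exist_ok=True)` fails with `FileNotFoundError` if any intermediate directory is missing, whereas `parents=True` makes it behave like `os.makedirs(..., exist_ok=True)`. A minimal sketch (the paths here are hypothetical stand-ins for the script's `base_path` layout):

```python
import tempfile
from pathlib import Path

# Hypothetical base directory standing in for base_path in the conversion script
base_path = Path(tempfile.mkdtemp())
nwbfile_path = base_path / "nwbfiles" / "session_01" / "session_01.nwb"

# mkdir(exist_ok=True) alone would raise FileNotFoundError here because
# "nwbfiles" does not exist yet; parents=True creates the whole chain,
# matching the behavior of os.makedirs(..., exist_ok=True)
nwbfile_path.parent.mkdir(parents=True, exist_ok=True)

print(nwbfile_path.parent.is_dir())  # True
```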

# Download behavior and spike sorted data for this session
session_path = base_path / "ibl_conversion" / session
cache_folder = base_path / "ibl_conversion" / session / "cache"
os.makedirs(session_path, exist_ok=True)
Member:

This should exist at this point from the previous calls

from typing import Optional

import numpy as np
from numpy import testing

def check_arrays(array_a: np.ndarray, array_b: np.ndarray, full_check: bool = False):
Member:

This style of redirection is not recommended (a helper function just to decide between uses of numpy.testing)

TBH any data array in the processed files will be small enough such that a full (non-subset) usage of numpy.testing.assert_allclose should always be fast enough to just do that

The only question for submitting would be if you want to do something like this on the raw data, but that's probably important enough for it to be a separate check altogether

Also, the NWB write step should not affect floating point precision (and if it did we would actually want to be informed about that) so I recommend using numpy.testing.assert_array_equal instead. Or in general, recommend numpy.testing.assert_array_almost_equal since it should check other aspects of the arrays beyond the data values

Collaborator Author:

> The only question for submitting would be if you want to do something like this on the raw data, but that's probably important enough for it to be a separate check altogether

I had this in mind when I was writing the function, and small speed improvements will probably add up to large total time savings. That said, I haven't done any benchmarking yet (besides the np.searchsorted way of accessing the spike times).
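For context on the np.searchsorted approach mentioned above: since spike times are sorted, a time window can be extracted with two binary searches instead of a full boolean mask. A minimal sketch with synthetic data (the array and window bounds are hypothetical):

```python
import numpy as np

# Hypothetical sorted spike-time vector (seconds)
spike_times = np.sort(np.random.default_rng(0).uniform(0, 100, size=1_000_000))

t_start, t_stop = 10.0, 20.0

# Two O(log n) binary searches locate the slice boundaries,
# instead of scanning the whole array with a boolean mask
i0, i1 = np.searchsorted(spike_times, [t_start, t_stop])
subset = spike_times[i0:i1]

# Same result as the O(n) mask-based approach
mask_subset = spike_times[(spike_times >= t_start) & (spike_times < t_stop)]
print(np.array_equal(subset, mask_subset))  # True
```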

> Also, the NWB write step should not affect floating point precision (and if it did we would actually want to be informed about that) so I recommend using numpy.testing.assert_array_equal instead. Or in general, recommend numpy.testing.assert_array_almost_equal since it should check other aspects of the arrays beyond the data values

I wasn't entirely sure whether this holds across all architectures, or whether we need to account for some other form of rounding imprecision, so I thought using allclose would allow adjusting for that if necessary; if true equality is desired, rtol=0 should do that.

Member:

One way to think of the NWB conversion process (for bulk data anyway) is that 'bytes in' are still just 'bytes out', so if you decoded the original SpikeGLX binaries into their raw bytes (things that look like \xff\x01r), those sequences should be exactly the same as when you read them from the data array after it's written to the NWB file

Floating point errors would mostly come up if you performed operations like scaling (such as by the conversion factor to get values into scientific units)

Or if you used lossless compression to save more space in the file; though that would often exceed standard 'floating point' error
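A small sketch of the two checks under discussion: for a bytes-in/bytes-out round-trip, exact equality is the right assertion, and assert_allclose with both tolerances set to zero degenerates to the same value check. The arrays here are illustrative only:

```python
import numpy as np
from numpy import testing

a = np.array([0.1, 0.2, 0.3])
b = a.copy()

# A bitwise round-trip preserves values exactly, so an exact check is appropriate
testing.assert_array_equal(a, b)

# assert_allclose with rtol=0 and atol=0 behaves as an exact check on the values
testing.assert_allclose(a, b, rtol=0, atol=0)

# Any genuine difference, however small, is caught by the exact check
try:
    testing.assert_array_equal(a, b + 1e-12)
    raised = False
except AssertionError:
    raised = True
print(raised)  # True
```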




def check_series(series_a: pd.Series, series_b: pd.Series, full_check: bool = False):
Member:

Again, not really recommended to define a one-line redirection function. Just making more work than necessary for ourselves here

Collaborator Author:

This had somewhat historical reasons, and I left it in as a stub. When I started coding I wasn't even aware of the existence of numpy.testing, so the initial data flow was a chain of calls that went from a complex datatype like a pd.DataFrame down to successively simpler ones, with an assert() call at the end handling the comparison of the selected numbers.

Things have changed, but I left the general structure up for discussion in case it could still be useful for a case I haven't thought about.

check_arrays(series_a.values, series_b.values, full_check=full_check)


def check_tables(table_a: pd.DataFrame, table_b: pd.DataFrame, naming_map: dict = None, full_check: bool = False):
Member:

If this is for checking that two pandas data frames are equivalent, why not use pandas.testing.assert_frame_equal? (And resolve any column-name mapping by creating a copy of the NWB or ONE frame and renaming the columns on the copy passed to the assertion.)

Collaborator Author:

:) thanks for pointing me towards pandas.testing
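The suggested approach can be sketched as follows. The frames and column names below are hypothetical stand-ins for the ONE and NWB trial tables; `rename` returns a new frame, so the original is untouched:

```python
import pandas as pd
from pandas import testing

# Hypothetical frames: same data, different column conventions (e.g. ONE vs NWB)
one_df = pd.DataFrame({"intervals_0": [0.0, 1.0], "intervals_1": [0.5, 1.5]})
nwb_df = pd.DataFrame({"start_time": [0.0, 1.0], "stop_time": [0.5, 1.5]})

# Illustrative naming map from ONE-style to NWB-style column names
naming_map = {"intervals_0": "start_time", "intervals_1": "stop_time"}

# rename() produces a copy with translated columns; the assertion then
# checks dtypes, index, column names, and values in one call
testing.assert_frame_equal(one_df.rename(columns=naming_map), nwb_df)
print("frames match")
```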

table_a (pd.DataFrame): _description_
table_b (pd.DataFrame): _description_
naming_map (dict, optional): if naming map is given, it is used to map the names of columns of table_a to those of table_b. Defaults to None, checks if columns are identical.
full_check (bool, optional): _description_. Defaults to False.
Member:

An index subset approach should especially not be needed for tables since any data here is already loaded into RAM (making any non-subset assertion super fast)

from pynwb import NWBHDF5IO, NWBFile
from one.api import ONE
import logging
import read_after_write as raw
Member:

Highly recommend not aliasing to real words that also have other contextual meaning.

Also don't recommend importing as an improper local module

Suggested change:
-import read_after_write as raw
+from ibl_to_nwb.brainwide_map import read_after_write

Collaborator Author:

good point

io = NWBHDF5IO(file=h5py_file, load_namespaces=True)
nwbfile = io.read()

raw.test_IblSortingInterface(nwbfile, one, eid)
Member:

Suggested change:
-raw.test_IblSortingInterface(nwbfile, one, eid)
+read_after_write.test_IblSortingInterface(nwbfile, one, eid)

Comment on lines +34 to +42
nwbfile = io.read()

raw.test_IblSortingInterface(nwbfile, one, eid)
raw.test_WheelInterface(nwbfile, one, eid)
raw.test_RoiMotionEnergyInterface(nwbfile, one, eid)
raw.test_BrainwideMapTrialsInterface(nwbfile, one, eid)
raw.test_IblPoseEstimationInterface(nwbfile, one, eid)
raw.test_LickInterface(nwbfile, one, eid)
raw.test_PupilTrackingInterface(nwbfile, one, eid)
Member:

The step of reading the NWB file and running the tests on it could be its own function, and we could even bake it into the run_conversion procedure so that it's checked every time and errors out if there's ever an issue

Collaborator Author:

I imagined that the final format would be something like what you suggested in #88; I had it in here just to verify quickly and easily that it all runs, to be integrated into a more proper structure later on.
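The wrapper idea discussed above could look roughly like the sketch below. Everything here is hypothetical: the function name, the zero-argument `read_nwbfile` callable (standing in for opening the file with NWBHDF5IO), and the dummy checks; the real checks would be the `test_*` functions from read_after_write, which share the `(nwbfile, one, eid)` signature:

```python
def run_read_after_write_checks(read_nwbfile, one, eid, checks):
    """Re-read a freshly written NWB file and run every consistency check.

    read_nwbfile: zero-argument callable returning the in-memory nwbfile
                  (in practice, NWBHDF5IO(...).read())
    checks:       iterable of callables with the (nwbfile, one, eid) signature
    Collects all failures so one bad interface doesn't mask the others,
    then raises, which would abort a run_conversion pipeline.
    """
    nwbfile = read_nwbfile()
    failures = []
    for check in checks:
        try:
            check(nwbfile, one, eid)
        except AssertionError as err:
            failures.append((check.__name__, err))
    if failures:
        names = ", ".join(name for name, _ in failures)
        raise AssertionError(f"read-after-write checks failed: {names}")

# Dummy stand-ins to illustrate the flow (not the real interface tests)
def check_passes(nwbfile, one, eid):
    assert True

run_read_after_write_checks(lambda: object(), None, "some-eid", [check_passes])
print("all checks passed")
```

Collecting failures before raising is a design choice; failing fast on the first bad check would also satisfy the "errors out if there's ever an issue" requirement.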

@CodyCBakerPhD CodyCBakerPhD marked this pull request as draft September 24, 2024 05:15
@grg2rsr (Collaborator Author) commented Sep 24, 2024

Thank you for the code review. Really helpful for me!

@CodyCBakerPhD (Member):

replaced by #88

Successfully merging this pull request may close these issues.

IBL TODO: Add roundtrip consistency tests