Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default units in raw.to_data_frame() #12701

Open
chapochn opened this issue Jul 7, 2024 · 11 comments
Open

Default units in raw.to_data_frame() #12701

chapochn opened this issue Jul 7, 2024 · 11 comments

Comments

@chapochn
Copy link
Contributor

chapochn commented Jul 7, 2024

Description of the problem

It is not a bug per se, but I feel it is a quite misleading default. Getting the data through "get_data" and "to_data_frame()" give different output values by default, because to_data_frame converts to a "standard" unit. I am not sure if it is intended? My suggestion would be to not have such behavior by default. Although maybe you want to_data_frame() to be coherent with plotting rather than "get_data". I would think it makes more sense to be coherent with "get_data". Thoughts?

Steps to reproduce

import numpy as np
import mne

n=1
X = np.ones((1, n))
chs = ['ch1']
info = mne.create_info(chs, 5, ch_types='eeg')
raw = mne.io.RawArray(X, info)

df = raw.to_data_frame(index='time')
data = raw.get_data()
print(df.values, data)

Link to data

No response

Expected results

I would have expected to get:
[[1.]] [[1]]

Actual results

[[1000000.]] [[1.]]

I can get the same result for both by using:
df = raw.to_data_frame(index='time', scalings=1)

Additional information

Core
├☑ mne 1.7.1 (latest release)
├☑ numpy 1.26.4 (OpenBLAS 0.3.27 with 12 threads)
├☑ scipy 1.14.0
└☑ matplotlib 3.8.4Backend MacOSX is interactive backend. Turning interactive mode on.
(backend=MacOSX)
Numerical (optional)
├☑ sklearn 1.5.1
├☑ numba 0.60.0
├☑ nibabel 5.2.1
├☑ nilearn 0.10.4
├☑ dipy 1.9.0
├☑ openmeeg 2.5.11
├☑ pandas 2.2.2
├☑ h5io 0.2.3
├☑ h5py 3.11.0
└☐ unavailable cupy
Visualization (optional)
├☑ pyvista 0.43.10 (OpenGL 4.1 Metal - 88.1 via Apple M2 Pro)
├☑ pyvistaqt 0.11.1
objc[24995]: Class QT_ROOT_LEVEL_POOL__THESE_OBJECTS_WILL_BE_RELEASED_WHEN_QAPP_GOES_OUT_OF_SCOPE is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e12e0) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd558). One of the two will be used. Which one is undefined.
objc[24995]: Class KeyValueObserver is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e1308) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd580). One of the two will be used. Which one is undefined.
objc[24995]: Class RunLoopModeTracker is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e1358) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd5d0). One of the two will be used. Which one is undefined.
├☑ vtk 9.3.0
├☑ qtpy 2.4.1
Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

(it crashes here)

@chapochn chapochn added the BUG label Jul 7, 2024
@agramfort
Copy link
Member

agramfort commented Jul 8, 2024 via email

@hoechenberger
Copy link
Member

@chapochn Thanks for this suggestion!

I have to agree with @agramfort on this one, the purpose of the DataFrame export is different from numpy array extraction IMHO

@sappelhoff @cbrnr WDYT

@cbrnr
Copy link
Contributor

cbrnr commented Jul 9, 2024

I also think that this is not a bug but intended behavior. It is also documented (https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.to_data_frame), but maybe we could highlight this behavior more clearly at the top of the to_data_frame docstring or in a Notes section (instead of just in the description of the scalings parameter)?

@sappelhoff
Copy link
Member

I agree with Alex, Richard, and Clemens. This behavior is as intended and IMHO also helpful/useful. Highlighting this behavior more clearly could be a valuable addition to the documentation.

@chapochn
Copy link
Contributor Author

chapochn commented Jul 9, 2024

An added documentation to clarify the intention of this function sounds great. Otherwise, it wasn't really clear to me that it is not just a more convenient get_data function (I mainly work with panda DataFrames for my own analysis)

@cbrnr
Copy link
Contributor

cbrnr commented Jul 9, 2024

@chapochn would you be interested in submitting a PR?

@chapochn
Copy link
Contributor Author

chapochn commented Jul 9, 2024

I could write the intention, but TBH, I don't understand the intention... For me, it's just confusing to get another data than what is actually in the data... If it is needs to be plotted, plots can handle scaling the data, so I really don't see the necessity.

@chapochn
Copy link
Contributor Author

chapochn commented Jul 9, 2024

The other confusing thing is that in scaling it says:

Scaling factor applied to the channels picked. If None, defaults to dict(eeg=1e6, mag=1e15, grad=1e13) — i.e., converts EEG to µV, magnetometers to fT, and gradiometers to fT/cm.

But it doesn't cover all the other types of data that are possible. I am using ecog, it's not clear what kind of scaling it does. What about ekg channels, misc, etc.

@chapochn
Copy link
Contributor Author

chapochn commented Jul 9, 2024

I see these are all the units that are covered in Raw:

Proper units of measure:
V: eeg, eog, seeg, dbs, emg, ecg, bio, ecog
T: mag
T/m: grad
M: hbo, hbr
Am: dipole
AU: misc

So would be good to explicitly say what is the conversion for each unit. And if it is only based on the original unit, or also on the modality. I.e., are eeg, eog, seeg, dbs, emg, ecg, bio, ecog all the same conversion, or if it depends.

I believe these signals, although all in V, can have different orders of magnitude

@larsoner
Copy link
Member

larsoner commented Jul 9, 2024

It's likely that the scaling values and related channel types pull from mne.defaults somewhere that we could document in our glossary if we don't already. They are probably used elsewhere anyway.

@chapochn
Copy link
Contributor Author

chapochn commented Jul 9, 2024

Yes, it's here:

The set of channels considered “data channels” in MNE contains the following types (together with scale factors for plotting)

https://mne.tools/stable/documentation/glossary.html#term-data-channels
and here:

The set of channels considered “non-data channels” in MNE contains the following types (together with scale factors for plotting):

https://mne.tools/stable/documentation/glossary.html#term-non-data-channels

Should probably then add "and for DataFrame export"

And it could be cross-linked to the places where these scalings are used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants