how to correctly preprocess data in spikeinterface before running kilosort #3483

chongtianyifa · 2024-10-17T06:21:28Z

Hi,

I am confused about the preprocessing chain before running kilosort. In nearly all provided examples, 'highpass_filter', 'phase_shift' and 'common_reference' were run and sorters were called like:
sorting = si.run_sorter('kilosort2_5', rec, output_folder=base_folder / 'kilosort2.5_output', docker_image=True, verbose=True, **params_kilosort2_5)
Here 'car' and 'skip_kilosort_preprocessing' should be True and False by default, which means the filtering and common_reference were calculated again in kilosort. What is the meaning of the preprocessing in spikeinterface?

I am trying to do all preprocessing and drift correction using 'dreg' in spikeinterface before calling kilosort by:
sorting = si.run_sorter(sorter_name='kilosort2_5', recording=raw_rec, folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', skip_kilosort_preprocessing=True, scaleproc=200, do_correction=False, docker_image=False, verbose=True)
The running failed, most likely due to the data not being whitened by setting skip_kilosort_preprocessing as True. Any suggestion about my pipeline?

The text was updated successfully, but these errors were encountered:

zm711 · 2024-10-17T12:30:36Z

Sorry I'm trying to understand this part of your statement:

Here 'car' and 'skip_kilosort_preprocessing' should be True and False by default, which means the filtering and common_reference were calculated again in kilosort. What is the meaning of the preprocessing in spikeinterface?

The default is to do the Kilosort preprocessing. This is true for all sorters. We want the default to just be what the sorter wants do to. Ie you just put in a raw_recording in order to let the sorter handle everything. We also give you the option to do your own preprocessing so you can fine-tune in which case you do the preprocessing yourself and then set the values to car=False, skip_kilosort_preprocessing=True.

As far as your second issue. You raise a great point that whitening is super important for many spike sorters these days. We have a whitening function you can read the params here and you could add that to kilosort workflow. Since I don't know the dredge workflow super well I'll tag @cwindolf to see if he has advice on plugging in his code into a kilosort workflow or whether we should tag someone else on the team.

cwindolf · 2024-10-17T13:19:47Z

Thanks @zm711 -- whitening is definitely important for KS, so adding a si.whiten() to your preprocessing chain would be important if you have skip_kilosort_preprocessing=True. I actually haven't used it before, so I'm not sure whether you need to do that with int_scale=200 -- maybe someone else can comment? Can't find an example of a workflow like this in the docs.

JoeZiminski · 2024-10-17T14:31:47Z

Hey @chongtianyifa could you link some of the examples that were followed? I agree with @zm711 this is the expected behaviour. It might be useful to update these examples so kilosort is called with car=False and freq_min=<some value lower than the low bound on the spikeinterface bandpass> (unfortunately you can't turn it off entirely in KS) so it is clear what is going on. Doing these two preprocessing steps again should have little effect (both filters are zero phase) but might as well skip it if you can.

I think @cwindolf is correct you should run si.whiten with int_scale=200 to replicate the kilosort behaviour. As this workflow is relatively new (all preprocessing in spikeinterface, sorting only with KS) I would suggest comparing the outputs of kilosort using the two workflows. Unfortunately it is not simple to get access to the KS preprocessed data (#2650 , though this may have changed in KS4) but you can get a rough idea by looking at the template shapes (in general, their order will unlikely correspond 1:1 across runs) and plotting spikes directly from the kilosort output with kilosort drift map. I'm currently porting this to python / spikeinterface, should be able to finalise the PR by today / tomorrow so will keep you updated if useful.

That being said, it depends what you want to do. If you want to leverage the spikeinterface postprocessing toolkit, you will want a recording object to pass to the sorting analyzer that matches as closely as possible the preprocessed recording. As an extreme example, it would be a mistake to load raw data into spikeinterface, run the sorting with all kilosort preprocessing steps, then load the sorting output and the raw (unpreprocessed) spikeinterface recording into a sorting analyzer.

However, if you don't want to use spikeinterface postprocessing, you can just pass the raw data to kilosort (using it's own preprocessing) and work with the kilosort output directly.

chongtianyifa · 2024-10-17T15:28:54Z

@JoeZiminski These two examples preprocessed recording and called kilosort allowing it to preprocess again:
https://github.com/SpikeInterface/spikeinterface/blob/main/examples/get_started/quickstart.py
https://github.com/SpikeInterface/spikeinterface/blob/main/examples/how_to/analyze_neuropixels.py.
According to your explanation, can I say that repeating preprocessing should be ok as CAR and highpass filtering should not affect preprocessed data too much?

By summary, there are three options about preprocessing. Please correct me @cwindolf @JoeZiminski @zm711 if my understanding is not right.

Do not preprocess, just feed the raw recording into kilosort. Then you cannot use postprocessing toolkits.
Do all preprocessing in spikeinterface, including 'highpass_filter', 'phase_shift', 'common_reference' and 'whiten'. Feed this rec_preprocess into kilosort by calling si.run_sorter(sorter_name='kilosort2_5', recording=rec_preprocess, folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', car=False, skip_kilosort_preprocessing=True, scaleproc=200). Then you can perfectly combine rec_preprocess and kilosort output using sorting analyzer.
Partially preprocess data in spikeinterface, including 'highpass_filter', 'phase_shift', 'common_reference' without 'whiten'. Feed this rec_preprocess_part into kilosort by calling si.run_sorter(sorter_name='kilosort2_5', recording=rec_preprocess_part , folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', car=False, freq_min=<some value lower than the low bound on the spikeinterface bandpass>, scaleproc=200). This is in fact what most examples do. My question is how to combine rec_preprocess_part and kilosort output using sorting analyzer as data is further whitened by kilosort.

JoeZiminski · 2024-10-22T16:21:58Z

Hi @chongtianyifa that looks correct to me. For 2. you may also want to run a drift-correction step prior to whitening.

For 3. this step will also be run by kilosort before whitening. The whitening step changes the waveforms significantly so it is usual to un-whiten the data before waveform extraction, so that's not a problem. However, the drift correction could be, depending on how much drift is in your data.

In this case, 3. spikeinterface will be performing postprocessing using the information from the sorting output, and on a recording that has not been drift corrected (while the sorting estimation is on corrected data). I'm not 100% sure of the practical implications of this, you may have noise propagated through the data. My understanding is, as the waveforms (across all channels) will be extracted based on the spike times from the sorting. Then these will be averaged for template generation and maximum loading channels estimated. If there is uncorrected drift, then the templates and position estimates will be noisier. @zm711 @alejoe91 @samuelgarcia will be able to confirm if this is correct.

zm711 added the question General question regarding SI label Oct 17, 2024

JoeZiminski self-assigned this Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how to correctly preprocess data in spikeinterface before running kilosort #3483

how to correctly preprocess data in spikeinterface before running kilosort #3483

chongtianyifa commented Oct 17, 2024

zm711 commented Oct 17, 2024

cwindolf commented Oct 17, 2024

JoeZiminski commented Oct 17, 2024

chongtianyifa commented Oct 17, 2024 •

edited

Loading

JoeZiminski commented Oct 22, 2024

how to correctly preprocess data in spikeinterface before running kilosort #3483

how to correctly preprocess data in spikeinterface before running kilosort #3483

Comments

chongtianyifa commented Oct 17, 2024

zm711 commented Oct 17, 2024

cwindolf commented Oct 17, 2024

JoeZiminski commented Oct 17, 2024

chongtianyifa commented Oct 17, 2024 • edited Loading

JoeZiminski commented Oct 22, 2024

chongtianyifa commented Oct 17, 2024 •

edited

Loading