
Facing errors with modified SSM equations in the bwd CUDA kernel (wrt $\Delta$ bias & softplus) #604

Open
SudhanshuBokade opened this issue Oct 23, 2024 · 0 comments


SudhanshuBokade commented Oct 23, 2024

I was trying to learn how to make changes to the fwd & bwd CUDA kernels by attempting a simple modification:

$$ \begin{align*} x &= (e^{\Delta A} + \Delta B)x + \Delta Bu \\ y &= Cx + Du \end{align*} $$

I manually calculated the backward pass derivatives:

$$ \begin{align*} dx &= dy \cdot C \\ d\Delta &= dx \cdot (x \cdot e^{\Delta A} \cdot A + Bx + Bu) \\ du &= dx \cdot B \cdot \Delta + dy \cdot D \\ dA &= dx \cdot e^{\Delta A} \cdot x \cdot \Delta \\ dB &= dx \cdot \Delta \cdot (x + u) \\ dC &= dy \cdot x \\ dD &= dy \cdot u \end{align*} $$
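These derivatives can be sanity-checked numerically. Below is a standalone scalar sketch (my own variable names `dt`, `x_prev`, etc., not the repo's kernel code) that compares the hand-derived gradients against central finite differences for a single recurrence step:

```python
import numpy as np

# Scalar, single-step sketch of the modified recurrence
#   x_t = (exp(dt*A) + dt*B) * x_{t-1} + dt*B*u,   y = C*x_t + D*u
# used to verify the hand-derived backward pass numerically.

def forward(dt, A, B, C, D, x_prev, u):
    x = (np.exp(dt * A) + dt * B) * x_prev + dt * B * u
    return x, C * x + D * u

def backward(dt, A, B, C, D, x_prev, u, dy):
    dx = dy * C                          # gradient w.r.t. the new state x_t
    ddt = dx * (x_prev * np.exp(dt * A) * A + B * x_prev + B * u)
    du = dx * B * dt + dy * D
    dA = dx * np.exp(dt * A) * x_prev * dt
    dB = dx * dt * (x_prev + u)
    dC = dy * forward(dt, A, B, C, D, x_prev, u)[0]   # dC uses the updated x_t
    dD = dy * u
    return ddt, du, dA, dB, dC, dD

def fd(f, v, eps=1e-6):                  # central finite difference
    return (f(v + eps) - f(v - eps)) / (2 * eps)

dt, A, B, C, D, x_prev, u, dy = 0.3, -0.7, 0.5, 1.2, 0.4, 0.9, 1.1, 1.0
ddt, du, dA, dB, dC, dD = backward(dt, A, B, C, D, x_prev, u, dy)

assert np.isclose(fd(lambda v: forward(v, A, B, C, D, x_prev, u)[1], dt), ddt)
assert np.isclose(fd(lambda v: forward(dt, v, B, C, D, x_prev, u)[1], A), dA)
assert np.isclose(fd(lambda v: forward(dt, A, v, C, D, x_prev, u)[1], B), dB)
assert np.isclose(fd(lambda v: forward(dt, A, B, C, D, x_prev, v)[1], u), du)
```

All four checks pass for this single step, so any remaining error is more likely in how the derivatives were wired into the scan than in the derivation itself.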

All the $x$'s in the backward pass above are $x_{t-1}$ (except in $dC$, where $x$ is the updated state $x_t$).

Accordingly, I modified thread_data, thread_reverse_data, smem_delta_a, and the derivative calculations in the forward and backward kernels (.cuh files), and updated selective_scan_ref() in selective_scan_interface.py. After building the modified code and running test_selective_scan.py::test_selective_scan() via pytest with the tolerances unchanged, I got these results:

| Configuration | Tests passing ✅ | Tests failing ❌ |
| --- | --- | --- |
| Disabling both delta_softplus and delta_bias | All | None |
| Disabling only delta_softplus | seq_len < 8_192 | seq_len >= 8_192 |
| Disabling only delta_bias | seq_len < 8_192 | seq_len >= 8_192 |
| Enabling both delta_softplus and delta_bias | seq_len < 4_096 | seq_len >= 4_096 |
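Since the failures only appear at long sequence lengths, one thing worth measuring is how much float32 rounding alone accumulates in this recurrence. Below is a standalone sketch (the parameter ranges are my assumptions, not the repo's test settings) that runs the modified recurrence in float32 and float64 and reports the final-state discrepancy:

```python
import numpy as np

# Run the modified recurrence x = (exp(dt*A) + dt*B)*x + dt*B*u over a
# sequence in float32 and float64 and compare the final states.
# Parameter ranges below are illustrative assumptions.

def final_state_error(seq_len, seed=0):
    rng = np.random.default_rng(seed)
    dt = rng.uniform(0.01, 0.1, seq_len)
    u = rng.standard_normal(seq_len)
    A, B = -0.5, 0.3

    def run(dtype):
        x = dtype(0.0)
        dtv, uv = dt.astype(dtype), u.astype(dtype)
        Av, Bv = dtype(A), dtype(B)
        for t in range(seq_len):
            x = (np.exp(dtv[t] * Av) + dtv[t] * Bv) * x + dtv[t] * Bv * uv[t]
        return float(x)

    return abs(run(np.float32) - run(np.float64))

err_2k = final_state_error(2_048)
err_16k = final_state_error(16_384)
print(err_2k, err_16k)
```

Comparing these magnitudes against the test tolerances would show whether the seq_len threshold can be explained by precision alone or whether a logic bug is amplifying with sequence length.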

Issues:

  1. I haven't changed $\Delta$, and delta_softplus should automatically handle scaling (selective_scan_bwd_kernel.cuh#L452-L456). Where could the error be?
  2. There seems to be error accumulation as seq_len increases in this case, but not for Mamba. Am I missing some other change?
  3. Changing gridDim & blockDim for the kernel launch alters the error magnitudes (errors increase). Why?
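On issue 1: with delta_softplus enabled, the scan consumes $\Delta = \mathrm{softplus}(\Delta_{\text{raw}} + \Delta_{\text{bias}})$, so the backward pass must multiply $d\Delta$ by $\sigma(\Delta_{\text{raw}} + \Delta_{\text{bias}})$ before producing $d\Delta_{\text{raw}}$ (which is, as far as I can tell, what the linked kernel lines do). A quick numerical check of that chain rule, as a standalone sketch:

```python
import numpy as np

# d/dz softplus(z) = sigmoid(z): the extra factor the backward pass must
# apply to d(delta) when delta_softplus is enabled.

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.3, 0.0, 0.7, 3.0])
eps = 1e-6
numeric = (softplus(z + eps) - softplus(z - eps)) / (2 * eps)
assert np.allclose(numeric, sigmoid(z), atol=1e-6)
```

If this factor really is applied unchanged by the existing code path, and the $d\Delta$ expression itself checks out against finite differences, the seq_len-dependent failures may point at precision or at the modified prefix-scan coefficients rather than at the softplus/bias handling.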

Thank you for your help.
