Hi, a few things are not fully clear to me about Table 1. It says the convolution has LH parameters. How can that be, if the only learnable matrix, A, is of shape LxL? Maybe it is because A is diagonalizable plus low-rank, and we learn only the diagonal and neglect the low-rank part?
In Section 3.1, it says:
Shouldn't the time complexity be O(N^3L)?
In Table 1, why is S4's number of parameters H^2 and not LH? After all, Section 3.4 says the number of parameters is L == N, and we need H dimensions, which makes it LH.
The convolution column of the table is not an SSM convolution, but directly parameterizing the convolution's kernel elements (like a standard convolution). (This is mentioned in the footnote.) See this work for an example of people attempting this in practice: https://hazyresearch.stanford.edu/blog/2023-02-15-long-convs
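For concreteness, here is a minimal sketch of such a directly parameterized long convolution (the shapes and variable names are illustrative assumptions, not taken from the paper): every kernel element is a free parameter, so H channels with a length-L filter each give the LH count in Table 1.

```python
import numpy as np

H, L = 4, 16  # H channels, sequence length L
rng = np.random.default_rng(0)

# Directly learned kernel: one length-L filter per channel -> L*H parameters.
# An SSM instead *generates* its kernel from O(N) state-space parameters.
kernel = rng.standard_normal((H, L))
u = rng.standard_normal((H, L))  # input sequence, one row per channel

# Causal depthwise convolution: each channel convolved with its own filter
y = np.stack([np.convolve(u[h], kernel[h])[:L] for h in range(H)])

assert kernel.size == L * H  # the LH parameter count from Table 1
```

The key contrast with an SSM convolution is that here nothing constrains the filter; the blog post linked above studies what regularization is needed to make such unconstrained long kernels train well.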
It's a matrix-vector multiplication, not matrix-matrix, so each of the $L$ iterations costs $O(N^2)$, giving $O(N^2 L)$ total.
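A quick sketch of the recurrent view makes the cost visible (generic shapes and names, assumed for illustration): the state update applies the N x N matrix A to the length-N state vector, never to another matrix.

```python
import numpy as np

N, L = 8, 32  # state size N, sequence length L
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((N, N))  # state matrix, N x N
B = rng.standard_normal(N)             # input vector
u = rng.standard_normal(L)             # scalar input sequence

x = np.zeros(N)
for k in range(L):            # L iterations...
    x = A @ x + B * u[k]      # ...each an O(N^2) matrix-VECTOR product

# Total cost O(N^2 L): A is never multiplied by another N x N matrix,
# which is where an O(N^3 L) estimate would come from.
```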
I think you have misread something. S4's parameterization does not depend on the sequence length, and I don't see anything in Section 3.4 that implies it does.