Fastfood is slow #116
The bottleneck seems to be the _phi function.
numpy trigonometric functions (cos, sin) are slow (they take the same amount of time as …)
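A minimal micro-benchmark (a sketch, not code from the issue) illustrating the point: numpy ufuncs pay per-call dispatch overhead, so on a single float math.cos is typically much faster than np.cos, even though both return the same value.

```python
import math
import timeit

import numpy as np

x = 0.5

# Time 100k scalar calls of each; the absolute numbers are machine-dependent,
# but np.cos on a Python float is usually several times slower than math.cos.
t_np = timeit.timeit(lambda: np.cos(x), number=100_000)
t_py = timeit.timeit(lambda: math.cos(x), number=100_000)

print(f"np.cos on a scalar:   {t_np:.4f}s")
print(f"math.cos on a scalar: {t_py:.4f}s")
```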
Thanks for the investigation @glevv! There are two issues there: […]
A PR to fix/improve that situation would be very welcome.
Judging by this implementation, it should return an NxN matrix and should be used with SVM(kernel='precomputed'). On the other hand, in the same repo they use the [cos(X), sin(X)] matrix as the feature matrix for the next model (which will obviously have 2*n_components columns). A straightforward solution would be to divide n_components by 2 (or decrease the power of 2 by 1) for the 'accuracy' case.
[Cos, Sin] matrix is a matrix of real and imaginary components of Fourier transform. In such cases, sometimes only the real part of the transform is kept and the imaginary part is dropped, but then there would be no difference between the 'accuracy' and 'mem' cases. We can either warn users that in the 'accuracy' case the feature space will be 2 times larger, or drop the 'accuracy' case for now and just keep the random kitchen sinks method.
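To make the shape difference concrete, here is a sketch (assumed parameter names, not the library code) of the two random-feature maps being discussed: stacking [cos, sin] doubles the output width, while the single-cosine-with-random-phase map (random kitchen sinks, as in RBFSampler) keeps it at n_components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components, gamma = 10, 5, 8, 1.0

X = rng.normal(size=(n_samples, n_features))
# Random projection directions for an RBF kernel approximation.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(n_features, n_components))
XW = X @ W

# 'accuracy'-style map: stack cos and sin -> width is 2 * n_components.
Z_acc = np.hstack([np.cos(XW), np.sin(XW)]) / np.sqrt(n_components)

# 'mem'-style map: one cosine with a random phase -> width stays n_components.
b = rng.uniform(0, 2 * np.pi, size=n_components)
Z_mem = np.sqrt(2.0 / n_components) * np.cos(XW + b)

print(Z_acc.shape, Z_mem.shape)  # → (10, 16) (10, 8)
```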
I don't think "[Cos, Sin] matrix is a matrix of real and imaginary components of Fourier transform." is true. If that were the case we could have used the FFT and obtained a faster implementation, but in fact we don't use a Fourier transform here, or am I wrong?
Well, I don't know this particular implementation, but in the paper "Fastfood: Approximate Kernel Expansions in Loglinear Time" [1] they stated: […]
Yes this is not a Fourier transform. |
Well, I guess the only thing left is to replace numpy math operations with plain Python ones where needed, since Python operations are faster on single numbers.
And maybe change the default tradeoff to 'mem', since it is more consistent and fast.
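A small illustration of the suggested change (a sketch with made-up names, not a patch to the library): for scalar constants computed once, such as a normalization factor, Python's math module avoids numpy's per-call overhead while producing the same value.

```python
import math

import numpy as np

sigma, n_components = 1.0, 256

# numpy on scalars: each np.sqrt call goes through ufunc dispatch.
scale_np = 1.0 / (np.sqrt(sigma) * np.sqrt(n_components))

# Plain Python math: cheaper for single numbers, identical result.
scale_py = 1.0 / (math.sqrt(sigma) * math.sqrt(n_components))

print(scale_np, scale_py)
```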
Fastfood is somewhat slower than RBFSampler (while in theory it should be faster).
Google Colab timings (sklearn 0.24.2, sklearn_extra 0.2.0, numpy 1.19.5)
Laptop timings (Ubuntu 20.04, Intel 8300H, 32GB RAM) (sklearn 0.23.2, sklearn_extra 0.2.0, numpy 1.19.2)
Sample code
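The original snippet was not preserved in this copy of the issue; below is a hedged reconstruction of a timing comparison consistent with the discussion (Fastfood vs RBFSampler on random data). The data sizes and parameter values are assumptions, and the imports are guarded in case the packages are unavailable.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(2000, 512)  # assumed problem size, not from the issue

try:
    from sklearn.kernel_approximation import RBFSampler
    from sklearn_extra.kernel_approximation import Fastfood

    rbf = RBFSampler(n_components=1024, random_state=0).fit(X)
    ff = Fastfood(n_components=1024, random_state=0,
                  tradeoff_mem_accuracy="mem").fit(X)

    t_rbf = timeit.timeit(lambda: rbf.transform(X), number=3)
    t_ff = timeit.timeit(lambda: ff.transform(X), number=3)
    print(f"RBFSampler: {t_rbf:.3f}s  Fastfood: {t_ff:.3f}s")
except ImportError:
    print("scikit-learn / scikit-learn-extra not installed; skipping benchmark")
```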
Changing tradeoff_mem_accuracy to 'mem' did not affect speed.