Add support for grouped convolutions #2485
base: master
Conversation
Yeah that paper is nice. I like these new simple building blocks. I hope they do actually work well beyond simple resnet modules.
I got carried away with other stuff; I will finish this at some point, though 😅
Can this be implemented with the Toeplitz matrix?
I was planning to get inspiration from here.
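For context, a grouped convolution decomposes into `groups` independent convolutions over channel slices, so an existing single-group routine (including the Toeplitz/im2col path) can in principle be reused per group. A rough sketch over raw float buffers, with a hypothetical `conv2d_single_group` helper rather than dlib's actual API:

```cpp
// Illustrative sketch only: a grouped convolution is `groups` independent
// convolutions over channel slices. Toy layout (single sample, C x H x W
// contiguous, stride 1, no padding), not dlib's tensor API.
#include <cstddef>

// Hypothetical single-group conv, defined elsewhere (e.g. the existing
// Toeplitz/im2col path): in [Cg x H x W], filt [Kg x Cg x FH x FW],
// out [Kg x OH x OW].
void conv2d_single_group(const float* in, int Cg, int H, int W,
                         const float* filt, int Kg, int FH, int FW,
                         float* out);

void grouped_conv(const float* in, int C, int H, int W,
                  const float* filt, int K, int FH, int FW,
                  float* out, int groups)
{
    const int Cg = C / groups;            // input channels per group
    const int Kg = K / groups;            // filters (output channels) per group
    const int OH = H - FH + 1, OW = W - FW + 1;
    for (int g = 0; g < groups; ++g)
    {
        // Each group sees only its own slice of channels and filters.
        const float* in_g   = in   + static_cast<size_t>(g) * Cg * H * W;
        const float* filt_g = filt + static_cast<size_t>(g) * Kg * Cg * FH * FW;
        float*       out_g  = out  + static_cast<size_t>(g) * Kg * OH * OW;
        conv2d_single_group(in_g, Cg, H, W, filt_g, Kg, FH, FW, out_g);
    }
}
```

The loop is only meant to show why the CPU side can lean on whatever single-group conv code already exists; a GPU backend like cuDNN does this slicing internally.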
I have a CPU implementation, but I have to go through some company bureaucracy to be allowed to upload it.
Just noticed the Toeplitz matrix isn't cached in the …
There are, but there are other allocations and deallocations too. I doubt that one makes a meaningful difference to runtime speed, considering everything that goes on if someone is using this kind of model.
@davisking OK, I trust you. I had a quick look at the ncnn repo, and they have something like 100 specialisations of the convolution layer for different parameters: kernel sizes, groups, whether or not the layer is 8-bit quantized, different architectures, and so on. It looks like way too much work to do something similar in dlib to get CPU conv performance up to standard. Sorry, this is unrelated to this PR; just a passing observation.
Yeah, that's how you do conv on the CPU fast. The Toeplitz matrix thing is a weird hack. I did it (as did others) because it's just a super easy way to support all the various conv types with their different strides and all that. But fast conv code looks like this kind of thing: https://github.com/davisking/dlib/blob/master/dlib/image_transforms/spatial_filtering.h#L126, or other similar setups. It depends on which kind of conv we are talking about.
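To make the contrast concrete, here is a toy direct convolution in that nested-loop style (single channel, "valid" region only). The dlib code linked above is far more elaborate, but the loop structure is the recognizable part, as opposed to rewriting the convolution as a matrix multiply:

```cpp
// Toy direct 2D convolution over a single channel, "valid" region only,
// just to illustrate the nested-loop shape. Real fast implementations add
// SIMD, cache blocking, and multi-channel handling on top of this.
#include <vector>

void direct_conv2d(const std::vector<float>& img, int H, int W,
                   const std::vector<float>& filt, int FH, int FW,
                   std::vector<float>& out) // (H-FH+1) x (W-FW+1)
{
    const int OH = H - FH + 1, OW = W - FW + 1;
    out.assign(static_cast<size_t>(OH) * OW, 0.f);
    for (int r = 0; r < OH; ++r)
        for (int c = 0; c < OW; ++c)
        {
            float sum = 0.f;
            // Slide the filter over the image and accumulate.
            for (int fr = 0; fr < FH; ++fr)
                for (int fc = 0; fc < FW; ++fc)
                    sum += img[(r + fr) * W + (c + fc)] * filt[fr * FW + fc];
            out[r * OW + c] = sum;
        }
}
```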
Compare 9628aa0 to 1382ab5
In this PR I will try to add support for grouped convolutions in dlib.
I never had any interest in or use for this kind of convolution until yesterday, when I read this paper: A ConvNet for the 2020s.
The paper takes the main ideas behind Transformer networks and applies them to a convolutional network.
It makes use of several recent additions to dlib.
Unfortunately, it also makes use of grouped convolutions, which are not currently supported in dlib.
That was the motivation I needed. So far I've written:
- The GPU part, which is relatively easy, since it's just a matter of using the cuDNN API (see the sketch after this list).
- The CPU part, which might take longer to complete (I don't think I'll ever use it, but I will try to add it for completeness).
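For the GPU side, here is a minimal sketch of the cuDNN (v7+) descriptor setup; the function and parameter names are illustrative, not dlib's internals, and error checking is omitted. The only change relative to a regular convolution is one extra call, plus a filter descriptor whose channel dimension is `in_channels / groups`:

```cpp
// Minimal sketch (not dlib's actual code) of grouped convolution setup in
// cuDNN >= 7. Status return values should be checked in real code.
#include <cudnn.h>

void setup_grouped_conv(cudnnConvolutionDescriptor_t conv_desc,
                        cudnnFilterDescriptor_t filter_desc,
                        int out_channels, int in_channels, int groups,
                        int kernel_h, int kernel_w)
{
    // Ordinary 2D convolution descriptor: pad 1, stride 1, dilation 1.
    cudnnSetConvolution2dDescriptor(conv_desc,
                                    1, 1,   // pad_h, pad_w
                                    1, 1,   // stride_h, stride_w
                                    1, 1,   // dilation_h, dilation_w
                                    CUDNN_CROSS_CORRELATION,
                                    CUDNN_DATA_FLOAT);

    // The one extra call that turns it into a grouped convolution.
    cudnnSetConvolutionGroupCount(conv_desc, groups);

    // Each filter only sees in_channels / groups input channels.
    cudnnSetFilter4dDescriptor(filter_desc,
                               CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               out_channels, in_channels / groups,
                               kernel_h, kernel_w);
}
```

With `groups == in_channels == out_channels` this degenerates to a depthwise convolution, which is what the ConvNeXt blocks in the paper use.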
I've already implemented the ConvNeXt models described in the paper, and the forward pass seems to work.
Let me know if the approach is sensible.