
Add support for grouped convolutions #2485

Draft · wants to merge 8 commits into master from grouped-convolutions

Conversation

@arrufat (Contributor) commented Jan 13, 2022

In this PR I will try to add support for grouped convolutions in dlib.
I never had any interest in or use for this kind of convolution until yesterday, when I read this paper: A ConvNet for the 2020s.
The paper explores the main design elements of Transformer networks and adds them to a convolutional network.
It makes use of several recent additions to dlib.

Unfortunately, it also makes use of grouped convolutions, which are not currently supported in dlib.
That was the motivation I needed. So far I've written:

  • forward GPU
  • backward GPU
  • forward CPU
  • backward CPU

The GPU part is relatively easy, since it's just a matter of using the cuDNN API.
The CPU part might take longer to complete (I don't think I'll ever use it, but I will try to add it for completeness).
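
For reference, cuDNN exposes grouped convolutions as a single extra call on the convolution descriptor, which is why the GPU path is the easy part. A minimal sketch of that descriptor setup (not the code in this PR; `num_groups` and the pad/stride values are placeholders):

```cpp
#include <cudnn.h>

// Sketch: configure a grouped convolution descriptor in cuDNN.
// The pad/stride/dilation values here are placeholders.
cudnnConvolutionDescriptor_t make_grouped_conv_desc(int num_groups)
{
    cudnnConvolutionDescriptor_t conv_desc;
    cudnnCreateConvolutionDescriptor(&conv_desc);
    cudnnSetConvolution2dDescriptor(conv_desc,
        /*pad_h=*/1, /*pad_w=*/1,
        /*stride_h=*/1, /*stride_w=*/1,
        /*dilation_h=*/1, /*dilation_w=*/1,
        CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    // The one extra call that turns a regular convolution into a grouped one:
    cudnnSetConvolutionGroupCount(conv_desc, num_groups);

    // The filter descriptor used with this must declare c_in/num_groups
    // input channels, e.g.:
    //   cudnnSetFilter4dDescriptor(filt_desc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
    //                              c_out, c_in / num_groups, k_h, k_w);
    return conv_desc;
}
```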

I've already implemented the ConvNeXt models described in the paper, and the forward pass seems to work.
Let me know if the approach is sensible.

@pfeatherstone (Contributor)

Yeah, that paper is nice. I like these new simple building blocks. I hope they actually work well beyond simple ResNet modules.

@arrufat (Contributor, Author) commented Feb 21, 2022

I got carried away with other stuff, but I will finish this at some point 😅

@pfeatherstone (Contributor)

Can this be implemented with the Toeplitz matrix?

@arrufat (Contributor, Author) commented Mar 16, 2022

> Can this be implemented with the Toeplitz matrix?

I was planning to get inspiration from here.
But feel free to step in; I don't think I will find time to do this soon, sadly enough :(
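
For whoever picks this up: semantically, a grouped convolution just splits the input channels into `groups` contiguous slices, convolves each slice with its own subset of the filters, and stacks the results (setting groups equal to the number of input channels gives the depthwise convolution used in ConvNeXt). A naive single-image reference sketch, assuming NCHW layout, stride 1, no padding, and the usual cross-correlation convention (hypothetical names, no dlib types):

```cpp
#include <vector>

// Naive grouped convolution reference.
// in:      [c_in, h, w]
// filters: [c_out, c_in/groups, k, k]
// out:     [c_out, h-k+1, w-k+1] (must be pre-sized by the caller)
void grouped_conv_reference(
    const std::vector<float>& in, const std::vector<float>& filters,
    std::vector<float>& out,
    int c_in, int c_out, int h, int w, int k, int groups)
{
    const int ci_g = c_in / groups;   // input channels per group
    const int co_g = c_out / groups;  // output channels per group
    const int oh = h - k + 1, ow = w - k + 1;
    for (int g = 0; g < groups; ++g)
    for (int oc = 0; oc < co_g; ++oc)          // filter within this group
    for (int y = 0; y < oh; ++y)
    for (int x = 0; x < ow; ++x)
    {
        float sum = 0;
        // each output channel only sees its own group's input channels
        for (int ic = 0; ic < ci_g; ++ic)
        for (int dy = 0; dy < k; ++dy)
        for (int dx = 0; dx < k; ++dx)
            sum += in[((g*ci_g + ic)*h + y + dy)*w + x + dx]
                 * filters[(((g*co_g + oc)*ci_g + ic)*k + dy)*k + dx];
        out[((g*co_g + oc)*oh + y)*ow + x] = sum;
    }
}
```

With groups == 1 this reduces to an ordinary convolution, which makes it easy to test against the existing implementation.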

@rTreutlein

I have a CPU implementation, but I have to go through some company bureaucracy to be allowed to upload it.

@pfeatherstone (Contributor)

Just noticed the Toeplitz matrix isn't cached in the tensor_conv class. So there are a lot of allocations and deallocations. Right?

@davisking (Owner)

> Just noticed the Toeplitz matrix isn't cached in the tensor_conv class. So there are a lot of allocations and deallocations. Right?

There are, but there are other allocations and deallocations too. I doubt that one makes a meaningful difference to runtime speed, considering everything else that goes on when someone is using this kind of model.

@pfeatherstone (Contributor)

@davisking OK, I trust you. I had a quick look at the ncnn repo, and they have something like 100 specialisations of the convolution layer for different parameters: kernel sizes, groups, whether or not the layer is 8-bit quantized, different architectures, and so on. It looks like way too much work to do something similar in dlib to get CPU conv performance up to standard. Sorry, this is unrelated to this PR; just a passing observation.

@davisking (Owner)

Yeah, that's how you do conv fast on the CPU. The Toeplitz matrix thing is a weird hack. I did it (as did others) because it's just a super easy way to support all the various conv types with their different strides and so on. But fast conv code looks like this kind of stuff: https://github.com/davisking/dlib/blob/master/dlib/image_transforms/spatial_filtering.h#L126. Or other similar kinds of setups. It depends on which kind of conv we are talking about.
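
To make the contrast concrete, here is a rough illustration (not dlib's actual code) of the kind of direct loop that file contains: with the kernel size fixed at compile time, the compiler can fully unroll and auto-vectorize the inner loops, while the im2col+GEMM route first materializes a large matrix and only then multiplies it:

```cpp
// Illustrative direct 3x3 "valid" convolution over a single channel.
// The compile-time kernel bounds are what make this fast; a generic
// kernel size would need one specialisation per size, as ncnn does.
void conv3x3_valid(const float* in, float* out, const float (&k)[3][3],
                   int h, int w)
{
    for (int y = 0; y < h - 2; ++y)
        for (int x = 0; x < w - 2; ++x)
        {
            float sum = 0;
            for (int dy = 0; dy < 3; ++dy)  // bounds known at compile time
                for (int dx = 0; dx < 3; ++dx)
                    sum += in[(y + dy)*w + (x + dx)] * k[dy][dx];
            out[y*(w - 2) + x] = sum;
        }
}
```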

@arrufat force-pushed the grouped-convolutions branch 2 times, most recently from 9628aa0 to 1382ab5 on May 1, 2022 at 06:55