Merge pull request #262 from sezan92/edit-image-convnext
edit convnext mdx for image to be directly visible
JvThunder authored Apr 26, 2024
2 parents 4437a47 + e38707b commit dac3a35
Showing 1 changed file with 6 additions and 3 deletions.
chapters/en/unit2/cnns/convnext.mdx (9 changes: 6 additions & 3 deletions)
@@ -18,7 +18,8 @@ The key improvements are:
We will go through each of the key improvements.
These designs are not novel in themselves. However, you can learn how researchers systematically adapt and modify existing designs to improve a model.
To show the effectiveness of each improvement, we will compare the model's accuracy before and after the modification on ImageNet-1K.
-[Block Comparison](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/block_comparison.png)
+
+![Block Comparison](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/block_comparison.png)


## Training Techniques
@@ -64,7 +65,8 @@ One common idea in every Transformer block is the usage of an inverted bottleneck
This idea has also been used and popularized in Computer Vision by MobileNetV2.
ConvNext adopts this idea: its blocks expand a 96-channel input to 384 hidden channels before projecting back down.
This change improves the model's accuracy from 80.5% to 80.6%.
-[Inverted Bottleneck Comparison](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/inverted_bottleneck.png)
+
+![Inverted Bottleneck Comparison](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/inverted_bottleneck.png)
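
To make the pattern concrete, here is a minimal PyTorch sketch of such an inverted-bottleneck block, using the 96-channel input and 384-channel hidden layer quoted above; the class and layer names are illustrative assumptions, not the reference ConvNext implementation.

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Illustrative narrow -> wide -> narrow block: 96 -> 384 -> 96 channels."""

    def __init__(self, dim: int = 96, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion  # 96 * 4 = 384 hidden channels
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)  # 1x1 expansion
        self.act = nn.GELU()
        self.reduce = nn.Conv2d(hidden, dim, kernel_size=1)  # 1x1 projection back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around expand -> activate -> reduce.
        return x + self.reduce(self.act(self.expand(x)))

x = torch.randn(1, 96, 56, 56)
print(InvertedBottleneck()(x).shape)  # torch.Size([1, 96, 56, 56])
```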


## Large Kernel Sizes
@@ -74,7 +76,8 @@ However, before adjusting the kernel size, it is necessary to reposition the depthwise convolution layer
This repositioning lets the 1x1 layers handle the bulk of the computation efficiently, while the depthwise convolution layer acts as a more non-local receptor.
With this, the network can harness the advantages of larger convolution kernels.
Implementing a 7x7 kernel size maintains the accuracy at 80.6% while reducing the overall FLOPs of the model.
-[Moving up the Depth Conv Layer](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/depthwise_moveup.png)
+
+![Moving up the Depth Conv Layer](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/depthwise_moveup.png)
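
As a rough sketch of this reordering (again with assumed names, and omitting the normalization and other details of the real block), the depthwise 7x7 convolution comes first and the 1x1 layers that form the inverted bottleneck follow it:

```python
import torch
import torch.nn as nn

class ReorderedBlock(nn.Module):
    """Sketch: depthwise 7x7 conv moved ahead of the 1x1 bottleneck layers."""

    def __init__(self, dim: int = 96, expansion: int = 4, kernel_size: int = 7):
        super().__init__()
        # Depthwise conv first: one 7x7 filter per channel, a wide spatial receptor.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        # The cheap 1x1 (pointwise) layers then do the heavy channel mixing.
        self.expand = nn.Conv2d(dim, dim * expansion, kernel_size=1)
        self.act = nn.GELU()
        self.reduce = nn.Conv2d(dim * expansion, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.reduce(self.act(self.expand(self.dwconv(x))))

x = torch.randn(1, 96, 56, 56)
print(ReorderedBlock()(x).shape)  # torch.Size([1, 96, 56, 56])
```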


## Micro Design
