Updated suggestions and broken link
sergiopaniego committed Apr 26, 2024
1 parent e84231b commit 5a071d4
Showing 10 changed files with 53 additions and 53 deletions.
10 changes: 5 additions & 5 deletions chapters/en/unit1/image_and_imaging/examples-preprocess.mdx
@@ -5,11 +5,11 @@ Now that we have seen what are images, how they are acquired, and their impact,
## Operations in Digital Image Processing
In digital image processing, operations on images are diverse and can be categorized into:

- Logical.
- Statistical.
- Geometrical.
- Mathematical.
- Transform operations.
- Logical
- Statistical
- Geometrical
- Mathematical
- Transform operations

Each category encompasses different techniques, such as morphological operations under logical operations or Fourier transforms and principal component analysis (PCA) under transform operations. In this context, we refer to morphology as the group of operations that use structuring elements to generate images of the same size by looking at the values in each pixel's neighborhood. Understanding the distinction between element-wise and matrix operations is important in image manipulation. Element-wise operations, such as raising an image to a power or dividing it by another image, involve processing each pixel individually. This pixel-based approach contrasts with matrix operations, which utilize matrix theory for image manipulation. Having said that, you can do whatever you want with images, as they are matrices containing numbers!
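
To make that last distinction concrete, here is a tiny NumPy sketch (the pixel values are made up):

```python
import numpy as np

# Two tiny 2x2 "images" with made-up pixel values.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

elementwise_product = a * b   # each pixel multiplied by the corresponding pixel
squared = a ** 2              # element-wise: every pixel raised to a power
matrix_product = a @ b        # matrix multiplication (rows times columns)

print(elementwise_product)
print(squared)
print(matrix_product)
```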

18 changes: 9 additions & 9 deletions chapters/en/unit10/blenderProc.mdx
@@ -122,9 +122,9 @@ Here are some images rendered with the basic example:

## Blender Resources

- [User Manual](https://docs.blender.org/manual/en/latest/0).
- [Awesome-blender -- Extensive list of resources](https://awesome-blender.netlify.app).
- [Blender Youtube Channel](https://www.youtube.com/@BlenderOfficial).
- [User Manual](https://docs.blender.org/manual/en/latest/0)
- [Awesome-blender -- Extensive list of resources](https://awesome-blender.netlify.app)
- [Blender Youtube Channel](https://www.youtube.com/@BlenderOfficial)

### The following video explains how to render a 3D synthetic dataset in Blender:

@@ -136,15 +136,15 @@ Here are some images rendered with the basic example:

## Papers / Blogs

- [Developing digital twins of multi-camera metrology systems in Blender](https://iopscience.iop.org/article/10.1088/1361-6501/acc59e/pdf_).
- [Generate Depth and Normal Maps with Blender](https://www.saifkhichi.com/blog/blender-depth-map-surface-normals).
- [Object detection with synthetic training data](https://medium.com/rowden/object-detection-with-synthetic-training-data-f6735a5a34bc).
- [Developing digital twins of multi-camera metrology systems in Blender](https://iopscience.iop.org/article/10.1088/1361-6501/acc59e/pdf_)
- [Generate Depth and Normal Maps with Blender](https://www.saifkhichi.com/blog/blender-depth-map-surface-normals)
- [Object detection with synthetic training data](https://medium.com/rowden/object-detection-with-synthetic-training-data-f6735a5a34bc)

## BlenderProc Resources

- [BlenderProc Github Repo](https://github.com/DLR-RM/BlenderProc).
- [BlenderProc: Reducing the Reality Gap with Photorealistic Rendering](https://elib.dlr.de/139317/1/denninger.pdf).
- [Documentation](https://dlr-rm.github.io/BlenderProc/).
- [BlenderProc Github Repo](https://github.com/DLR-RM/BlenderProc)
- [BlenderProc: Reducing the Reality Gap with Photorealistic Rendering](https://elib.dlr.de/139317/1/denninger.pdf)
- [Documentation](https://dlr-rm.github.io/BlenderProc/)

### The following video provides an overview of the BlenderProc pipeline:

2 changes: 1 addition & 1 deletion chapters/en/unit10/point_clouds.mdx
@@ -45,7 +45,7 @@ Now, first we need to understand the formats in which these point clouds are sto
**Why?**

- `point-cloud-utils` supports reading common mesh formats (PLY, STL, OFF, OBJ, 3DS, VRML 2.0, X3D, COLLADA).
- If it can be imported into [MeshLab](https://github.com/cnr-isti-vclab/meshlab), we can read it! (from their readme).
- If it can be imported into [MeshLab](https://github.com/cnr-isti-vclab/meshlab), we can read it! (from their readme)
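
As a minimal loading sketch (the file name is a placeholder, and `load_mesh_vf` is the helper described in the library's README):

```python
import point_cloud_utils as pcu

# The file format is inferred from the extension (e.g., .ply, .obj, .off, ...).
v, f = pcu.load_mesh_vf("my_model.ply")

print(v.shape)  # (num_vertices, 3): x, y, z coordinates
print(f.shape)  # (num_faces, 3): vertex indices of each triangle
```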

The type of file is inferred from its file extension. Some of the extensions supported are:

16 changes: 8 additions & 8 deletions chapters/en/unit10/synthetic-lung-images.mdx
@@ -12,21 +12,21 @@ The generator has the following model architecture:

- The input is a vector of 100 random numbers and the output is an image of size 128x128x3.
- The model has 4 convolutional layers:
- Conv2D layer.
- Batch Normalization layer.
- ReLU activation.
- Conv2D layer
- Batch Normalization layer
- ReLU activation
- Conv2D layer with Tanh activation.

The discriminator has the following model architecture:

- The input is an image and the output is a probability indicating whether the image is fake or real.
- The model has one convolutional layer:
- Conv2D layer.
- Leaky ReLU activation.
- Conv2D layer
- Leaky ReLU activation
- Three convolutional layers with:
- Conv2D layer.
- Batch Normalization layer.
- Leaky ReLU activation.
- Conv2D layer
- Batch Normalization layer
- Leaky ReLU activation
- Conv2D layer with Sigmoid.
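
Putting the generator description into code, here is a hedged PyTorch sketch of a DCGAN-style generator; the number of upsampling blocks and the kernel sizes below are illustrative choices that map a 100-dimensional noise vector to a 128x128x3 image, not the exact values of the model above:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, channels=3):
        super().__init__()

        def up_block(c_in, c_out, stride=2, padding=1):
            # Transposed convolution + BatchNorm + ReLU, as in the blocks above.
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=stride, padding=padding, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )

        self.net = nn.Sequential(
            up_block(z_dim, 512, stride=1, padding=0),  # 1x1   -> 4x4
            up_block(512, 256),                         # 4x4   -> 8x8
            up_block(256, 128),                         # 8x8   -> 16x16
            up_block(128, 64),                          # 16x16 -> 32x32
            up_block(64, 32),                           # 32x32 -> 64x64
            nn.ConvTranspose2d(32, channels, 4, 2, 1),  # 64x64 -> 128x128
            nn.Tanh(),                                  # final Tanh activation
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

fake = Generator()(torch.randn(2, 100))
print(fake.shape)  # torch.Size([2, 3, 128, 128])
```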

**Data Collection**
8 changes: 4 additions & 4 deletions chapters/en/unit2/cnns/googlenet.mdx
@@ -21,7 +21,7 @@ The Inception Module insists on applying convolution filters of different kernel

![inception_naive](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/inception_naive.png)

Figure 1: Naive Inception Module.
Figure 1: Naive Inception Module

As we can see, applying multiple convolutions at multiple scales with bigger kernel sizes, like 5x5, can increase the number of parameters drastically. This problem becomes more pronounced as the input feature size (channel size) increases. So as we go deeper into the network stacking these "Inception Modules", the computation increases drastically. The simple solution is to reduce the number of features wherever the computational requirements seem to increase. The major pain points of high computation are the convolution layers. The feature dimension is reduced by a computationally inexpensive $1 \times 1$ convolution just before the 3x3 and 5x5 convolutions. Let's see it with an example.

@@ -31,7 +31,7 @@ We would also want to reduce the output features of max pooling before concatena

![inception_reduced](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/inception_reduced.png)

Figure 2: Inception Module.
Figure 2: Inception Module

Also, because of the parallel operations of convolutions at multiple scales, we are ensuring more operations without going deeper into the network, essentially mitigating the vanishing gradient problem.
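
As a standalone sketch of such a module with 1x1 reductions (the channel counts below are illustrative; the full GoogLeNet implementation appears in the Code section later in this chapter):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    # Four parallel branches whose outputs are concatenated along the channel axis.
    def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, kernel_size=1)
        self.b2 = nn.Sequential(                     # 1x1 reduction before the 3x3 conv
            nn.Conv2d(c_in, c3_red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(                     # 1x1 reduction before the 5x5 conv
            nn.Conv2d(c_in, c5_red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(                     # max pool followed by a 1x1 projection
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
out = InceptionBlock(192, 64, 96, 128, 16, 32, 32)(x)
print(out.shape)  # torch.Size([1, 256, 28, 28])
```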

@@ -68,14 +68,14 @@ These auxiliary classifiers are removed at inference time. However, minimal gain

![googlenet_aux_clf](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/googlenet_auxiliary_classifier.jpg)

Figure 3: An Auxiliary Classifier.
Figure 3: An Auxiliary Classifier
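
During training, the auxiliary outputs are typically added to the main loss with a small weight (0.3 in the original paper) and simply ignored at inference; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def googlenet_training_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    # Main classifier loss plus discounted auxiliary losses (weight 0.3, as in the paper).
    loss_main = F.cross_entropy(main_logits, targets)
    loss_aux1 = F.cross_entropy(aux1_logits, targets)
    loss_aux2 = F.cross_entropy(aux2_logits, targets)
    return loss_main + aux_weight * (loss_aux1 + loss_aux2)
```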

### Architecture - GoogLeNet

The complete architecture of GoogLeNet is shown in the figure below. All convolutions, including those inside the inception blocks, use ReLU activation. The network starts with two convolution and max-pooling blocks. This is followed by a block of two inception modules (3a and 3b) and a max pooling, then a block of five inception modules (4a, 4b, 4c, 4d, 4e) followed by another max pooling. The auxiliary classifiers branch off the outputs of 4a and 4d. Two more inception modules follow (5a and 5b). After this, an average pooling and a fully connected layer of 128 units are used.

![googlenet_arch](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/googlenet_architecture.png)
Figure 4: Complete GoogLeNet Architecture.
Figure 4: Complete GoogLeNet Architecture
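
If you only want to run the architecture rather than build it, a pretrained GoogLeNet also ships with torchvision; a quick usage sketch (the weights enum below follows recent torchvision releases and is an assumption here):

```python
import torch
from torchvision import models

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.eval()  # in eval mode only the main classifier output is returned

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```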

### Code

10 changes: 5 additions & 5 deletions chapters/en/unit3/vision-transformers/cvt.mdx
@@ -50,7 +50,7 @@ The four main highlights of CvT that helped achieve superior performance and com

Time to get hands-on! Let's explore how to code each major block of the CvT architecture in PyTorch, as shown in the official implementation [[8]](#cvt-imp).

1. Importing required libraries.
1. Importing required libraries

```python
from collections import OrderedDict
@@ -61,7 +61,7 @@ from einops import rearrange
from einops.layers.torch import Rearrange
```

2. Implementation of **Convolutional Projection**.
2. Implementation of **Convolutional Projection**

```python
def _build_projection(self, dim_in, dim_out, kernel_size, padding, stride, method):
@@ -121,7 +121,7 @@ The method takes several parameters related to a convolutional layer (such as in

The rearrangement of dimensions is performed using the `Rearrange` operation, which reshapes the input tensor. The resulting projection block is then returned.
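
For intuition, here is a tiny self-contained example (the shapes are made up) of how `Rearrange` flattens a feature map into a token sequence:

```python
import torch
from einops.layers.torch import Rearrange

# Flatten the spatial grid into a token sequence: (B, C, H, W) -> (B, H*W, C).
to_tokens = Rearrange("b c h w -> b (h w) c")

x = torch.randn(2, 64, 14, 14)   # a batch of 2 feature maps
tokens = to_tokens(x)
print(tokens.shape)              # torch.Size([2, 196, 64])
```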

3. Implementation of **Convolutional Token Embedding**.
3. Implementation of **Convolutional Token Embedding**

```python
class ConvEmbed(nn.Module):
@@ -161,7 +161,7 @@ This code defines a ConvEmbed module that performs patch-wise embedding on an in

In summary, this module is designed for patch-wise embedding of images, where each patch is processed independently through a convolutional layer, and optional normalization is applied to the embedded features.
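
As a rough standalone illustration of the same idea (not the exact signature of `ConvEmbed` above), patch-wise embedding boils down to a strided convolution followed by flattening and layer normalization:

```python
import torch
import torch.nn as nn
from einops import rearrange

class SimplePatchEmbed(nn.Module):
    # Illustrative only: a 7x7 stride-4 convolution embeds overlapping patches,
    # then the spatial grid is flattened into tokens and layer-normalized.
    def __init__(self, in_chans=3, embed_dim=64, kernel_size=7, stride=4, padding=2):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size, stride=stride, padding=padding)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                          # (B, C, H', W')
        B, C, H, W = x.shape
        x = rearrange(x, "b c h w -> b (h w) c")  # (B, H'*W', C)
        x = self.norm(x)
        return x, (H, W)

tokens, (h, w) = SimplePatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape, h, w)  # torch.Size([1, 3136, 64]) 56 56
```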

4. Implementation of **Vision Transformer** Block.
4. Implementation of **Vision Transformer** Block

```python
class VisionTransformer(nn.Module):
@@ -277,7 +277,7 @@ This code defines a Vision Transformer module. Here's a brief overview of the co

- **Forward Method:** The forward method processes the input through the patch embedding, rearranges the dimensions, adds the classification token if present, applies dropout, and then passes the data through the stack of transformer blocks. Finally, the output is rearranged back to the original shape, and the classification token (if present) is separated from the rest of the sequence before returning the output.
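
The classification-token step in particular can be sketched as follows (all shapes here are illustrative):

```python
import torch
import torch.nn as nn

B, N, D = 2, 196, 64                              # batch, number of patch tokens, embedding dim
patch_tokens = torch.randn(B, N, D)

cls_token = nn.Parameter(torch.zeros(1, 1, D))    # learnable [CLS] embedding
cls_tokens = cls_token.expand(B, -1, -1)          # one copy per sample in the batch
x = torch.cat((cls_tokens, patch_tokens), dim=1)  # (B, 1 + N, D)
print(x.shape)                                    # torch.Size([2, 197, 64])
```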

5. Implementation of Convolutional Vision Transformer Block (**Hierarchy of Transformers**).
5. Implementation of Convolutional Vision Transformer Block (**Hierarchy of Transformers**)

```python
class ConvolutionalVisionTransformer(nn.Module):
@@ -25,7 +25,7 @@ These models typically receive images (static or frames from videos) as their in

There are a lot of applications of object detection. One of the most significant examples is in the field of autonomous driving, where object detection is used to detect different objects (like pedestrians, road signs, traffic lights, etc.) around the car, which then become inputs for decision-making.

To deepen your understanding of the ins-and-outs of object detection, check out our [dedicated chapter](/chapters/en/Unit%206%20-%20Basic%20CV%20Tasks/object_detection.mdx) on Object Detection 🤗.
To deepen your understanding of the ins-and-outs of object detection, check out our [dedicated chapter](/learn/computer-vision-course/unit6/basic-cv-tasks/object_detection) on Object Detection 🤗.

### The Need to Fine-tune Models in Object Detection 🤔

12 changes: 6 additions & 6 deletions chapters/en/unit5/generative-models/gans-vaes/stylegan.mdx
@@ -2,10 +2,10 @@

What you will learn in this chapter:

- What is missing in Vanilla GAN.
- What is missing in Vanilla GAN
- StyleGAN1 components and benefits
- Drawback of StyleGAN1 and the need for StyleGAN2.
- Drawback of StyleGAN2 and the need for StyleGAN3.
- Drawback of StyleGAN1 and the need for StyleGAN2
- Drawback of StyleGAN2 and the need for StyleGAN3
- Use cases of StyleGAN

## What is missing in Vanilla GAN
@@ -23,9 +23,9 @@ Let us just dive into the special components introduced in StyleGAN that give St

As I already said, StyleGAN only modifies the generator; the discriminator remains the same, hence it is not mentioned above. Diagram (a) corresponds to the structure of ProgressiveGAN. ProgressiveGAN is just a Vanilla GAN, but instead of generating images at a fixed resolution, it progressively generates images of higher resolution with the aim of producing realistic high-resolution images, i.e., block 1 of the generator generates an image of resolution 4 by 4, block 2 generates an image of resolution 8 by 8, and so on.
Diagram (b) is the proposed StyleGAN architecture. It has the following main components:
1. A mapping network.
2. AdaIN (Adaptive Instance Normalisation).
3. Concatenation of Noise vector.
1. A mapping network
2. AdaIN (Adaptive Instance Normalisation)
3. Concatenation of Noise vector

Let's break it down one by one.
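
As a quick preview of component 2, here is a minimal sketch of AdaIN: each feature map is normalized with its own instance statistics and then rescaled and shifted with a style-dependent scale and bias (in StyleGAN these come from a learned affine transform of the mapping network's output); the shapes below are illustrative:

```python
import torch

def adain(x, y_scale, y_bias, eps=1e-5):
    # x: (B, C, H, W) feature maps; y_scale, y_bias: (B, C) style parameters
    # produced from the mapping network's w vector by a learned affine layer.
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mean) / std
    return y_scale[:, :, None, None] * x_norm + y_bias[:, :, None, None]

x = torch.randn(1, 8, 4, 4)
out = adain(x, torch.ones(1, 8), torch.zeros(1, 8))
print(out.shape)  # torch.Size([1, 8, 4, 4])
```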

14 changes: 7 additions & 7 deletions chapters/en/unit8/3d_measurements_stereo_vision.mdx
@@ -114,27 +114,27 @@ Let's focus on a single point - the top left corner of the laptop. As per equati
### Rectified Left and Right Images
We can perform image rectification/post-processing to correct for differences in intrinsic parameters and orientations of the left and right cameras. This process involves performing 3x3 matrix transformations. In the OAK-D Lite API, a stereo node performs these calculations and outputs the rectified left and right images. Details and source code can be viewed [here](https://github.com/luxonis/depthai-experiments/blob/master/gen2-stereo-on-host/main.py). In this specific implementation, correction for intrinsic parameters is performed using intrinsic camera matrices, and correction for orientation is performed using rotation matrices (part of the calibration parameters) for the left and right cameras. The rectified left image is transformed as if the left camera had the same intrinsic parameters as the right one. Therefore, in all our following calculations, we'll use the intrinsic parameters for the right camera, i.e., a focal length of 452.9 and a principal point at (298.85, 245.52). In the rectified and stacked images below, notice that the red line at constant v touches the top-left corner of the laptop in both the left and right images.

Rectified Left Image.
Rectified Left Image
![Rectified Left Image](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_left_frame.jpg?download=true)

Rectified Right Image.
Rectified Right Image
![Rectified Right Image](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_right_frame.jpg?download=true)

Rectified and Stacked Left and Right Images.
Rectified and Stacked Left and Right Images
![Rectified and Stacked Left and Right Images](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_stacked_frames.jpg?download=true)

Let's also overlap the rectified left and right images to see the difference. We can see that the v values for different points remain mostly constant in the left and right images. However, the u values change, and this difference in the u values helps us find the depth information for different points in the scene, as shown in Equation 6 above. This difference in 'u' values \\(u\_left - u\_right\\) is called disparity, and we can notice that the disparity for points near the camera is greater compared to points further away. Depth z and disparity \\(u\_left - u\_right\\) are inversely proportional, as shown in equation 6.

Rectified and Overlapped Left and Right Images.
Rectified and Overlapped Left and Right Images
![Rectified and Overlapped Left and Right Images](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_overlapping_frames.jpg?download=true)
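
As a small numerical sketch of this relationship and of the back-projection to 3D (the baseline value and the pixel coordinates below are illustrative assumptions, not values from the text):

```python
import numpy as np

# Intrinsics of the rectified right camera, as given above.
f = 452.9                  # focal length in pixels
cx, cy = 298.85, 245.52    # principal point
baseline_cm = 7.5          # assumed stereo baseline; substitute your device's value

def point_3d(u_left, u_right, v):
    # Depth from disparity (inversely proportional, as in equation 6),
    # then back-projection to (x, y, z) in the right camera's frame.
    disparity = u_left - u_right
    z = f * baseline_cm / disparity
    x = (u_right - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])

# Illustrative pixel coordinates for one corresponding point pair.
print(point_3d(u_left=350.0, u_right=300.0, v=200.0))
```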

### Annotated Left and Right Rectified Images
Let's find the 3D coordinates for some points in the scene. A few points are selected and manually annotated with their (u,v) values, as shown in the figures below. Instead of manual annotations, we can also use template-based matching, feature detection algorithms like SIFT, etc., to find corresponding points in the left and right images.

Annotated Left Image.
Annotated Left Image
![Annotated Left Image](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/annotated_left_img.jpg?download=true)

Annotated Right Image.
Annotated Right Image
![Annotated Right Image](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/annotated_right_img.jpg?download=true)

### 3D Coordinate Calculations
@@ -168,7 +168,7 @@ We can also compute 3D distances between different points using their (x,y,z) va
| d5(9-10) | 16.9 | 16.7 | 1.2 |
| d6(9-11) | 23.8 | 24 | 0.83 |

Calculated Dimension Results.
Calculated Dimension Results
![Calculated Dimension Results](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/calculated_dim_results.png?download=true)

## Conclusion
14 changes: 7 additions & 7 deletions chapters/en/unit9/tools_and_frameworks.mdx
@@ -100,10 +100,10 @@ For a hands-on guide on how to use the TensorRT, refer this [notebook](https://c

The OpenVINO™ toolkit enables users to optimize a deep learning model from almost any framework and deploy it with best-in-class performance on a range of Intel® processors and other hardware platforms.
The benefits of using OpenVINO include:
- link directly with OpenVINO Runtime to run inference locally or use OpenVINO Model Server to serve model inference from a separate server or within Kubernetes environment.
- Write an application once, deploy it anywhere on your preferred device, language and OS.
- has minimal external dependencies.
- Reduces first-inference latency by using the CPU for initial inference and then switching to another device once the model has been compiled and loaded to memory.
- link directly with OpenVINO Runtime to run inference locally or use OpenVINO Model Server to serve model inference from a separate server or within Kubernetes environment
- Write an application once, deploy it anywhere on your preferred device, language and OS
- has minimal external dependencies
- Reduces first-inference latency by using the CPU for initial inference and then switching to another device once the model has been compiled and loaded to memory
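
As a quick illustration of the first point, a minimal inference sketch with the OpenVINO Runtime Python API (the model path and input shape are placeholders):

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")               # placeholder path to an IR model
compiled_model = core.compile_model(model, "CPU")  # or "GPU", "AUTO", ...

input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled_model([input_data])              # run a single inference request
output = result[compiled_model.output(0)]
print(output.shape)
```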

### Setup guide

@@ -154,9 +154,9 @@ For a hands-on guide on how to use Optimum for quantization, refer this [noteboo

Edge TPU is Google’s purpose-built ASIC designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge.
The benefits of using EdgeTPU include:
- Complements Cloud TPU and Google Cloud services to provide an end-to-end, cloud-to-edge, hardware + software infrastructure for AI-based solutions deployment.
- High performance in a small physical and power footprint.
- Combined custom hardware, open software, and state-of-the-art AI algorithms to provide high-quality, easy to deploy AI solutions for the edge.
- Complements Cloud TPU and Google Cloud services to provide an end-to-end, cloud-to-edge, hardware + software infrastructure for AI-based solutions deployment
- High performance in a small physical and power footprint
- Combined custom hardware, open software, and state-of-the-art AI algorithms to provide high-quality, easy to deploy AI solutions for the edge

For more details on EdgeTPU, see [here](https://cloud.google.com/edge-tpu)

