Merge remote-tracking branch 'remote_repo/main' into main
sergiopaniego committed May 15, 2024
2 parents 63c5921 + f2fa020 commit 9f775d5
Showing 20 changed files with 217 additions and 124 deletions.
2 changes: 2 additions & 0 deletions chapters/en/_toctree.yml
@@ -42,6 +42,8 @@
local: "unit2/cnns/intro-transfer-learning"
- title: Lets Dive Further with MobileNet
local: "unit2/cnns/mobilenetextra"
- title: Resnet
local: "unit2/cnns/resnet"
- title: Unit 3 - Vision Transformers
sections:
- title: Vision Transformers for Image Classification
2 changes: 1 addition & 1 deletion chapters/en/unit1/chapter1/applications.mdx
@@ -38,7 +38,7 @@ Medical image analysis involves the application of computer vision and machine l

- **Diagnostic Assistance**: Computer vision aids in diagnosing diseases and conditions by analyzing medical images. For instance, in radiology, algorithms can detect abnormalities such as tumors and fractures in X-rays or MRIs. These systems assist healthcare professionals by highlighting areas of concern or providing quantitative data that helps decision-making.

- **Segmentation and Detection:**: Medical image analysis involves segmenting and detecting specific structures or anomalies within the images. This process helps isolate organs, tissues, or pathologies for closer examination. For example, in cancer detection, computer vision algorithms can segment and analyze tumors from MRI or CT scans, assisting in treatment planning and monitoring.
- **Segmentation and Detection**: Medical image analysis involves segmenting and detecting specific structures or anomalies within the images. This process helps isolate organs, tissues, or pathologies for closer examination. For example, in cancer detection, computer vision algorithms can segment and analyze tumors from MRI or CT scans, assisting in treatment planning and monitoring.

- **Treatment Planning and Monitoring**: Computer vision contributes to treatment planning by providing precise measurements, tracking changes over time, and assisting in surgical planning. It helps doctors understand the extent and progression of a disease, enabling them to plan and adjust treatment strategies accordingly. Doctors were already capable of doing most of these tasks, but they needed to do them by hand. CV systems can do them automatically, which frees up doctors to do other tasks.

4 changes: 2 additions & 2 deletions chapters/en/unit1/chapter1/definition.mdx
@@ -14,7 +14,7 @@ Computer vision is the science and technology of making machines see. It involve

The evolution of computer vision has been marked by a series of incremental advancements in and across its interdisciplinary fields, where each step forward gave rise to breakthrough algorithms, hardware, and data, giving it more power and flexibility. One such leap was the jump to the widespread use of deep learning methods.

Initially, to extract and learn information in an image, you extract features through image-preprocessing techniques (chapter 3). Once you have a group of features describing your image, you use a classical machine learning algorithm on your dataset of features. It is a strategy that already simplifies things from the hard-coded rules, but it still relies on domain knowledge and exhaustive feature engineering. A more state-of-the-art approach arises when deep learning methods and large datasets meet. Deep learning (DL) allows machines to automatically learn complex features from the raw data. This paradigm shift allowed us to build more adaptive and sophisticated models, causing a renaissance in the field.
Initially, to extract and learn information in an image, you extract features through image-preprocessing techniques (Pre-processing for Computer Vision Tasks). Once you have a group of features describing your image, you use a classical machine learning algorithm on your dataset of features. It is a strategy that already simplifies things from the hard-coded rules, but it still relies on domain knowledge and exhaustive feature engineering. A more state-of-the-art approach arises when deep learning methods and large datasets meet. Deep learning (DL) allows machines to automatically learn complex features from the raw data. This paradigm shift allowed us to build more adaptive and sophisticated models, causing a renaissance in the field.
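For illustration, here is a minimal sketch of that classical pipeline, assuming scikit-image and scikit-learn are installed; the HOG descriptor and the small digits dataset are illustrative choices, not part of the course material:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from skimage.feature import hog

# Step 1: extract hand-crafted features (HOG descriptors) from each 8x8 grayscale image.
digits = datasets.load_digits()
features = [hog(img, pixels_per_cell=(4, 4), cells_per_block=(1, 1)) for img in digits.images]

# Step 2: train a classical machine learning model on the extracted feature vectors.
X_train, X_test, y_train, y_test = train_test_split(features, digits.target, random_state=0)
clf = svm.LinearSVC().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```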

The seeds of computer vision were sown long before the rise of deep learning models. During the 1960s, pioneers like David Marr and Hans Moravec wrestled with the fundamental question: can we get machines to see? Early breakthroughs such as edge detection algorithms and object recognition were achieved with a mix of cleverness and brute force, which laid the groundwork for developing computer vision systems. Over time, as research and development advanced and hardware capabilities improved, the computer vision community expanded exponentially. This vibrant community is composed of researchers, engineers, data scientists, and passionate hobbyists across the globe, coming from a vast array of disciplines. With open-source and community-driven projects, we are witnessing democratized access to cutting-edge tools and technologies, helping to create a renaissance in this field.

@@ -63,5 +63,5 @@ You will read more about the core tasks of computer vision in the Computer Visio
The complexity of a given task in the realm of image analysis and computer vision is not solely determined by how noble or difficult a question or task may seem to an informed audience. Instead, it primarily hinges on the properties of the image or data being analyzed. Take, for example, the task of identifying a pedestrian in an image. To a human observer, this might appear straightforward and relatively simple, as we are adept at recognizing people. However, from a computational perspective, the complexity of this task can vary significantly based on factors such as lighting conditions, the presence of occlusions, the resolution of the image, and the quality of the camera. In low-light conditions or with pixelated images, even the seemingly basic task of pedestrian detection can become exceedingly complex for computer vision algorithms, requiring advanced image enhancement and machine learning techniques. Therefore, the challenge in image analysis and computer vision often lies not in the inherent nobility of a task, but in the intricacies of the visual data and the computational methods required to extract meaningful insights from it.

## Link to computer vision applications
As a field, computer vision has a growing importance in society. There are many ethical considerations regarding its applications. For example, a model that is deployed to detect cancer can have terrible consequences if it classifies a cancer sample as healthy. Surveillance technology, such as models that are capable of tracking people, also raises a lot of privacy concerns. This will be discussed in detail in Chapter 14- Applications of Computer Vision and real-world Use Cases, but we will give you a taste of some of its applications.
As a field, computer vision has a growing importance in society. There are many ethical considerations regarding its applications. For example, a model that is deployed to detect cancer can have terrible consequences if it classifies a cancer sample as healthy. Surveillance technology, such as models that are capable of tracking people, also raises a lot of privacy concerns. This will be discussed in detail in "Unit 12 - Ethics and Biases". We will give you a taste of some of its cool applications in "Applications of Computer Vision".

2 changes: 1 addition & 1 deletion chapters/en/unit1/chapter1/motivation.mdx
@@ -14,7 +14,7 @@ If you ever spontaneously kicked a ball, your brain performs a myriad of tasks u

Shockingly, we don't need any formal education for this. We don't attend classes for most of the decisions we make daily. No mental math 101 can estimate the foot strength required for kicking a ball. We learned that from trial and error growing up. And some of us might never have learned at all. This is a striking contrast to the way we built programs. Programs are mostly rule-based.

Let’s try to replicate just the first task that our brain did: detecting that there is a ball. One way to do it is to define what a ball is and then exhaustively search for one in the image. Defining what a ball is is actually difficult. Balls can be as small as tennis balls but as big as Zorb balls, so size won’t help us much. We could try to describe its shape, but some balls, like rugby, are not always perfectly spherical. Not everything spherical is a ball either, otherwise ranges, bubbles, candies, and even our planet would all be considered balls.
Let’s try to replicate just the first task that our brain did: detecting that there is a ball. One way to do it is to define what a ball is and then exhaustively search for one in the image. Defining what a ball is is actually difficult. Balls can be as small as tennis balls but as big as Zorb balls, so size won’t help us much. We could try to describe its shape, but some balls, like rugby, are not always perfectly spherical. Not everything spherical is a ball either, otherwise bubbles, candies, and even our planet would all be considered balls.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/balls.png" alt="Balls">
24 changes: 12 additions & 12 deletions chapters/en/unit1/feature-extraction/feature-matching.mdx
@@ -24,7 +24,7 @@ import numpy as np
Let's start by initializing SIFT detector.

```python
sift = cv.SIFT_create()
sift = cv2.SIFT_create()
```

Find the keypoints and descriptors with SIFT.
@@ -37,7 +37,7 @@ kp2, des2 = sift.detectAndCompute(img2, None)
Find matches using k nearest neighbors.

```python
bf = cv.BFMatcher()
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
```

@@ -53,8 +53,8 @@ for m, n in matches:
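The lines collapsed in the hunk above apply Lowe's ratio test to keep only the good matches; a minimal sketch of that step, with 0.75 as an assumed threshold:

```python
# Keep a match only when it is clearly closer than the second-best candidate.
good = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:  # Lowe's ratio test; 0.75 is an assumed threshold
        good.append([m])
```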
Draw the matches.

```python
img3 = cv.drawMatchesKnn(
img1, kp1, img2, kp2, good, None, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
img3 = cv2.drawMatchesKnn(
img1, kp1, img2, kp2, good, None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
```

@@ -67,7 +67,7 @@ img3 = cv.drawMatchesKnn(
Initialize the ORB descriptor.

```python
orb = cv.ORB_create()
orb = cv2.ORB_create()
```

Find keypoints and descriptors.
@@ -81,7 +81,7 @@ Because ORB is a binary descriptor, we find matches using [Hamming Distance](htt
which is a measure of the difference between two strings of equal length.

```python
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```
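As a toy illustration (not part of the course code), the Hamming distance between two equal-length bit patterns is simply the number of positions at which they differ:

```python
a = 0b10110100
b = 0b10011100
hamming = bin(a ^ b).count("1")  # XOR highlights the differing bits; count them
print(hamming)  # 2
```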

We will now find the matches.
@@ -99,14 +99,14 @@ matches = sorted(matches, key=lambda x: x.distance)
Draw first n matches.

```python
img3 = cv.drawMatches(
img3 = cv2.drawMatches(
img1,
kp1,
img2,
kp2,
matches[:n],
None,
flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
)
```

@@ -140,7 +140,7 @@ search_params = dict(checks=50)
Initiate SIFT detector.

```python
sift = cv.SIFT_create()
sift = cv2.SIFT_create()
```

Find the keypoints and descriptors with SIFT.
@@ -156,7 +156,7 @@ We will now define the FLANN parameters. Here, trees is the number of bins you w
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv.FlannBasedMatcher(index_params, search_params)
flann = cv2.FlannBasedMatcher(index_params, search_params)

matches = flann.knnMatch(des1, des2, k=2)
```
@@ -182,10 +182,10 @@ draw_params = dict(
matchColor=(0, 255, 0),
singlePointColor=(255, 0, 0),
matchesMask=matchesMask,
flags=cv.DrawMatchesFlags_DEFAULT,
flags=cv2.DrawMatchesFlags_DEFAULT,
)

img3 = cv.drawMatchesKnn(img1, kp1, img2, kp2, matches, None, **draw_params)
img3 = cv2.drawMatchesKnn(img1, kp1, img2, kp2, matches, None, **draw_params)
```

![FLANN](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/feature-extraction-feature-matching/FLANN.png)
@@ -1,6 +1,6 @@
# Feature Description

Features are attributes of the instances learnt by the model to be later used to recognize new instances.
Features are attributes of the instances learned by the model to be later used to recognize new instances.

## How Can We Represent Features In Data Structures?

8 changes: 4 additions & 4 deletions chapters/en/unit1/image_and_imaging/extension-image.mdx
@@ -6,7 +6,7 @@ The litter of kittens is a simple story, but it reflects why it is so hard to im

![Cat kisses showing distortion based on the distance from the object](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/cat_kiss.gif)

It is tempting to think that if we just had a better camera, one that responds more rapidly with a high resolution all would be solved. We would get the adorable pictures we want. Moreover, we will use the knowledge in this course to do more than just capture all of the adorable kittens, we will want to build a model on a nanny cam that checks if the kittens are still together with their mommy so we know they are all safe and sound. Sounds perfect, right?
It is tempting to think that if we just had a better camera, one that responds more rapidly and has a higher resolution, all would be solved. We would get the adorable pictures we want. Moreover, we will use the knowledge in this course to do more than just capture all of the adorable kittens: we will want to build a model on a nanny cam that checks whether the kittens are still together with their mommy, so we know they are all safe and sound. Sounds perfect, right?

Before we go out to buy the newest, flashiest camera on the market, thinking we will have better data, that it will be super easy to train a model, that we will have a super-accurate model with out-of-this-world performance on the kitten tracking market, this paragraph is here to guide you in a more productive direction and possibly save you a lot of time and money. A higher resolution is not the answer to all your problems. For starters, a typical neural network model for dealing with images is a convolutional neural network (CNN). CNNs expect an image of a given size. A large image needs a large model. Training will take a longer time. Chances are that your computers are also limited in RAM. A larger image size will mean fewer images to train on because the RAM will be limited for each iteration.
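To make the RAM point concrete, here is a rough, illustrative calculation (the batch size and resolutions are assumptions) of how much memory a batch of float32 RGB images occupies at two resolutions:

```python
import numpy as np

for side in (224, 1024):
    batch = np.zeros((32, side, side, 3), dtype=np.float32)  # batch of 32 RGB images
    print(f"{side}x{side}: {batch.nbytes / 1e6:.0f} MB per batch")  # ~19 MB vs ~403 MB
```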

@@ -32,11 +32,11 @@ To see more than what Mother Nature has given us, we need sensors capturing beyo

We then directed our colossal lenses outwards toward the sky, using them to envision what was once unseen and unknown. We also pointed them at the minuscule realm, building images of the DNA structure and individual atoms. Both of these instruments operate on the idea of manipulating light: we use different types of mirrors or lenses to bend and focus light in the specific ways we are interested in.

We are so obsessive about seeing things that scientists have even changed the DNA sequence of certain animals so they can tag proteins of interest with a special type of protein, called green fluorescence protein. As the name suggests, when a green wavelength of light illuminates the sample, the GFP emits a fluorescent signal back. Now, it is easier to know where the protein of interest is being expressed because scientists can image it.
We are so obsessive about seeing things that scientists have even changed the DNA sequence of certain animals so they can tag proteins of interest with a special type of protein (green fluorescence protein, GFP). As the name suggests, when a green wavelength of light illuminates the sample, the GFP emits a fluorescent signal back. Now, it is easier to know where the protein of interest is being expressed because scientists can image it.

After that, it was a matter of improving this system to get more channels in place, in a longer timescale, in a better resolution. A great example of this is how microscopes now generate terabytes of data overnight.

A great example of this combined effort is the video below. In it, you see the time lapse of the projection of the 3D image of a developping embryo of a fished tagged a fluorescent protein. Each colored dot you see on the image represents an individual cell.
A great example of this combined effort is the video below. In it, you see a time-lapse projection of the 3D image of a developing fish embryo tagged with a fluorescent protein. Each colored dot you see on the image represents an individual cell.

![Fish embryo image adapted from https://www.biorxiv.org/content/10.1101/2023.03.06.531398v2.supplementary-material ](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/fish.gif)

@@ -96,4 +96,4 @@ That is not the only scenario where the coordinates system comes into play. Anot

Lastly, image acquisition comes with its own set of biases. We can loosely define bias here as an undesired characteristic of the dataset, either because it is noise or because it changes the model behavior. There are many sources of bias, but a relevant one in image acquisition is measurement bias. Measurement bias happens when the dataset used to train your model varies too much from the dataset that your model actually sees, like our previous example of a high-resolution kitten image and the nanny cam. There can be other sources of measurement bias, such as the measurement coming from the labelers themselves (i.e., different groups and different people label images differently), or from the context of the image (e.g., in trying to classify dogs and cats, if all the pictures of cats are on the sofa, the model might learn to distinguish sofa from non-sofa instead of cats and dogs).

All of that is to say that recognizing and addressing the characteristics of images originating from different instruments is a good first step into building a computer vision model. Preprocessing techniques and strategies to address the problems we identify in this case can be used to mitigate its impact on the model. The next chapter will delve deeper into specific preprocessing methods used to enhance model performance.
All of that is to say that recognizing and addressing the characteristics of images originating from different instruments is a good first step in building a computer vision model. Preprocessing techniques and strategies to address the problems we identify in this case can be used to mitigate their impact on the model. The "Preprocessing for Computer Vision Tasks" chapter will delve deeper into specific preprocessing methods used to enhance model performance.
4 changes: 2 additions & 2 deletions chapters/en/unit1/image_and_imaging/image.mdx
@@ -35,11 +35,11 @@ If you've been tuned in, you may have caught on to the idea that videos are a v

Images can naturally have a hidden component in time. They are, after all, taken at a specific point in time, and different images may be related in time, too. However, images and videos differ in how they sample this temporal information. An image is a static representation at a single point in time, while a video is a sequence of images played at a rate that creates an illusion of motion. This rate is what we can call frames per second.

This is so fundamental, that this course has a dedicated chapter to video. There, we will go over the adaptions required to deal with this added dimension.
This is so fundamental, that this course has a dedicated chapter to video. There, we will go over the adaptations required to deal with this added dimension.

### Images vs Tabular Data

In tabular data, dimensionality is usually defined by the number of features (columns) describing one data point. In visual data, dimensionality usually refers to the number of dimensions that describe your data. For a 2D image, we usually refer to numbers \\(x_i\\) and \\(Y_i\\) as the image size.
In tabular data, dimensionality is usually defined by the number of features (columns) describing one data point. In visual data, dimensionality usually refers to the number of dimensions that describe your data. For a 2D image, we usually refer to numbers \\(x_i\\) and \\(y_i\\) as the image size.
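As a quick illustration (assumed shapes, not from the course): a tabular dataset is typically a 2D array of samples by features, while a single color image is already a 3D array of height, width, and channels.

```python
import numpy as np

tabular = np.zeros((1000, 12))    # 1000 samples, 12 feature columns
image = np.zeros((224, 224, 3))   # height x width x color channels
print(tabular.ndim, image.ndim)   # 2 3
```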

Another aspect is the generation of features that describe visual data. They are generated by traditional preprocessing or learned through deep learning methods. We refer to this as feature extraction. It involves different algorithms discussed in more detail in the feature extraction chapter. It contrasts with the feature engineering for tabular data, where new features are built upon existing ones.
