Merge branch 'johko:main' into develop
Showing 22 changed files with 1,068 additions and 372 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.PHONY: quality | ||
|
||
# Check code formatting | ||
quality: | ||
python utils/code_formatter.py --check_only |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
# Welcome to the Community Computer Vision Course | ||
|
||
Dear learner, | ||
|
||
Welcome to the community-driven course on computer vision. Computer vision is revolutionizing our world in many ways, from unlocking phones with facial recognition to analyzing medical images for disease detection, enhancing public safety through surveillance systems, monitoring wildlife, and creating new images. Together, we'll dive into the fascinating world of computer vision! | ||
|
||
Throughout this course, we'll cover everything from the basics to the latest advancements in computer vision. It's structured to include various foundational topics, making it friendly and accessible for everyone. We're delighted to have you join us for this exciting journey! | ||
|
||
On this page, you can find out how to join the learner community, make a submission, get a certificate, and learn more about the course! | ||
|
||
|
||
## Assignment 📄 | ||
|
||
To obtain your certification for completing the course, complete the following assignments: | ||
|
||
1. Training/fine-tuning a Model | ||
2. Building an application and hosting it on 🤗 Spaces | ||
|
||
### Training/fine-tuning a Model | ||
|
||
There are notebooks under the Notebooks/Vision Transformers section. As of now, we have notebooks for object detection, image segmentation, and image classification. You can either train a model on a dataset that exists on 🤗 Hub or upload a dataset to a dataset repository and train a model on that. | ||
|
||
The model repository needs to have the following: | ||
1. A properly filled model card ([see the model card documentation for more information](https://huggingface.co/docs/hub/en/model-cards)) | ||
2. If you trained a model with transformers and pushed it to the Hub, a model card is generated automatically. In that case, edit the card and fill in more details. | ||
3. Add the dataset’s ID to the model card to link the model repository to the dataset repository. | ||
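
The link between a model repository and a dataset repository is made through the `datasets` field in the YAML metadata at the top of the model card (`README.md`). The snippet below is a minimal sketch of what that could look like, using only the Python standard library. The helper `render_model_card`, the model name, the dataset ID `your-username/your-dataset`, and the `pipeline_tag` value are placeholders chosen for illustration and are not part of the course material.

```python
def render_model_card(model_name: str, dataset_id: str, description: str) -> str:
    """Return README.md text whose YAML metadata links the model to a dataset."""
    front_matter = "\n".join(
        [
            "---",
            "datasets:",
            f"- {dataset_id}",  # ID of the dataset repository on the Hub
            "pipeline_tag: image-classification",  # placeholder task, adjust to your model
            "---",
        ]
    )
    body = f"# {model_name}\n\n{description}\n"
    return f"{front_matter}\n\n{body}"


# Hypothetical values, replace them with your own repositories.
card_text = render_model_card(
    model_name="my-vit-finetuned",
    dataset_id="your-username/your-dataset",
    description="ViT fine-tuned for image classification as a course assignment.",
)
print(card_text)
```

You can paste the printed text into the `README.md` of your model repository, or edit the automatically generated card so that its metadata block contains the same `datasets` entry.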
|
||
### Creating a Space | ||
|
||
In this assignment section, you'll be building a Gradio-based application for your computer vision model and sharing it on 🤗 Spaces. Learn more about these tasks using the following resources: | ||
|
||
- [Getting started with Gradio](https://huggingface.co/learn/nlp-course/chapter9/1?fw=pt#introduction-to-gradio) | ||
- [How to share your application on 🤗 Spaces](https://huggingface.co/learn/nlp-course/chapter9/4?fw=pt) | ||
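
As a rough idea of what such an application could look like, here is a small sketch of an image classification demo. It assumes `gradio` is installed and that you supply your own scoring function. The names `to_label_output`, `build_demo`, `predict_scores`, and `my_model_scores` are hypothetical helpers introduced for this example, not part of the course or of Gradio.

```python
from typing import Callable, Dict, List


def to_label_output(class_names: List[str], scores: List[float], top_k: int = 2) -> Dict[str, float]:
    """Map class names to confidences, keeping the top_k highest scores."""
    if len(class_names) != len(scores):
        raise ValueError("class_names and scores must have the same length")
    ranked = sorted(zip(class_names, scores), key=lambda pair: pair[1], reverse=True)
    return {name: float(score) for name, score in ranked[:top_k]}


def build_demo(predict_scores: Callable, class_names: List[str]):
    """Wrap a scoring function in a Gradio Interface."""
    import gradio as gr  # imported lazily; assumed installed via `pip install gradio`

    def classify(image):
        # predict_scores is your own function: PIL image -> one score per class
        return to_label_output(class_names, predict_scores(image))

    return gr.Interface(
        fn=classify,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=2),
    )


# In the app.py of a Space you would then call something like:
# build_demo(my_model_scores, ["cat", "dog"]).launch()
```

The `gr.Label` output accepts a dictionary of class names and confidences, which is why the scores are converted into that shape before being returned.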
|
||
## Certification 🥇 | ||
|
||
Once you've finished the assignments (Training/fine-tuning a Model and Creating a Space), please complete the [form](https://forms.gle/JaSYEf1pEZ4HtNKGA) with your name, email, and links to your model and Space repositories to receive your certificate. | ||
|
||
## Join the community! | ||
|
||
We invite you to be a part of [our active and supportive Discord community](http://hf.co/join/discord), where engaging conversations and shared interests flourish every day and where this course started. You will find peers with whom you can exchange ideas and resources. It is the place to collaborate, get feedback, and ask questions! | ||
|
||
Joining our community is also an excellent way to stay motivated and engaged as you follow the course. Who knows what we will build together next? | ||
|
||
As AI continues to advance, so does the quality of our discussions and the diversity of perspectives within our community. Upon becoming a member, you'll have an opportunity to connect with fellow course participants, exchange ideas, and collaborate with others. Moreover, the contributors to this course are active on Discord and might help you when needed. Join us now! | ||
|
||
## Computer Vision Channels | ||
|
||
There are many channels focused on various topics on our Discord server. You will find people discussing papers, organizing events, sharing their projects and ideas, brainstorming, and so much more. | ||
|
||
As a computer vision course learner, you may find the following set of channels particularly relevant: | ||
|
||
* `#computer-vision`: a catch-all channel for everything related to computer vision. | ||
* `#cv-study-group`: a place to exchange ideas, ask questions about specific posts and start discussions. | ||
* `#3d`: a channel for discussing topics specific to 3D computer vision. | ||
|
||
If you are interested in generative AI, we also invite you to join the channels related to diffusion models: `#core-announcements`, `#discussions`, `#dev-discussions`, and `#diff-i-made-this`. | ||
|
||
## What you will learn | ||
|
||
The course is composed of theory, practical tutorials, and engaging challenges. | ||
|
||
* **Theory Part** : This section covers the theoretical principles of computer vision, explained in detail with practical examples. | ||
* **Hands-on Tutorials** : You will learn how to train and apply key computer vision models using Google Colab notebooks. | ||
|
||
To illustrate what these computer vision models can achieve, here is a simple demo of a cat vs. dog classifier created with [Gradio](https://www.gradio.app/). | ||
|
||
<iframe | ||
src="https://huggingface.co/spaces/ak0601/cat_dog_classifier" | ||
frameborder="0" | ||
width="850" | ||
height="450"> | ||
</iframe> | ||
|
||
Throughout this course, we will cover everything from the basics to the latest advancements in computer vision. It is structured to include various foundational topics, giving you a comprehensive understanding of what makes computer vision so impactful today. | ||
|
||
## Pre-requisites | ||
|
||
Before beginning this course, make sure that you have some experience with Python programming and are familiar with transformers, machine learning, and neural networks. If these are new to you, consider reviewing the [first unit of the Hugging Face NLP course](https://huggingface.co/learn/nlp-course/chapter1/3?fw=pt). While a strong knowledge of pre-processing techniques and mathematical operations like convolutions is beneficial, it is not a prerequisite. | ||
|
||
|
||
## Course Structure | ||
|
||
The course is organized into multiple units, covering the fundamentals and delving into an in-depth exploration of state-of-the-art models. | ||
|
||
* **Unit 1 - Fundamentals of Computer Vision** : this unit covers the essential concepts to get started with computer vision: the need for computer vision, the field's basics, and its applications. Explore image fundamentals, formation, and preprocessing, along with key aspects of feature extraction. | ||
* **Unit 2 - Convolutional Neural Networks (CNNs)** : delve into the world of CNNs, understanding their general architecture, key concepts, and common pre-trained models. Learn how to apply transfer learning and fine-tuning to adapt CNNs for various tasks. | ||
* **Unit 3 - Vision Transformers** : explore the transformer architecture in the context of computer vision and learn how it compares to CNNs. Understand common vision transformers such as Swin, DETR, and CvT, along with techniques for transfer learning and fine-tuning. | ||
* **Unit 4 - Multimodal Models** : understand the fusion of text and vision by exploring multimodal tasks like image-to-text and text-to-image. Study models such as CLIP and its relatives (GroupViT, BLIP, OWL-ViT), and master transfer learning techniques for multimodal tasks. | ||
* **Unit 5 - Generative Models** : explore generative models, including GANs, VAEs, and diffusion models. Learn about their differences and applications in tasks such as text-to-image, image-to-image, and inpainting. | ||
* **Unit 6 - Basic Computer Vision Tasks** : cover fundamental tasks like image classification, object detection, and segmentation, along with the models used for them (YOLO, SAM). Gain insights into metrics and practical applications for these tasks. | ||
* **Unit 7 - Video and Video Processing** : examine the characteristics of videos, the role of video processing, and the challenges compared to image processing. Explore temporal continuity, motion estimation, and practical applications in video processing. | ||
* **Unit 8 - 3D Vision, Scene Rendering, and Reconstruction** : delve into the complexities of three-dimensional vision, exploring concepts like NeRF and GQN for scene rendering and reconstruction. Understand the challenges and applications of 3D vision in computer vision, and how it provides an even more comprehensive view of spatial information. | ||
* **Unit 9 - Model Optimization** : explore the critical aspects of model optimization. Cover techniques such as model compression, deployment considerations, and the usage of tools and frameworks. Includes topics like distillation, pruning, and TinyML for efficient model deployment. | ||
* **Unit 10 - Synthetic Data Creation** : discover the importance of synthetic data creation using deep generative models. Explore methods like point clouds and diffusion models and investigate major synthetic datasets and their applications in computer vision. | ||
* **Unit 11 - Zero Shot Computer Vision** : delve into the realm of zero-shot learning in computer vision, covering aspects of generalization, transfer learning, and its applications in tasks such as zero-shot recognition and image segmentation. Explore the relationship between zero-shot learning and transfer learning across various computer vision domains. | ||
* **Unit 12 - Ethics and Biases in Audio and Computer Vision** : understand the ethical considerations specific to computer vision. Explore why ethics matter, how biases can infiltrate AI models, and the types of biases prevalent in these domains. Learn how to evaluate bias and apply mitigation strategies, with an emphasis on responsible development and deployment of AI technologies. | ||
* **Unit 13 - Outlook and Emerging Trends** : explore current trends and emerging architectures. Delve into innovative approaches like Retentive Network, Hiera, Hyena, I-JEPA, and Retention Vision Models. | ||
|
||
## Meet our team | ||
|
||
This course was made by the Hugging Face community with love! Our goal was to create a computer vision course that is beginner-friendly and that can act as a resource for others. More than 60 people from all over the world joined forces to make this project happen. Here we give them credit: | ||
|
||
**Unit 1 - Fundamentals of Computer Vision** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Ameed Taylor](https://github.com/atayloraerospace) | ||
- Writers: [Seshu Pavan Mutyala](https://github.com/seshu-pavan), [Isabella Bicalho-Frazeto](https://github.com/bellabf), [Aman Kapoor](https://github.com/aman06012003), [Tiago Comassetto Fróes](https://github.com/froestiago), [Aditya Mishra](https://github.com/adityaiiitr), [Kerem Delikoyun](https://github.com/krmdel), [Ker Lee Yap](https://github.com/klyap), [Kathy Fahnline](https://github.com/kfahn22), [Ameed Taylor](https://github.com/atayloraerospace) | ||
|
||
**Unit 2 - Convolutional Neural Networks (CNNs)** | ||
- Reviewers: [Mohammed Hamdy](https://github.com/mmhamdy), [Sezan](https://github.com/sezan92), [Joshua Adrian Cahyono](https://github.com/JvThunder), [Murtaza Nazir](https://github.com/themurtazanazir), [Albert Kao](https://github.com/albertkao227), [Sitam Meur](https://github.com/sitamgithub-MSIT) | ||
- Writers: [Emre Albayrak](https://github.com/emre570), [Caroline Shamiso Chitongo](https://github.com/ShamieCC), [Sezan](https://github.com/sezan92), [Joshua Adrian Cahyono](https://github.com/JvThunder), [Murtaza Nazir](https://github.com/themurtazanazir), [Albert Kao](https://github.com/albertkao227), [Isabella Bicalho-Frazeto](https://github.com/bellabf), [Aman Kapoor](https://github.com/aman06012003), [Sitam Meur](https://github.com/sitamgithub-MSIT) | ||
|
||
**Unit 3 - Vision Transformers** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Mohammed Hamdy](https://github.com/mmhamdy), [Ameed Taylor](https://github.com/atayloraerospace), [Sezan](https://github.com/sezan92) | ||
- Writers: [Surya Guthikonda](https://github.com/SuryaKrishna02), [Ker Lee Yap](https://github.com/klyap), [Anindyadeep Sannigrahi](https://bento.me/anindyadeep), [Celina Hanouti](https://github.com/hanouticelina), [Malcolm Krolick](https://github.com/Mkrolick) | ||
|
||
**Unit 4 - Multimodal Models** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Snehil Sanyal](https://github.com/snehilsanyal), [Mohammed Hamdy](https://github.com/mmhamdy), [Charchit Sharma](https://github.com/charchit7), [Ameed Taylor](https://github.com/atayloraerospace), [Isabella Bicalho-Frazeto](https://github.com/bellabf) | ||
- Writers: [Snehil Sanyal](https://github.com/snehilsanyal), [Surya Guthikonda](https://github.com/SuryaKrishna02), [Mateusz Dziemian](https://github.com/mattmdjaga), [Charchit Sharma](https://github.com/charchit7), [Evstifeev Stepan](https://github.com/minemile), [Jeremy Kespite](https://github.com/jeremy-k3/), [Isabella Bicalho-Frazeto](https://github.com/bellabf) | ||
|
||
**Unit 5 - Generative Models** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [William Bonvini](https://github.com/WilliamBonvini), [Mohammed Hamdy](https://github.com/mmhamdy), [Ameed Taylor](https://github.com/atayloraerospace) | ||
- Writers: [Jeronim Matijević](https://github.com/jere357), [Mateusz Dziemian](https://github.com/mattmdjaga), [Charchit Sharma](https://github.com/charchit7) | ||
|
||
**Unit 6 - Basic Computer Vision Tasks** | ||
- Reviewers: [Adhi Setiawan](https://github.com/adhiiisetiawan) | ||
- Writers: [Adhi Setiawan](https://github.com/adhiiisetiawan) | ||
|
||
**Unit 7 - Video and Video Processing** | ||
- Reviewers: [Ameed Taylor](https://github.com/atayloraerospace) | ||
- Writers: [Diwakar Basnet](https://github.com/DiwakarBasnet) | ||
|
||
**Unit 8 - 3D Vision, Scene Rendering, and Reconstruction** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [William Bonvini](https://github.com/WilliamBonvini), [Mohammed Hamdy](https://github.com/mmhamdy), [Adhi Setiawan](https://github.com/adhiiisetiawan), [Ameed Taylor](https://github.com/atayloraerospace) | ||
- Writers: [John Fozard](https://github.com/jfozard), [Vasu Gupta](https://github.com/vasugupta9) | ||
|
||
**Unit 9 - Model Optimization** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Mohammed Hamdy](https://github.com/mmhamdy), [Adhi Setiawan](https://github.com/adhiiisetiawan), [Ameed Taylor](https://github.com/atayloraerospace) | ||
- Writer: [Adhi Setiawan](https://github.com/adhiiisetiawan) | ||
|
||
**Unit 10 - Synthetic Data Creation** | ||
- Reviewers: [Mohammed Hamdy](https://github.com/mmhamdy), [Ameed Taylor](https://github.com/atayloraerospace), [Bhavesh Misra](https://github.com/Zekrom-7780), [Kathy Fahnline](https://github.com/kfahn22) | ||
- Writers: [William Bonvini](https://github.com/WilliamBonvini), [Alper Balbay](https://github.com/alperiox), [Madhav Kumar](https://github.com/miniMaddy), [Bhavesh Misra](https://github.com/Zekrom-7780) | ||
|
||
**Unit 11 - Zero Shot Computer Vision** | ||
- Reviewers: [Mohammed Hamdy](https://github.com/mmhamdy), [Albert Kao](https://github.com/albertkao227) | ||
- Writers: [Mohammed Hamdy](https://github.com/mmhamdy), [Albert Kao](https://github.com/albertkao227) | ||
|
||
**Unit 12 - Ethics and Biases in Audio and Computer Vision** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Mohammed Hamdy](https://github.com/mmhamdy), [Charchit Sharma](https://github.com/charchit7), [Adhi Setiawan](https://github.com/adhiiisetiawan), [Ameed Taylor](https://github.com/atayloraerospace), [Bhavesh Misra](https://github.com/Zekrom-7780) | ||
- Writers: [Snehil Sanyal](https://github.com/snehilsanyal), [Bhavesh Misra](https://github.com/Zekrom-7780) | ||
|
||
**Unit 13 - Outlook and Emerging Trends** | ||
- Reviewers: [Ratan Prasad](https://github.com/ratan), [Ameed Taylor](https://github.com/atayloraerospace), [Mohammed Hamdy](https://github.com/mmhamdy) | ||
- Writers: [Farros Alferro](https://github.com/farrosalferro), [Mohammed Hamdy](https://github.com/mmhamdy), [Louis Ulmer](https://github.com/lulmer), [Dario Wisznewer](https://github.com/dariowsz), [gonzachiar](https://github.com/gonzachiar) | ||
|
||
We are happy to have you here. Let's get started! |