
Revamp of lesson structure + content #40

Merged
126 commits · merged Sep 25, 2024

Conversation

@mike-ivs (Contributor) commented May 3, 2023

Hi Team! (the repo looked a bit quiet... I hope this hasn't gone stale! <3 )

We recently ran a "carpentries style" Introduction to Python/ML/DL workshop for which we included this incubator lesson (over other pre-alpha/alpha carpentry incubators) alongside Novice-inflammation and Intro-to-Deep-learning (incubator in Beta).

We were a bit surprised that there is no formal "intro to ML" lesson in the Carpentries, so we decided (as others have, e.g. in #37) to pick this incubator lesson as the most established/best suited one and make a few further changes to content and structure before we delivered.

Now that we've made and delivered the first batch of these changes, we thought it would be useful to feed them back into the lesson and community, get some wider feedback, and hopefully help the Carpentries get an established "intro to ML" lesson.

I've submitted our changes all at once and will summarise them below in a bit more detail. I'm happy to re-submit them in smaller, by-episode chunks if that is easier for you.

Changes

Overall structure

We've adjusted the overall structure of the lesson to give a more balanced overview of supervised and unsupervised learning, with examples of regression, classification (new), clustering, and dimension reduction.

For each of those episodes we made sure to show and compare two different techniques to give a flavour of the topics:

  • regression: linear vs polynomial
  • classification: decision tree vs SVM
  • clustering: k-means vs spectral
  • dimensionality reduction: PCA vs t-SNE
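As a rough sketch of the first of those pairings (toy data and parameters of my choosing, not the lesson's actual code), the linear-vs-polynomial comparison boils down to swapping a bare `LinearRegression` for a small pipeline:

```python
# Illustrative sketch: linear vs polynomial regression in scikit-learn.
# The data here is a made-up quadratic, not the lesson's dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(0, 4, 20).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + 1.0  # a curved trend a straight line can't capture

linear = LinearRegression().fit(x, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

print("linear R^2:", linear.score(x, y))  # good but imperfect
print("poly   R^2:", poly.score(x, y))    # essentially perfect on quadratic data
```

The point of teaching both side by side is that the polynomial model is the same `LinearRegression`, just fed transformed features.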

We also tried to reduce the conceptual overhead for ML / gradually introduce concepts as the lesson progressed:

  • in ep.1 we touch on "what if we compare against new data" and in ep.2 we introduce train-test splits
  • in ep.2 we touch on "over-fitting vs model complexity" and in ep.3 we play more with hyper-parameters
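That progression (train-test splits, then over-fitting vs model complexity) can be sketched in a few lines; the noisy sine data and the degree choices below are my own illustration, not the lesson's code:

```python
# Illustrative sketch: a train-test split exposes over-fitting as model
# complexity (polynomial degree, a hyper-parameter) increases.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=40)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

scores = {}
for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    # (train R^2, test R^2): the gap between them grows with complexity
    scores[degree] = (model.score(x_train, y_train), model.score(x_test, y_test))
    print(degree, scores[degree])
```

A high-degree model scores best on the training data while the held-out test score tells the honest story.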

We also made some tweaks across the whole lesson to improve text flow/clarity/formatting, and added a few more figures and more plotting code to help reinforce concepts visually.

Introduction

We overhauled the introduction to give a clearer explanation of:

  • what machine learning is
  • where it is used in our daily lives
  • AI vs ML vs DL (very similar to the intro-to-DL lesson; shameless figure reuse)
  • the types of machine learning, and a summary of which are covered in the lesson
  • the limitations of ML

We removed the "over-hyping" section as, while it may be true that ML/AI is over-hyped, it felt like too negative a tone for an introduction to the topic.

Regression

We decided to remove the "create your own Python regression" lesson in favour of using scikit-learn throughout, combining the two regression lessons into one. We needed the extra time to teach classification, and while I understand the reasoning behind doing a manual regression before using scikit-learn, it felt like quite a time sink in a lesson about "ML with scikit-learn".

We added a quick section to introduce supervised learning and scikit-learn before moving on to regression. We also used a small test dataset instead of the gapminder dataset (as done in #39) to reduce the learner burden of having to understand a dataset while also learning ML for the first time. (Maybe it's too small a dataset...)

Classification

This one felt like it was missing from the original! We made a quick classification lesson based on the same penguin dataset as the "intro-to-DL" lesson. It steps up the coding complexity from a simple two-list dataset, but it feels like a nice intermediate between the regression lesson and the eventual "intro to DL" lesson.
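The decision-tree-vs-SVM comparison in that episode follows a familiar shape; here is a sketch using the built-in iris dataset as a stand-in (the lesson itself uses the penguin dataset, which is not bundled with scikit-learn):

```python
# Illustrative sketch: decision tree vs SVM classification.
# iris is used here as a stand-in for the lesson's penguin dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

print("tree accuracy:", tree.score(X_test, y_test))
print("svm  accuracy:", svm.score(X_test, y_test))
```

The shared `fit`/`score` interface is what makes swapping classifiers nearly free, which is exactly why comparing two techniques per episode is cheap to teach.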

Clustering

We added a section to explain the idea of unsupervised learning, touched a little on the concept of hyper-parameters, and broke up the code to make a few more plots, giving a bit more visualisation of the clustering process.
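The k-means-vs-spectral contrast is easiest to motivate on non-convex clusters; this sketch (my own toy example on the classic "two moons" data, not the lesson's code) shows where the two techniques diverge:

```python
# Illustrative sketch: k-means vs spectral clustering on non-convex data.
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# k-means assumes roughly spherical clusters and splits the moons badly
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# spectral clustering on a nearest-neighbour graph follows the curved shapes
spectral_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", random_state=0
).fit_predict(X)

print("k-means  ARI:", adjusted_rand_score(y_true, kmeans_labels))
print("spectral ARI:", adjusted_rand_score(y_true, spectral_labels))
```

Plotting both labelings side by side, as the lesson does, makes the difference immediately visible.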

Dimension reduction

We expanded this section to try and give a better overview of the MNIST dataset and the higher dimensionality of these images. We also tried to give a better explanation of PCA; having only just glanced through #39, it would be worth including some of those changes in the lesson!
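For readers wanting the gist of that episode's PCA-vs-t-SNE step, a minimal sketch using scikit-learn's small built-in digits set (a stand-in for the full MNIST images discussed above) looks like this:

```python
# Illustrative sketch: reducing 64-dimensional digit images to 2D
# with PCA (linear) and t-SNE (non-linear), ready for scatter-plotting.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 images of 8x8 = 64 pixels each

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_tsne.shape)
```

Each row of the reduced arrays is one image's 2D coordinates, which can be coloured by digit label to compare how well the two techniques separate the classes.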

Neural Networks

We left this section mostly unchanged (apart from minor grammar/flow changes). Given that we ran "Intro to ML" AND "intro to DL" we actually left the NN section to the "Intro to DL" part of our workshop, in favour of covering the classical learning in ML.

My two cents on the direction of development

Given the advanced development of the "intro to DL" lesson, it might be worth dropping the NN section of this lesson and instead focusing on ensemble learning and/or reinforcement learning in future expansions: they seem to be the only big ML topics that aren't covered, whereas NNs are a mandatory concept for the "intro to DL" lesson.

Thanks for all the effort put in so far, and happy to discuss this PR :)

mike-ivs and others added 30 commits February 10, 2023 16:53
"Episode 05 - Dimensionality reduction has been completed. figures pca.svg, tsne.svg, MnistExamples.png is added"
Classification lesson 1st draft
converted jupyter with jupytext to markdown
Update with new changes from Mikes repo
Toms tweaks to the lesson text
@mike-ivs (Contributor, Author)

Closing for now due to significant changes

@mike-ivs mike-ivs closed this Jul 30, 2024
@mike-ivs (Contributor, Author)

Reopening after a chat with Colin :)

I'll go through and make a summary of all the changes we've made along the way: a combination of the initial changes mentioned in the PR and all the additional changes we built on top of those.

The new lesson can be previewed here - https://mike-ivs.github.io/machine-learning-novice-sklearn/

@mike-ivs mike-ivs reopened this Sep 23, 2024
@mike-ivs (Contributor, Author)

Overall structure

We've adjusted the overall structure of the lesson to give a broad overview of basic ML: what ML is (vs DL and AI), supervised vs unsupervised learning, regression, classification, clustering, dimensionality reduction, and ensemble learning.

For each of those episodes we made sure to show and compare two different techniques to give a flavour of the topics:

  • regression: linear vs polynomial
  • classification: decision tree vs SVM
  • ensemble: bagging vs stacking
  • clustering: k-means vs spectral
  • dimensionality reduction: PCA vs t-SNE
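The one new pairing in this list is the ensemble episode; as a hedged sketch (built-in breast-cancer data and estimator choices of my own, not the lesson's code), bagging vs stacking in scikit-learn looks like this:

```python
# Illustrative sketch: bagging vs stacking ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many copies of ONE estimator, each fit on a bootstrap sample
bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50, random_state=0
)

# Stacking: DIFFERENT estimators combined by a final meta-learner
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)), ("svm", SVC())],
    final_estimator=LogisticRegression(max_iter=1000),
)

results = {}
for name, model in (("bagging", bagging), ("stacking", stacking)):
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)
    print(name, "accuracy:", results[name])
```

Bagging reduces the variance of one unstable model, while stacking lets complementary models cover for each other's weaknesses; showing both makes the "ensemble" idea concrete.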

We also tried to reduce the conceptual overhead for ML / gradually introduce concepts as the lesson progressed:

  • in ep.1 we introduce the general "ML/DL" workflow, fit some data, and ease towards the concept of over-fitting on a data subset.
  • in ep.2 we introduce the train-test split and the concept of hyper-parameters.
  • in ep.3 we build on regression/classification using ensemble techniques (random forest).
  • in ep.4 we build on the concept of hyper-parameters and introduce the idea of performance trade-offs.
  • in ep.5 we look at larger/more complex data, and frame dimensionality reduction as a useful step prior to other ML techniques.

We've tried to function'ise the code as much as possible: the idea is to slowly go through the process of creating reusable workflow functions before putting them into practice multiple times (new data, hyper-parameter changes, etc.), i.e. teaching the underlying workflow before practising it a few times.

We've also tried to keep the datasets as "built-in" as possible, to reduce any prep overhead prior to teaching a workshop.

@colinsauze colinsauze merged commit b77f3ca into carpentries-incubator:gh-pages Sep 25, 2024
6 checks passed