
Building A KNN From Scratch!

(Image: a simple data visual I created using matplotlib!)


🤔 Purpose

Why re-invent the wheel? Well, for a while now, I've been wanting to learn more about how to apply machine learning models to my personal coding projects. However, my first few attempts at using ML libraries (like scikit-learn) WITHOUT a tutorial didn't work out. It was at that moment that I knew I was way out of my depth. So, by creating and implementing ML models from scratch (starting with a simple KNN algorithm), I hope to un-black-box these models and gain a more fundamental understanding of the mathematical and statistical concepts behind them. That way, I can hopefully apply them to novel use cases and tweak them as needed to best fit future scenarios.


⚙️ Process

  1. WITHOUT looking at any code, I scoured the internet for articles and videos with detailed yet digestible explanations of the KNN formula. Sources I referenced heavily include this article and this video.
  2. Opened up a Jupyter notebook and began coding my first simple KNN algorithm, which consisted of a Euclidean distance calculator and could only handle datasets with two features, one target, and two classes (a minimal sketch of this kind of classifier appears after this list).
  3. Learned to use the pandas library to manipulate CSV files with Python. A key takeaway was learning how to translate categorical target values into numerical ones so that the KNN algorithm could properly read the dataset (see the encoding sketch after this list).
  4. Learned to use the matplotlib plotting library to better visualize and understand the KNN algorithm's workings from another angle (a rough plotting sketch also follows this list).
  5. Tested my model on a classic use case: iris species classification! I used this dataset, taken from Kaggle.
  6. After further reading, I learned to implement additional features, such as a function that checks the accuracy of the algorithm's predicted values against the actual values, and a function that identifies the optimal k value for a given dataset, i.e. the one that returns the most accurate predictions (both sketched after this list).
  7. With my newfound understanding of the KNN algorithm, I applied scikit-learn's KNN classifier to the iris dataset (example after this list). This helped verify the usability and accuracy of my own algorithm, and also set me up well for fitting and training future models with scikit-learn.
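
For reference, here is a minimal sketch of the kind of distance-based classifier described in step 2. The function names (`euclidean_distance`, `knn_predict`) and the tiny example dataset are my own illustrative choices, not necessarily what the notebook uses.

```python
from collections import Counter
import math

def euclidean_distance(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, query, k=3):
    # Measure the distance from the query point to every training point,
    # keep the k closest, and return the majority class among them.
    distances = [(euclidean_distance(x, query), label)
                 for x, label in zip(X_train, y_train)]
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Toy example: two features, two classes.
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [1.2, 1.9], k=3))  # -> 0
```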
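
Step 3's categorical-to-numerical translation can be done a few ways in pandas; the sketch below uses `cat.codes`, and the file and column names are assumptions standing in for the actual Kaggle CSV.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to match the actual CSV.
df = pd.read_csv("iris.csv")

# Turn each distinct species string into an integer code so the KNN
# can work with purely numerical target values.
df["species_code"] = df["species"].astype("category").cat.codes

features = df[["sepal_length", "sepal_width",
               "petal_length", "petal_width"]].values
targets = df["species_code"].values
```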
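
The before/after visuals from step 4 (and victory 2 below) could be produced with something along these lines; this is a hypothetical helper that plots only the first two features, not the exact code behind the image above.

```python
import matplotlib.pyplot as plt

def plot_before_after(X_train, y_train, query, predicted_label):
    # Scatter the first two features of the training set, coloured by class,
    # and mark the query point before and after the KNN assigns it a class.
    xs = [row[0] for row in X_train]
    ys = [row[1] for row in X_train]
    fig, (before, after) = plt.subplots(1, 2, figsize=(10, 4))

    before.scatter(xs, ys, c=y_train, cmap="viridis")
    before.scatter(query[0], query[1], c="grey", marker="*", s=200)
    before.set_title("Before classification")

    after.scatter(xs, ys, c=y_train, cmap="viridis")
    after.scatter(query[0], query[1], c=[predicted_label], cmap="viridis",
                  vmin=min(y_train), vmax=max(y_train), marker="*", s=200)
    after.set_title("After classification")
    plt.show()
```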
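
Step 6's extra features boil down to an accuracy check and a brute-force search over candidate k values. A sketch, reusing the `knn_predict` helper from the earlier snippet (the function names are illustrative):

```python
def accuracy(X_test, y_test, X_train, y_train, k):
    # Fraction of test points whose predicted class matches the actual class.
    correct = sum(knn_predict(X_train, y_train, x, k) == actual
                  for x, actual in zip(X_test, y_test))
    return correct / len(y_test)

def best_k(X_test, y_test, X_train, y_train, k_values=range(1, 16)):
    # Try each candidate k and keep the one with the highest accuracy.
    return max(k_values,
               key=lambda k: accuracy(X_test, y_test, X_train, y_train, k))
```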
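
For step 7, the scikit-learn comparison is roughly the standard `KNeighborsClassifier` workflow; the split size and `n_neighbors` below are illustrative, not necessarily the values I used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```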

👍 Victories and 👎 Challenges

  1. 👍 Created a KNN algorithm that can classify a data point in a dataset with three classes, four features, and one target value. Code linked here.
  2. 👍 Created simple data visualizations using matplotlib that depict the instance before and after a data point is classified. Images linked here.
  3. 👍 Trained and implemented scikit-learn's KNN algorithm. Code linked here.
  4. 👎 Haven't yet figured out how to create a generalized KNN algorithm that can take not only a single unclassified data point but also an array of unclassified data points (one possible approach is sketched after this list).
  5. 👎 Still learning the scikit-learn library's syntax and understanding what exactly certain functions are doing.
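
On challenge 4, one straightforward (if unoptimised) generalisation would be to loop the single-point predictor over a list of queries. This is just a sketch built on the hypothetical `knn_predict` above, not code from the repository:

```python
def knn_predict_many(X_train, y_train, queries, k=3):
    # Reuse the single-point predictor for each unclassified point in turn.
    return [knn_predict(X_train, y_train, q, k) for q in queries]

predictions = knn_predict_many(X_train, y_train,
                               [[1.2, 1.9], [5.5, 8.4]], k=3)  # -> [0, 1]
```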

🔭 Conclusion

To reiterate, the goal of this project was not to write the cleanest code, nor was it to write the most efficient KNN algorithm. The goal was to pull back the layers of this unfamiliar concept so that I could gain the confidence and intuition needed to apply KNN models to new and unfamiliar scenarios. In that regard, I think this project was pretty successful, and, not to mention, extremely rewarding! Moving forward, I'd like to continue creating ML models from scratch, including decision trees, random forests, genetic algorithms, and neural networks.

About

Creating and implementing a k-nearest neighbours algorithm from scratch in Python, with a goal to understand, not optimise. No machine learning libraries were used in the process of creating the algorithm.
