crowd-sourced

An awesome open source 'ML As A Service' project to get high quality labels for your models!
Explore the docs »

View Demo · Report Bug · Request Feature · Vercel View · DevPost Submission

About the project

We chose to work on a crowd-sourced tool for data labeling, to simplify the data pipeline for deep learning models.

Nowadays we have amazing tools at our disposal, like systems for language translation, or movies recommendation, computer vision and autonomous driving, and much more! But all these systems are very data-hungry, and most of the time of a Deep Learning engineer can be spent on data collection and preparation. With the right tools, like ours, this time can be reduced.

But not only: lots of data is not enough, we need to ensure that the quality of the data is high. From this perspective, leveraging an open-source crowd-sourced platform is a unique opportunity to achieve a gold standard in data quality and to work towards building an automated system that is more fair, inclusive, and resilient to bias.

Thus, Crowd Sourced is a free platform that lets users from all around the world contribute to labelling textual and graphical data for the intention of data for machine learning and as a open source way of collecting human labelled, gold standard data for machine learning models.

This project comes with a basic UI support, an easy to use interface, set of ready to use interactions and most importantly, all of the content provided with ease for the user to label, so that you can have whatever data for labelling you like in the database without worrying about how the data will be served and labelled.

Built With

This section lists down the technologies which were used in the making of this wesome project! They are,

Makefile ❤️ scripts for automating many of the processes
Black (formatting) ❤️ Flake8 (linting)
FlaskAPI ❤️ MongoEngine (ORM) ❤️ VirtualEnv
Reacts ❤️ Material UI ❤️ yarn
EsLint with React to make sure no bugs arose
GitHub ❤️ with the issue and a pull request template
MongoDB as the database used
Docker/Docker Compose
Linux ❤️ wget ❤️ zip for automating dataset generation and setup
Python ❤️ requests lib, for using API from Unsplash

Features

The web application is built by keeping the following aspects in mind,

🎁 Modern – Project created using the latest features of React (State management using Hooks)
💻 Responsive – Highly responsive and reusable UI components, that change depending on the provided props, since the UI library used here will be Material UI, which provides responsive components out of the box already
🚀 Fast – Buttery smooth experience thanks to the light weight implementation of best practices in ReactJS
⚙️ Maintenance - The project is built with Docker Compose, following the easiness of adding and removing services, with easy to add code for maintainability purposes

Getting Started

This section deals with how the instructions for setting this project up your machine locally

Prerequisites

To start and build the project, you need to have,

Docker on your machine. Here is how
yarn - you can get the that here
Linux - this project was built with developers using Linux, since the main build is done with Makefile

Installation

After you've setup docker, simply follow the steps, or just run the code snippet at the end of this seciton,

Clone the project, git clone https://github.com/baioc/crowd-sourced
Then go into the project, cd crowd-sourced
Go into the client folder with, cd client/, and run yarn
Simply run, make build-dev. This simply builds the dev version of the project for now. It will first begin to download some basic Docker files, but at the end ( it might take 2-3 minutes in the 1st attempt ), in the end, it will give you an output like this,

Once down, simply visit http://localhost:3000/, and http://localhost:8080/, you should see pages similar to the pages below,

Or simply, run the following commands!

git clone https://github.com/baioc/crowd-sourced
cd crowd-sourced
cd client
yarn
cd ..
make buil-dev

Code Management

To also format and lint the Pythonic code for the FlaskAPI, run the following commands while in the root,

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
make precommit

This will perform formatting with black, and linting with flake8.

Datasets setup

Download the datasets,

make datasets-download

Loading to MongoDB,

make datasets-setup

Challenges we ran into

Quite a lot,

Deciding on the details of the workflow
Deciding on the technology, and making it easy for everyone to follow along with all the issues and the work needed to be done
Deciding on the UI, and the whole team keeping on par with the quick learning curve and idea
Trying to convert the idea into the most MVP like as much as possible
Maintaining best practices with branches, Github issues & PRs
Making sure everyone was on the same page
Dealing with hidden bugs with Docker Compose, MongoDB, the server especially
Having to deploy the frontend somewhere
Linting, formatting, to make sure the code quality was high
Simplifying many processes by using a Makefile

Accomplishments that we're proud of

We're proud of a couple of things,

Very rapid development
Rapid learning and understanding of the solution
Quickly adapting to a workflow
Not getting overwhelmed with a sense of feeling we won't make it
Including so many technologies, stacks, and overall ideas to get this MVP out there
Discussing very frequently and keeping in touch with everyone to make sure good progress is made
An issue/PR/branch system of GitHub. As of now, we have 6 closed issues, 2 open issues, and 16 closed PRs, with a total of 70 commits
Overall, having fun!

What we learned

A lot, for all of us,

On technical expertise, JS, Python, Automating, Bash Scripting, Dataset generation, API calls, Makefile
From a people perspective, time zone communication, deciding on a solid single idea and building on top of that

What's next for Crowd Sourced

Expanding a bit more on the idea, letting users upload their own data, letting 3rd party websites use this functionality in their own web applications, a bit like google captcha. We might go on to make a personal profile for each user, with proper authentication and everything, for each user to upload his/her own personal datasets to be labeled, and then easily introducing a more expansive platform with deals with audio and other formats of data sets well.

Final Screenshots


Home Landing Page	Options Page
Select Images Page	Options Page
Docker Compose Running	Automated Data Generation Thanks To Makefile
Dataset added from ./server/shared/img/

Video Demo

The video demo can be found below,

Special thanks

Major League Hacking - for the opportunity to be part of this hackathon

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the GNU General Public License v3.0 License. See LICENSE for more information.

Usage

Use it for whatever machine learning ideas you have! Happy hacking!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

crowd-sourced

crowd-sourced

Table of contents

About the project

Built With

Features

Getting Started

Prerequisites

Installation

Code Management

Datasets setup

Challenges we ran into

Accomplishments that we're proud of

What we learned

What's next for Crowd Sourced

Final Screenshots

Video Demo

Special thanks

Contributing

License

Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

crowd-sourced

crowd-sourced

Table of contents

About the project

Built With

Features

Getting Started

Prerequisites

Installation

Code Management

Datasets setup

Challenges we ran into

Accomplishments that we're proud of

What we learned

What's next for Crowd Sourced

Final Screenshots

Video Demo

Special thanks

Contributing

License

Usage