docs: DOC-205: Document changes to ML backend #5642

Merged
merged 24 commits into from
Apr 18, 2024
Changes from 13 commits
a9e31bd
docs: DOC-205: Document changes to ML backend
Mar 28, 2024
c020d91
Added links to gen ai template intros
Mar 28, 2024
ae082ed
Updating settings pages
Mar 28, 2024
c150f06
Adjusting heading levels
Mar 28, 2024
de33f63
Removed template pages
Apr 3, 2024
4e83e27
ML page updates for clarity and readability
Apr 3, 2024
7dd00b8
Updated the page on writing a custom ML backend
Apr 3, 2024
8872cc0
Updates to active learning loop page
Apr 3, 2024
9a2f00b
Added link to project settings page
Apr 3, 2024
a3bc0f9
Another link to settings page
Apr 3, 2024
f75fe49
Updated link
caitlinwheeless Apr 4, 2024
f52710c
Update ml.md to fix mmdetection link
caitlinwheeless Apr 4, 2024
c9d84f6
Removed the model table and replaced with a link
Apr 8, 2024
9da50f9
Update docs/source/guide/active_learning.md
caitlinwheeless Apr 11, 2024
f73c6e6
Apply suggestions from code review
caitlinwheeless Apr 11, 2024
5f91910
Apply suggestions from code review
caitlinwheeless Apr 11, 2024
d639433
Apply suggestions from code review
caitlinwheeless Apr 11, 2024
fc6b0eb
Apply suggestions from code review
caitlinwheeless Apr 11, 2024
351c2d6
Incorporating comments from review
Apr 11, 2024
8225e07
Update ml.md to fix typo
caitlinwheeless Apr 11, 2024
a8d4a58
Update ml.md to add models table
caitlinwheeless Apr 16, 2024
02653fa
Apply suggestions from code review
caitlinwheeless Apr 16, 2024
0cccf96
Updating the UI strings for settings and adding tag links under smart…
Apr 16, 2024
7a7d892
Update ml.md
caitlinwheeless Apr 16, 2024
25 changes: 8 additions & 17 deletions docs/source/guide/active_learning.md
@@ -13,25 +13,22 @@ section: "Machine learning"

Follow this tutorial to set up an active learning loop with Label Studio.

<div class="enterprise-only">
Use Label Studio Enterprise Edition to build an automated active learning loop with a machine learning model backend. If you use the open source Community Edition of Label Studio, you can manually sort tasks and retrieve predictions to mimic an active learning process. If you're using Label Studio Community Edition, see how to [manually manage your active learning loop](#Set-up-manual-active-learning).

<p>
Use Label Studio Enterprise Edition to build an automated active learning loop with a machine learning model backend. If you use the open source Community Edition of Label Studio, you can manually sort tasks and retrieve predictions to mimic an active learning process. If you're using Label Studio Community Edition, see <a href="#Customize-your-active-learning-loop">how to customize your active learning loop</a>.
</p>

</div>

## About Active Learning

To create annotated training data for supervised machine learning models can be expensive and time-consuming. Active Learning is a branch of machine learning that seeks to **minimize the total amount of data required for labeling by strategically sampling observations** that provide new insight into the problem. In particular, Active Learning algorithms aim to select diverse and informative data for annotation, rather than random observations, from a pool of unlabeled data using **prediction scores**. For more about the practice of active learning, read [this article written by Heartex CTO on Towards Data Science](https://towardsdatascience.com/learn-faster-with-smarter-data-labeling-15d0272614c4).
Creating annotated training data for supervised machine learning models can be expensive and time-consuming. Active Learning is a branch of machine learning that seeks to **minimize the total amount of data required for labeling by strategically sampling observations** that provide new insight into the problem.

In particular, Active Learning algorithms aim to select diverse and informative data for annotation, rather than random observations, from a pool of unlabeled data using **prediction scores**. For more about the practice of active learning, read [this article written by our HumanSignal CTO on Towards Data Science](https://towardsdatascience.com/learn-faster-with-smarter-data-labeling-15d0272614c4).
caitlinwheeless marked this conversation as resolved.

## Set up an automated active learning loop

Continuously train and review predictions from a connected machine learning model using Label Studio.

<br/><img src="/images/LS-active-learning.jpg" alt="Diagram of the active learning workflow described in surrounding text" class="gif-border" width="800px" height="472px" />

After a user creates an annotation in Label Studio, the configured webhook sends a message to the machine learning backend with the information about the created annotation. The fit() method of the ML backend runs to train the model. When the user moves on to the next labeling task, Label Studio retrieves the latest prediction for the task from the ML backend, which runs the predict() method on the task.
After a user creates an annotation in Label Studio, the configured webhook sends a message to the machine learning backend with the information about the created annotation. The `fit()` method of the ML backend runs to train the model. When the user moves on to the next labeling task, Label Studio retrieves the latest prediction for the task from the ML backend, which runs the `predict()` method on the task.
caitlinwheeless marked this conversation as resolved.
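
The annotate, train, predict cycle described above can be sketched roughly as follows. This is a minimal illustration, not the Label Studio ML backend SDK: the `ToyBackend` class, its dictionary fields, and the majority-label "model" are all hypothetical stand-ins chosen only to show when `fit()` and `predict()` would run.

```python
from collections import Counter


class ToyBackend:
    """Hypothetical stand-in for an ML backend (not the Label Studio SDK)."""

    def __init__(self):
        self.label_counts = Counter()
        self.model_version = 0

    def fit(self, annotation):
        # Runs when the webhook reports a newly created annotation.
        self.label_counts[annotation["label"]] += 1
        self.model_version += 1

    def predict(self, task):
        # Runs when the next labeling task needs a prediction.
        if not self.label_counts:
            return {"label": None, "score": 0.0, "model_version": self.model_version}
        label, count = self.label_counts.most_common(1)[0]
        total = sum(self.label_counts.values())
        return {"label": label, "score": count / total, "model_version": self.model_version}


backend = ToyBackend()
# Three annotations arrive, each one triggering a training step.
backend.fit({"task_id": 1, "label": "positive"})
backend.fit({"task_id": 2, "label": "positive"})
backend.fit({"task_id": 3, "label": "negative"})
# The annotator moves on to the next task, so a prediction is requested.
prediction = backend.predict({"id": 4, "text": "example"})
```

In this sketch each prediction carries the version of the model that produced it, which mirrors the idea that the prediction shown to the annotator comes from the latest training run.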

To set up this active learning, do the following:
1. [Set up an ML model as an ML backend for active learning](#Set-up-an-ML-model-as-an-ML-backend-for-active-learning).
@@ -49,7 +46,7 @@ As you label tasks, Label Studio sends webhook events to your machine learning b
## Connect the ML backend to Label Studio for active learning

1. Follow the steps to [Add an ML backend to Label Studio](ml.html#Add-an-ML-backend-to-Label-Studio).
2. Under **ML-Assisted Labeling**, enable the setting to **Show predictions to annotators in the Label Stream and Quick View**.
2. Under **Model**, enable the setting to **Start model training on annotation submission**.
caitlinwheeless marked this conversation as resolved.

## Configure webhooks to send a training event to the ML backend (optional)

@@ -67,8 +64,6 @@ If you want, you can set up your project to send a webhook event and use that ev

For more details on the webhook event payloads, see the full [payload details for the annotation webhook](webhook_reference.html#Annotation-Created).
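
As a rough sketch of how a receiver might decide whether an incoming webhook event should start training: the helper name `should_trigger_training`, the `action` field, and the `ANNOTATION_CREATED` value are assumptions made for illustration, so check them against the payload reference before relying on them.

```python
def should_trigger_training(payload, training_actions=("ANNOTATION_CREATED",)):
    """Return True if this webhook payload should start a training run.

    The "action" key and its values are assumed here, not confirmed
    by this page; verify them in the webhook payload reference.
    """
    return payload.get("action") in training_actions


# Hypothetical payloads, trimmed to the one field this sketch inspects.
created = {"action": "ANNOTATION_CREATED", "annotation": {"id": 10}}
other = {"action": "TASKS_DELETED"}

start_for_created = should_trigger_training(created)
start_for_other = should_trigger_training(other)
```

Keeping the list of triggering actions as a parameter makes it easy to also retrain on other events, such as annotation updates, if that suits your project.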

<div class="enterprise-only">

## Set up task sampling with prediction scores

To maximize the training efficiency and effectiveness of your machine learning model, you want your annotators to focus on labeling the tasks with the least confident, or most uncertain, prediction scores from your model. To make sure of that, [set up uncertainty task sampling](setup_project.html#Set-up-task-sampling).
@@ -79,8 +74,6 @@ On the project data manager, select **Label All Tasks** to start labeling.

As your model retrains and a new version is updated in Label Studio, the tasks shown next to annotators are always those with the lowest prediction scores, reflecting those with the lowest model certainty. The predictions for the tasks correspond to the latest model version.
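
The ordering that uncertainty sampling produces can be illustrated with a small sketch. The task dictionaries and the `score` key below are hypothetical; Label Studio performs this ordering itself once uncertainty sampling is enabled, so this only shows the idea.

```python
def order_by_uncertainty(tasks):
    """Return tasks sorted so the lowest prediction score comes first."""
    return sorted(tasks, key=lambda task: task["score"])


# Hypothetical tasks with prediction scores from the latest model version.
tasks = [
    {"id": 1, "score": 0.91},
    {"id": 2, "score": 0.42},
    {"id": 3, "score": 0.67},
]

next_up = [task["id"] for task in order_by_uncertainty(tasks)]
```

Here the task with score 0.42 is presented first, because it is the one the model is least certain about.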

</div>

## Customize your active learning loop

If you want to change the behavior of the active learning loop, you can make manual changes.
@@ -90,15 +83,13 @@ If you want to change the behavior of the active learning loop, you can make man
- If you want to delete all predictions after your model is retrained, see how to [delete predictions](ml.html#Delete-predictions).
- If you need to retrieve and save predictions for all tasks, see the recommendations for [retrieving predictions from a model](ml.html#Get-predictions-from-a-model).

<div class="opensource-only">

### Set up manual active learning

If you're using Label Studio community edition, data annotators can't experience a live active learning loop. You can mimic an active learning experience by doing the following:
If you're using Label Studio Community Edition, data annotators can't experience a live active learning loop. You can mimic an active learning experience by doing the following:
1. Manually [retrieve predictions from a model](ml.html#Get-predictions-from-a-model).
2. [Sort the tasks in the data manager by prediction score](manage_data.html#Example-Sort-by-prediction-score).
3. Select **Label Tasks As Displayed** when labeling tasks.

This manual active learning loop does not automatically update the order of tasks presented to annotators as the ML backend trains with each new annotation and produces new predictions. Therefore, instead of on-the-fly automated active learning, you can perform a form of batched active learning, where you perform annotation for a period, stop to train the model, then retrieve new predictions and start annotating tasks again.
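
A batched loop of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the function name, the callbacks, and the toy "model" that scores a task by how many of its words it has already seen. In practice the scoring step corresponds to retrieving predictions, and the annotation step is done by people in Label Studio.

```python
def batched_active_learning(pool, score_fn, annotate_fn, train_fn, batch_size, rounds):
    """Annotate the least certain tasks in batches, retraining between batches."""
    labeled = []
    for _ in range(rounds):
        if not pool:
            break
        # Retrieve fresh scores and pick the lowest-scoring tasks.
        batch = sorted(pool, key=score_fn)[:batch_size]
        labeled.extend((task, annotate_fn(task)) for task in batch)
        pool = [task for task in pool if task not in batch]
        # Stop annotating and retrain before the next batch.
        train_fn(labeled)
    return labeled, pool


known_words = set()


def score(task):
    # Toy confidence: fraction of words the "model" has already seen.
    words = task.split()
    return sum(word in known_words for word in words) / len(words)


def annotate(task):
    # Stand-in for a human annotator.
    return "label"


def train(labeled):
    # Toy training step: remember every word in the labeled tasks.
    for task, _ in labeled:
        known_words.update(task.split())


labeled, remaining = batched_active_learning(
    ["a b", "a c", "d e", "d f"], score, annotate, train, batch_size=1, rounds=2
)
```

After the first batch is labeled and the toy model is retrained, the task that shares a word with it scores higher, so the second batch is drawn from the tasks the model still knows nothing about.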

</div>
