From ac32100a38243bedda3c4f278ae7850609e88b82 Mon Sep 17 00:00:00 2001 From: Samaneh Saadat Date: Thu, 2 May 2024 21:08:52 +0000 Subject: [PATCH] Add Upload Guide (#1847) * Upload guide. * KerasNLP upload guide. * Address reviews. * Add classifier example. * Kaggle Hub --> Kaggle Models. * Add model loading. * Replace the toy dataset with IMDB dataset. * Adress reviews. * Some final fixes to make autogen run successful. * Fix classifier name in HF upload. * Reduce batch size. * Convert the code for loading to markdown code block. * Get kaggle username from kagglehub.whoami(). * Run black. * Add notebook and markdown. * Add the guide path. * Address reivews. * Update notebook and markdown files. * Remove upload progress bars from the markdown file. * Remove fine tuning progress bars from the markdown file. --- guides/ipynb/keras_nlp/upload.ipynb | 521 ++++++++++++++++++++++++++++ guides/keras_nlp/upload.py | 245 +++++++++++++ guides/md/keras_nlp/upload.md | 308 ++++++++++++++++ scripts/guides_master.py | 4 + 4 files changed, 1078 insertions(+) create mode 100644 guides/ipynb/keras_nlp/upload.ipynb create mode 100644 guides/keras_nlp/upload.py create mode 100644 guides/md/keras_nlp/upload.md diff --git a/guides/ipynb/keras_nlp/upload.ipynb b/guides/ipynb/keras_nlp/upload.ipynb new file mode 100644 index 0000000000..31869a4589 --- /dev/null +++ b/guides/ipynb/keras_nlp/upload.ipynb @@ -0,0 +1,521 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Uploading Models with KerasNLP\n", + "\n", + "**Author:** [Samaneh Saadat](https://github.com/SamanehSaadat/), [Matthew Watson](https://github.com/mattdangerw/)
\n", + "**Date created:** 2024/04/29
\n", + "**Last modified:** 2024/04/29
\n", + "**Description:** An introduction to uploading a fine-tuned KerasNLP model to model hubs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Introduction\n", + "\n", + "Fine-tuning a machine learning model can yield impressive results for specific tasks.\n", + "Uploading your fine-tuned model to a model hub allows you to share it with the broader community.\n", + "Sharing your models makes them accessible to other researchers and developers,\n", + "who can then reuse and build on your work instead of starting from scratch.\n", + "This can also streamline the integration of your model into real-world applications.\n", + "\n", + "This guide walks you through how to upload your fine-tuned models to popular model hubs such as\n", + "[Kaggle Models](https://www.kaggle.com/models) and [Hugging Face Hub](https://huggingface.co/models)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Setup\n", + "\n", + "Let's start by installing and importing all the libraries we need. We use KerasNLP for this guide." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "!pip install -q --upgrade keras-nlp huggingface-hub" + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"KERAS_BACKEND\"] = \"jax\"\n", + "\n", + "import keras_nlp\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Data\n", + "\n", + "We use the IMDB reviews dataset for this guide. Let's load the dataset from `tensorflow_datasets`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "import tensorflow_datasets as tfds\n", + "\n", + "imdb_train, imdb_test = tfds.load(\n", + " \"imdb_reviews\",\n", + " split=[\"train\", \"test\"],\n", + " as_supervised=True,\n", + " batch_size=4,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "We only use a small subset of the training samples (100 batches of 4 reviews) to make the guide run faster.\n", + "However, if you need a higher-quality model, consider using more training samples." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "imdb_train = imdb_train.take(100)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Task Upload\n", + "\n", + "A `keras_nlp.models.Task` wraps a `keras_nlp.models.Backbone` and a `keras_nlp.models.Preprocessor` to create\n", + "a model that can be used directly for training, fine-tuning, and prediction for a given text problem.\n", + "In this section, we explain how to create a `Task`, fine-tune it, and upload it to a model hub." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Load Model\n", + "\n", + "If you want to build a Causal LM based on a base model, simply call `keras_nlp.models.CausalLM.from_preset`\n", + "and pass a built-in preset identifier."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "causal_lm = keras_nlp.models.CausalLM.from_preset(\"gpt2_base_en\")\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Fine-tune Model\n", + "\n", + "After loading the model, you can call `.fit()` on the model to fine-tune it.\n", + "Here, we fine-tune the model on the IMDB reviews, which adapts it to the movie-review domain." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "# Drop labels and keep the review text only for the Causal LM.\n", + "imdb_train_reviews = imdb_train.map(lambda x, y: x)\n", + "\n", + "# Fine-tune the Causal LM.\n", + "causal_lm.fit(imdb_train_reviews)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Save the Model Locally\n", + "\n", + "To upload a model, you first need to save it locally using `save_to_preset`." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "preset_dir = \"./gpt2_imdb\"\n", + "causal_lm.save_to_preset(preset_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "Let's see the saved files." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "os.listdir(preset_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "### Load a Locally Saved Model\n", + "\n", + "A model that is saved to a local preset can be loaded using `from_preset`.\n", + "What you save in is what you get back out."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "causal_lm = keras_nlp.models.CausalLM.from_preset(preset_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "You can also load the `keras_nlp.models.Backbone` and `keras_nlp.models.Tokenizer` objects from this preset directory.\n", + "Note that these objects are equivalent to `causal_lm.backbone` and `causal_lm.preprocessor.tokenizer` above." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "backbone = keras_nlp.models.Backbone.from_preset(preset_dir)\n", + "tokenizer = keras_nlp.models.Tokenizer.from_preset(preset_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Upload the Model to a Model Hub\n", + "\n", + "After saving a preset to a directory, this directory can be uploaded to a model hub such as Kaggle or Hugging Face directly from the KerasNLP library.\n", + "To upload the model to Kaggle, the URI must start with `kaggle://`; to upload to Hugging Face, it must start with `hf://`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "### Upload to Kaggle" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "To upload a model to Kaggle, we first need to authenticate with Kaggle.\n", + "This can be done in one of the following ways:\n", + "1. Set environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY`.\n", + "2. Provide a local `~/.kaggle/kaggle.json`.\n", + "3. Call `kagglehub.login()`.\n", + "\n", + "Let's make sure we are logged in before continuing."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "import kagglehub\n", + "\n", + "if \"KAGGLE_USERNAME\" not in os.environ or \"KAGGLE_KEY\" not in os.environ:\n", + " kagglehub.login()\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "To upload a model, we can use the `keras_nlp.upload_preset(uri, preset_dir)` API, where `uri` has the format\n", + "`kaggle:////keras/` for uploading to Kaggle and `preset_dir` is the directory the model is saved in.\n", + "\n", + "Running the following uploads the model that is saved in `preset_dir` to Kaggle:" + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "kaggle_username = kagglehub.whoami()[\"username\"]\n", + "kaggle_uri = f\"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb\"\n", + "keras_nlp.upload_preset(kaggle_uri, preset_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "### Upload to Hugging Face" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "To upload a model to Hugging Face, we first need to authenticate with Hugging Face.\n", + "This can be done in one of the following ways:\n", + "1. Set environment variables `HF_USERNAME` and `HF_TOKEN`.\n", + "2. Call `huggingface_hub.notebook_login()`.\n", + "\n", + "Let's make sure we are logged in before continuing."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "import huggingface_hub\n", + "\n", + "if \"HF_USERNAME\" not in os.environ or \"HF_TOKEN\" not in os.environ:\n", + " huggingface_hub.notebook_login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "`keras_nlp.upload_preset(uri, preset_dir)` can be used to upload a model to Hugging Face if `uri` has the format\n", + "`hf:///`.\n", + "\n", + "Running the following uploads the model that is saved in `preset_dir` to Hugging Face:" + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "hf_username = huggingface_hub.whoami()[\"name\"]\n", + "hf_uri = f\"hf://{hf_username}/gpt2_imdb\"\n", + "keras_nlp.upload_preset(hf_uri, preset_dir)\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Load a User Uploaded Model\n", + "\n", + "After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`.\n", + "\n", + "```python\n", + "causal_lm = keras_nlp.models.CausalLM.from_preset(\n", + " f\"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb\"\n", + ")\n", + "```\n", + "\n", + "We can also load the model uploaded to Hugging Face by calling `from_preset`.\n", + "\n", + "```python\n", + "causal_lm = keras_nlp.models.CausalLM.from_preset(f\"hf://{hf_username}/gpt2_imdb\")\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Classifier Upload\n", + "\n", + "Uploading a classifier model is similar to uploading a Causal LM.\n", + "To upload the fine-tuned model, first save it to a local directory using the `save_to_preset`\n", + "API, then upload it via `keras_nlp.upload_preset`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "# Load the base model.\n", + "classifier = keras_nlp.models.Classifier.from_preset(\n", + " \"bert_tiny_en_uncased\", num_classes=2\n", + ")\n", + "\n", + "# Fine-tune the classifier.\n", + "classifier.fit(imdb_train)\n", + "\n", + "# Save the model to a local preset directory.\n", + "preset_dir = \"./bert_tiny_imdb\"\n", + "classifier.save_to_preset(preset_dir)\n", + "\n", + "# Upload to Kaggle.\n", + "keras_nlp.upload_preset(\n", + " f\"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb\", preset_dir\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`.\n", + "\n", + "```python\n", + "classifier = keras_nlp.models.Classifier.from_preset(\n", + " f\"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb\"\n", + ")\n", + "```" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "upload", + "private_outputs": false, + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/guides/keras_nlp/upload.py b/guides/keras_nlp/upload.py new file mode 100644 index 0000000000..5b9d727d35 --- /dev/null +++ b/guides/keras_nlp/upload.py @@ -0,0 +1,245 @@ +""" +Title: Uploading Models with KerasNLP +Author: [Samaneh Saadat](https://github.com/SamanehSaadat/), [Matthew Watson](https://github.com/mattdangerw/) +Date 
created: 2024/04/29 +Last modified: 2024/04/29 +Description: An introduction to uploading a fine-tuned KerasNLP model to model hubs. +Accelerator: GPU +""" + +""" +# Introduction + +Fine-tuning a machine learning model can yield impressive results for specific tasks. +Uploading your fine-tuned model to a model hub allows you to share it with the broader community. +Sharing your models makes them accessible to other researchers and developers, +who can then reuse and build on your work instead of starting from scratch. +This can also streamline the integration of your model into real-world applications. + +This guide walks you through how to upload your fine-tuned models to popular model hubs such as +[Kaggle Models](https://www.kaggle.com/models) and [Hugging Face Hub](https://huggingface.co/models). +""" + +""" +# Setup + +Let's start by installing and importing all the libraries we need. We use KerasNLP for this guide. +""" + +"""shell +pip install -q --upgrade keras-nlp huggingface-hub +""" + +import os + +os.environ["KERAS_BACKEND"] = "jax" + +import keras_nlp + + +""" +# Data + +We use the IMDB reviews dataset for this guide. Let's load the dataset from `tensorflow_datasets`. +""" + +import tensorflow_datasets as tfds + +imdb_train, imdb_test = tfds.load( + "imdb_reviews", + split=["train", "test"], + as_supervised=True, + batch_size=4, +) + +""" +We only use a small subset of the training samples (100 batches of 4 reviews) to make the guide run faster. +However, if you need a higher-quality model, consider using more training samples. +""" + +imdb_train = imdb_train.take(100) + +""" +# Task Upload + +A `keras_nlp.models.Task` wraps a `keras_nlp.models.Backbone` and a `keras_nlp.models.Preprocessor` to create +a model that can be used directly for training, fine-tuning, and prediction for a given text problem. +In this section, we explain how to create a `Task`, fine-tune it, and upload it to a model hub.
+""" + +""" +## Load Model + +If you want to build a Causal LM based on a base model, simply call `keras_nlp.models.CausalLM.from_preset` +and pass a built-in preset identifier. +""" + +causal_lm = keras_nlp.models.CausalLM.from_preset("gpt2_base_en") + + +""" +## Fine-tune Model + +After loading the model, you can call `.fit()` on the model to fine-tune it. +Here, we fine-tune the model on the IMDB reviews, which adapts it to the movie-review domain. +""" + +# Drop labels and keep the review text only for the Causal LM. +imdb_train_reviews = imdb_train.map(lambda x, y: x) + +# Fine-tune the Causal LM. +causal_lm.fit(imdb_train_reviews) + +""" +## Save the Model Locally + +To upload a model, you first need to save it locally using `save_to_preset`. +""" + +preset_dir = "./gpt2_imdb" +causal_lm.save_to_preset(preset_dir) + +""" +Let's see the saved files. +""" + +os.listdir(preset_dir) + +""" +### Load a Locally Saved Model + +A model that is saved to a local preset can be loaded using `from_preset`. +What you save in is what you get back out. +""" + +causal_lm = keras_nlp.models.CausalLM.from_preset(preset_dir) + +""" +You can also load the `keras_nlp.models.Backbone` and `keras_nlp.models.Tokenizer` objects from this preset directory. +Note that these objects are equivalent to `causal_lm.backbone` and `causal_lm.preprocessor.tokenizer` above. +""" + +backbone = keras_nlp.models.Backbone.from_preset(preset_dir) +tokenizer = keras_nlp.models.Tokenizer.from_preset(preset_dir) + +""" +## Upload the Model to a Model Hub + +After saving a preset to a directory, this directory can be uploaded to a model hub such as Kaggle or Hugging Face directly from the KerasNLP library. +To upload the model to Kaggle, the URI must start with `kaggle://`; to upload to Hugging Face, it must start with `hf://`. +""" +""" +### Upload to Kaggle +""" + +""" +To upload a model to Kaggle, we first need to authenticate with Kaggle. +This can be done in one of the following ways: +1.
Set environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY`. +2. Provide a local `~/.kaggle/kaggle.json`. +3. Call `kagglehub.login()`. + +Let's make sure we are logged in before continuing. +""" + +import kagglehub + +if "KAGGLE_USERNAME" not in os.environ or "KAGGLE_KEY" not in os.environ: + kagglehub.login() + + +""" + +To upload a model, we can use the `keras_nlp.upload_preset(uri, preset_dir)` API, where `uri` has the format +`kaggle:////keras/` for uploading to Kaggle and `preset_dir` is the directory the model is saved in. + +Running the following uploads the model that is saved in `preset_dir` to Kaggle: +""" +kaggle_username = kagglehub.whoami()["username"] +kaggle_uri = f"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb" +keras_nlp.upload_preset(kaggle_uri, preset_dir) + +""" +### Upload to Hugging Face +""" + +""" +To upload a model to Hugging Face, we first need to authenticate with Hugging Face. +This can be done in one of the following ways: +1. Set environment variables `HF_USERNAME` and `HF_TOKEN`. +2. Call `huggingface_hub.notebook_login()`. + +Let's make sure we are logged in before continuing. +""" + +import huggingface_hub + +if "HF_USERNAME" not in os.environ or "HF_TOKEN" not in os.environ: + huggingface_hub.notebook_login() + +""" + +`keras_nlp.upload_preset(uri, preset_dir)` can be used to upload a model to Hugging Face if `uri` has the format +`hf:///`. + +Running the following uploads the model that is saved in `preset_dir` to Hugging Face: +""" + +hf_username = huggingface_hub.whoami()["name"] +hf_uri = f"hf://{hf_username}/gpt2_imdb" +keras_nlp.upload_preset(hf_uri, preset_dir) + + +""" +## Load a User Uploaded Model + +After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`. + +```python +causal_lm = keras_nlp.models.CausalLM.from_preset( + f"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb" +) +``` + +We can also load the model uploaded to Hugging Face by calling `from_preset`.
+ +```python +causal_lm = keras_nlp.models.CausalLM.from_preset(f"hf://{hf_username}/gpt2_imdb") +``` +""" + + +""" +# Classifier Upload + +Uploading a classifier model is similar to Causal LM upload. +To upload the fine-tuned model, first, the model should be saved to a local directory using `save_to_preset` +API and then it can be uploaded via `keras_nlp.upload_preset`. +""" + +# Load the base model. +classifier = keras_nlp.models.Classifier.from_preset( + "bert_tiny_en_uncased", num_classes=2 +) + +# Fine-tune the classifier. +classifier.fit(imdb_train) + +# Save the model to a local preset directory. +preset_dir = "./bert_tiny_imdb" +classifier.save_to_preset(preset_dir) + +# Upload to Kaggle. +keras_nlp.upload_preset( + f"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb", preset_dir +) + +""" +After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`. + +```python +classifier = keras_nlp.models.Classifier.from_preset( + f"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb" +) +``` +""" diff --git a/guides/md/keras_nlp/upload.md b/guides/md/keras_nlp/upload.md new file mode 100644 index 0000000000..cb6442000c --- /dev/null +++ b/guides/md/keras_nlp/upload.md @@ -0,0 +1,308 @@ +# Uploading Models with KerasNLP + +**Author:** [Samaneh Saadat](https://github.com/SamanehSaadat/), [Matthew Watson](https://github.com/mattdangerw/)
+**Date created:** 2024/04/29
+**Last modified:** 2024/04/29
+**Description:** An introduction to uploading a fine-tuned KerasNLP model to model hubs. + + + [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/guides/ipynb/keras_nlp/upload.ipynb) [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/guides/keras_nlp/upload.py) + + + +# Introduction + +Fine-tuning a machine learning model can yield impressive results for specific tasks. +Uploading your fine-tuned model to a model hub allows you to share it with the broader community. +Sharing your models makes them accessible to other researchers and developers, +who can then reuse and build on your work instead of starting from scratch. +This can also streamline the integration of your model into real-world applications. + +This guide walks you through how to upload your fine-tuned models to popular model hubs such as +[Kaggle Models](https://www.kaggle.com/models) and [Hugging Face Hub](https://huggingface.co/models). + +# Setup + +Let's start by installing and importing all the libraries we need. We use KerasNLP for this guide. + + +```python +!pip install -q --upgrade keras-nlp huggingface-hub +``` + + +```python +import os + +os.environ["KERAS_BACKEND"] = "jax" + +import keras_nlp + +``` + +# Data + +We use the IMDB reviews dataset for this guide. Let's load the dataset from `tensorflow_datasets`. + + +```python +import tensorflow_datasets as tfds + +imdb_train, imdb_test = tfds.load( + "imdb_reviews", + split=["train", "test"], + as_supervised=True, + batch_size=4, +) +``` + +We only use a small subset of the training samples (100 batches of 4 reviews) to make the guide run faster. +However, if you need a higher-quality model, consider using more training samples.
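Because the dataset is loaded with `batch_size=4`, the `take(100)` call that follows keeps 100 *batches* (400 of the 25,000 training reviews) rather than 100 reviews, which is consistent with the `100/100` steps reported by `fit`. The plain-Python sketch below imitates that counting, plus the later `map(lambda x, y: x)` label-dropping step, with a hypothetical `batched_pairs` generator standing in for `tf.data`. It only illustrates the bookkeeping and is not how TensorFlow implements batching.

```python
from itertools import islice


def batched_pairs(examples, batch_size):
    """Hypothetical stand-in for a batched supervised dataset: groups (text, label)
    pairs into (texts, labels) tuples of length `batch_size` (the last may be shorter)."""
    texts, labels = [], []
    for text, label in examples:
        texts.append(text)
        labels.append(label)
        if len(texts) == batch_size:
            yield texts, labels
            texts, labels = [], []
    if texts:
        yield texts, labels


# Fake data standing in for the 25,000 (review, label) pairs of the IMDB train split.
reviews = ((f"review {i}", i % 2) for i in range(25_000))

# Like `tfds.load(..., batch_size=4)` followed by `.take(100)`: `take` counts batches.
subset = list(islice(batched_pairs(reviews, batch_size=4), 100))

# Like `.map(lambda x, y: x)`: keep each batch of texts and drop its labels.
review_batches = [texts for texts, _labels in subset]

num_reviews = sum(len(texts) for texts in review_batches)
print(len(subset), num_reviews)  # 100 400
```

To train on more data, raising the `take` count (or the batch size) scales the number of reviews seen per epoch proportionally.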
+ + +```python +imdb_train = imdb_train.take(100) +``` + +# Task Upload + +A `keras_nlp.models.Task` wraps a `keras_nlp.models.Backbone` and a `keras_nlp.models.Preprocessor` to create +a model that can be used directly for training, fine-tuning, and prediction for a given text problem. +In this section, we explain how to create a `Task`, fine-tune it, and upload it to a model hub. + +--- +## Load Model + +If you want to build a Causal LM based on a base model, simply call `keras_nlp.models.CausalLM.from_preset` +and pass a built-in preset identifier. + + +```python +causal_lm = keras_nlp.models.CausalLM.from_preset("gpt2_base_en") + +``` + +
+``` +Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/task.json... + +Downloading from https://www.kaggle.com/api/v1/models/keras/gpt2/keras/gpt2_base_en/2/download/preprocessor.json... + +``` +
+--- +## Fine-tune Model + +After loading the model, you can call `.fit()` on the model to fine-tune it. +Here, we fine-tune the model on the IMDB reviews, which adapts it to the movie-review domain. + + +```python +# Drop labels and keep the review text only for the Causal LM. +imdb_train_reviews = imdb_train.map(lambda x, y: x) + +# Fine-tune the Causal LM. +causal_lm.fit(imdb_train_reviews) +``` + 100/100 ━━━━━━━━━━━━━━━━━━━━ 151s 1s/step - loss: 1.0198 - sparse_categorical_accuracy: 0.3271 + +--- +## Save the Model Locally + +To upload a model, you first need to save it locally using `save_to_preset`. + + +```python +preset_dir = "./gpt2_imdb" +causal_lm.save_to_preset(preset_dir) +``` + +Let's see the saved files. + + +```python +os.listdir(preset_dir) +``` + + + + +
+``` +['preprocessor.json', + 'tokenizer.json', + 'task.json', + 'model.weights.h5', + 'config.json', + 'metadata.json', + 'assets'] + +``` +
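The listing above shows what `save_to_preset` wrote for this model. If you want to sanity-check a preset directory before uploading it, a small standard-library helper like the hypothetical `summarize_preset` below can report file sizes, parse the JSON configs, and flag missing files. The `EXPECTED_FILES` set is an assumption based on that listing, not a documented KerasNLP requirement, and the helper is not part of KerasNLP.

```python
import json
from pathlib import Path

# File names taken from the `os.listdir(preset_dir)` output; treating them as
# "expected" is an assumption for illustration, not a KerasNLP rule.
EXPECTED_FILES = frozenset({"config.json", "metadata.json", "model.weights.h5"})


def summarize_preset(preset_dir, expected=EXPECTED_FILES):
    """Hypothetical helper for inspecting a saved preset directory.

    Returns (sizes, configs, missing):
      sizes   -- {relative path: size in bytes} for every file, including `assets/`
      configs -- {relative path: parsed JSON} for every `.json` file
      missing -- expected top-level file names that were not found
    """
    root = Path(preset_dir)
    sizes, configs = {}, {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        sizes[rel] = path.stat().st_size
        if path.suffix == ".json":
            configs[rel] = json.loads(path.read_text())
    missing = set(expected) - set(sizes)
    return sizes, configs, missing


if __name__ == "__main__":
    preset_dir = "./gpt2_imdb"  # directory written by `causal_lm.save_to_preset`
    if Path(preset_dir).is_dir():
        sizes, configs, missing = summarize_preset(preset_dir)
        for name, size in sizes.items():
            print(f"{name}: {size:,} bytes")
        print("missing:", sorted(missing) or "none")
```

Walking `assets/` recursively matters because tokenizer vocabularies are stored there; in the upload logs they appear as separate files such as `vocabulary.json` and `merges.txt`.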
+### Load a Locally Saved Model + +A model that is saved to a local preset can be loaded using `from_preset`. +What you save in is what you get back out. + + +```python +causal_lm = keras_nlp.models.CausalLM.from_preset(preset_dir) +``` + +You can also load the `keras_nlp.models.Backbone` and `keras_nlp.models.Tokenizer` objects from this preset directory. +Note that these objects are equivalent to `causal_lm.backbone` and `causal_lm.preprocessor.tokenizer` above. + + +```python +backbone = keras_nlp.models.Backbone.from_preset(preset_dir) +tokenizer = keras_nlp.models.Tokenizer.from_preset(preset_dir) +``` + +--- +## Upload the Model to a Model Hub + +After saving a preset to a directory, this directory can be uploaded to a model hub such as Kaggle or Hugging Face directly from the KerasNLP library. +To upload the model to Kaggle, the URI must start with `kaggle://`; to upload to Hugging Face, it must start with `hf://`. + +### Upload to Kaggle + +To upload a model to Kaggle, we first need to authenticate with Kaggle. +This can be done in one of the following ways: +1. Set environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY`. +2. Provide a local `~/.kaggle/kaggle.json`. +3. Call `kagglehub.login()`. + +Let's make sure we are logged in before continuing. + + +```python +import kagglehub + +if "KAGGLE_USERNAME" not in os.environ or "KAGGLE_KEY" not in os.environ: + kagglehub.login() + +``` + +To upload a model, we can use the `keras_nlp.upload_preset(uri, preset_dir)` API, where `uri` has the format +`kaggle:////keras/` for uploading to Kaggle and `preset_dir` is the directory the model is saved in. + +Running the following uploads the model that is saved in `preset_dir` to Kaggle: + + +```python +kaggle_username = kagglehub.whoami()["username"] +kaggle_uri = f"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb" +keras_nlp.upload_preset(kaggle_uri, preset_dir) +``` + +
+``` +Upload successful: preprocessor.json (834B) +Upload successful: tokenizer.json (322B) +Upload successful: task.json (2KB) +Upload successful: model.weights.h5 (475MB) +Upload successful: config.json (431B) +Upload successful: metadata.json (142B) +Upload successful: merges.txt (446KB) +Upload successful: vocabulary.json (1018KB) + +Your model instance version has been created. + +``` +
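The login cell above only checks the two environment variables, so it calls `kagglehub.login()` even when option 2 (a `~/.kaggle/kaggle.json` file) is already in place. A minimal sketch of a check that covers both non-interactive options follows. `kaggle_credentials_source` is a hypothetical helper, not a `kagglehub` function, and it assumes the usual `kaggle.json` layout with `username` and `key` fields.

```python
import json
import os
from pathlib import Path


def kaggle_credentials_source(environ=None, home=None):
    """Hypothetical helper (not part of kagglehub): report which of the first two
    Kaggle login options is already available.

    Returns "env" if KAGGLE_USERNAME and KAGGLE_KEY are both set and non-empty,
    "kaggle.json" if <home>/.kaggle/kaggle.json holds "username" and "key" fields,
    and None otherwise (interactive `kagglehub.login()` is then the remaining option).
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # Option 1: environment variables.
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return "env"

    # Option 2: the local kaggle.json token file.
    token_file = home / ".kaggle" / "kaggle.json"
    if token_file.is_file():
        try:
            token = json.loads(token_file.read_text())
        except (OSError, json.JSONDecodeError):
            return None
        if isinstance(token, dict) and token.get("username") and token.get("key"):
            return "kaggle.json"
    return None
```

With such a helper, the login cell could become `if kaggle_credentials_source() is None: kagglehub.login()`, which prompts only when neither non-interactive option is configured.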
+### Upload to Hugging Face + +To upload a model to Hugging Face, we first need to authenticate with Hugging Face. +This can be done in one of the following ways: +1. Set environment variables `HF_USERNAME` and `HF_TOKEN`. +2. Call `huggingface_hub.notebook_login()`. + +Let's make sure we are logged in before continuing. + + +```python +import huggingface_hub + +if "HF_USERNAME" not in os.environ or "HF_TOKEN" not in os.environ: + huggingface_hub.notebook_login() +``` + +`keras_nlp.upload_preset(uri, preset_dir)` can be used to upload a model to Hugging Face if `uri` has the format +`hf:///`. + +Running the following uploads the model that is saved in `preset_dir` to Hugging Face: + + +```python +hf_username = huggingface_hub.whoami()["name"] +hf_uri = f"hf://{hf_username}/gpt2_imdb" +keras_nlp.upload_preset(hf_uri, preset_dir) + +``` + +--- +## Load a User Uploaded Model + +After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`. + +```python +causal_lm = keras_nlp.models.CausalLM.from_preset( + f"kaggle://{kaggle_username}/gpt2/keras/gpt2_imdb" +) +``` + +We can also load the model uploaded to Hugging Face by calling `from_preset`. + +```python +causal_lm = keras_nlp.models.CausalLM.from_preset(f"hf://{hf_username}/gpt2_imdb") +``` + +# Classifier Upload + +Uploading a classifier model is similar to uploading a Causal LM. +To upload the fine-tuned model, first save it to a local directory using the `save_to_preset` +API, then upload it via `keras_nlp.upload_preset`. + + +```python +# Load the base model. +classifier = keras_nlp.models.Classifier.from_preset( + "bert_tiny_en_uncased", num_classes=2 +) + +# Fine-tune the classifier. +classifier.fit(imdb_train) + +# Save the model to a local preset directory. +preset_dir = "./bert_tiny_imdb" +classifier.save_to_preset(preset_dir) + +# Upload to Kaggle.
+keras_nlp.upload_preset( + f"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb", preset_dir +) +``` + 100/100 ━━━━━━━━━━━━━━━━━━━━ 7s 31ms/step - loss: 0.6975 - sparse_categorical_accuracy: 0.5164 + + +
+``` +Upload successful: preprocessor.json (947B) +Upload successful: tokenizer.json (461B) +Upload successful: task.json (2KB) +Upload successful: task.weights.h5 (50MB) +Upload successful: model.weights.h5 (17MB) +Upload successful: config.json (454B) +Upload successful: metadata.json (140B) +Upload successful: vocabulary.txt (226KB) + +Your model instance version has been created. +``` +
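The hub URIs in this guide follow two shapes: the Kaggle URIs have four path components (username, model, framework, variation), as in `kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb`, while the Hugging Face URI has two (username, repository), as in `hf://{hf_username}/gpt2_imdb`. The sketch below builds and validates URIs of those shapes before they are passed to `keras_nlp.upload_preset`. The helper names are hypothetical, and the component counts are inferred from this guide's examples rather than taken from KerasNLP's own URI handling.

```python
# Component counts inferred from the URIs used in this guide (an assumption,
# not KerasNLP's own validation logic).
EXPECTED_PARTS = {"kaggle": 4, "hf": 2}


def kaggle_preset_uri(owner, model, variation, framework="keras"):
    """Hypothetical helper: build kaggle://<owner>/<model>/<framework>/<variation>."""
    return f"kaggle://{owner}/{model}/{framework}/{variation}"


def hf_preset_uri(owner, repo):
    """Hypothetical helper: build hf://<owner>/<repo>."""
    return f"hf://{owner}/{repo}"


def parse_preset_uri(uri):
    """Split a hub URI into (scheme, path components), raising ValueError if it
    does not match the shapes used in this guide."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in EXPECTED_PARTS:
        raise ValueError(f"URI must start with kaggle:// or hf://, got {uri!r}")
    parts = rest.strip("/").split("/")
    if len(parts) != EXPECTED_PARTS[scheme] or not all(parts):
        raise ValueError(
            f"expected {EXPECTED_PARTS[scheme]} path components in {uri!r}, got {parts}"
        )
    return scheme, parts


uri = kaggle_preset_uri("my-user", "bert", "bert_tiny_imdb")
print(uri)  # kaggle://my-user/bert/keras/bert_tiny_imdb
print(parse_preset_uri(uri))  # ('kaggle', ['my-user', 'bert', 'keras', 'bert_tiny_imdb'])
```

A check like this only catches malformed strings locally; whether the model, framework, and variation names are accepted is still decided by the hub when the upload runs.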
+After verifying that the model is uploaded to Kaggle, we can load the model by calling `from_preset`. + +```python +classifier = keras_nlp.models.Classifier.from_preset( + f"kaggle://{kaggle_username}/bert/keras/bert_tiny_imdb" +) +``` diff --git a/scripts/guides_master.py b/scripts/guides_master.py index bcf697cb99..7505d000d4 100644 --- a/scripts/guides_master.py +++ b/scripts/guides_master.py @@ -47,6 +47,10 @@ "path": "transformer_pretraining", "title": "Pretraining a Transformer from scratch with KerasNLP", }, + { + "path": "upload", + "title": "Uploading Models with KerasNLP", + }, ], }