From ceef34d3f63bad566e94ade2440fb4db4065bdda Mon Sep 17 00:00:00 2001 From: Matt Watson <1389937+mattdangerw@users.noreply.github.com> Date: Tue, 1 Aug 2023 10:05:04 -0700 Subject: [PATCH] Update KerasNLP getting started guide for multi-backend keras (#1456) * Update the getting started guide for multi-backend keras * Address comments --- guides/ipynb/keras_nlp/getting_started.ipynb | 128 ++-- guides/keras_nlp/getting_started.py | 64 +- guides/md/keras_nlp/getting_started.md | 612 ++++++++++--------- 3 files changed, 436 insertions(+), 368 deletions(-) diff --git a/guides/ipynb/keras_nlp/getting_started.ipynb b/guides/ipynb/keras_nlp/getting_started.ipynb index 3ae5fa8b81..e0d73ffc34 100644 --- a/guides/ipynb/keras_nlp/getting_started.ipynb +++ b/guides/ipynb/keras_nlp/getting_started.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -16,7 +15,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -27,14 +25,19 @@ "KerasNLP is a natural language processing library that supports users through\n", "their entire development cycle. Our workflows are built from modular components\n", "that have state-of-the-art preset weights and architectures when used\n", - "out-of-the-box and are easily customizable when more control is needed. We\n", - "emphasize in-graph computation for all workflows so that developers can expect\n", - "easy productionization using the TensorFlow ecosystem.\n", + "out-of-the-box and are easily customizable when more control is needed.\n", "\n", "This library is an extension of the core Keras API; all high-level modules are\n", "[`Layers`](/api/layers/) or [`Models`](/api/models/). If you are familiar with Keras,\n", "congratulations! You already understand most of KerasNLP.\n", "\n", + "KerasNLP uses the [Keras Core](https://keras.io/keras_core/) library to work\n", + "with any of TensorFlow, Pytorch and Jax. In the guide below, we will use the\n", + "`jax` backend for training our models, and [tf.data](https://www.tensorflow.org/guide/data)\n", + "for efficiently running our input preprocessing. 
But feel free to mix things up!\n", + "This guide runs in TensorFlow or PyTorch backends with zero changes, simply update\n", + "the `KERAS_BACKEND` below.\n", + "\n", "This guide demonstrates our modular approach using a sentiment analysis example at six\n", "levels of complexity:\n", "\n", @@ -53,33 +56,32 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, "outputs": [], "source": [ - "!pip install -q --upgrade keras-nlp tensorflow" + "!pip install -q --upgrade keras-nlp" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, "outputs": [], "source": [ - "import keras_nlp\n", - "import tensorflow as tf\n", - "from tensorflow import keras\n", + "import os\n", "\n", - "# Use mixed precision for optimal performance\n", - "keras.mixed_precision.set_global_policy(\"mixed_float16\")" + "os.environ[\"KERAS_BACKEND\"] = \"jax\" # or \"tensorflow\" or \"torch\"\n", + "\n", + "import keras_nlp\n", + "import keras_core as keras" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -93,7 +95,7 @@ "modules:\n", "\n", "* **Tokenizer**: `keras_nlp.models.XXTokenizer`\n", - " * **What it does**: Converts strings to `tf.RaggedTensor`s of token ids.\n", + " * **What it does**: Converts strings to sequences of token ids.\n", " * **Why it's important**: The raw bytes of a string are too high dimensional to be useful\n", " features so we first map them to a small number of tokens, for example `\"The quick brown\n", " fox\"` to `[\"the\", \"qu\", \"##ick\", \"br\", \"##own\", \"fox\"]`.\n", @@ -134,7 +136,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -152,7 +153,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -166,29 +167,29 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, "outputs": [], "source": [ "BATCH_SIZE = 16\n", - "imdb_train = tf.keras.utils.text_dataset_from_directory(\n", + "imdb_train = keras.utils.text_dataset_from_directory(\n", " \"aclImdb/train\",\n", " batch_size=BATCH_SIZE,\n", ")\n", - "imdb_test = tf.keras.utils.text_dataset_from_directory(\n", + "imdb_test = keras.utils.text_dataset_from_directory(\n", " \"aclImdb/test\",\n", " batch_size=BATCH_SIZE,\n", ")\n", "\n", "# Inspect first review\n", "# Format is (review text tensor, label tensor)\n", - "print(imdb_train.unbatch().take(1).get_single_element())" + "print(imdb_train.unbatch().take(1).get_single_element())\n", + "" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -208,7 +209,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -220,7 +221,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -246,7 +246,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -256,7 +256,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -266,7 +265,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -295,7 +293,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -313,7 +311,6 @@ ] }, { - "attachments": {}, "cell_type": 
"markdown", "metadata": { "colab_type": "text" @@ -324,7 +321,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -343,7 +339,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -358,25 +353,32 @@ "In this workflow we train the model over three epochs using `tf.data.Dataset.cache()`,\n", "which computes the preprocessing once and caches the result before fitting begins.\n", "\n", - "**Note:** this code only works if your data fits in memory. If not, pass a `filename` to\n", - "`cache()`." + "**Note:** we can use `tf.data` for preprocessing while running on the\n", + "Jax or PyTorch backend. The input dataset will automatically be converted to\n", + "backend native tensor types during fit. In fact, given the efficiency of `tf.data`\n", + "for running preprocessing, this is good practice on all backends." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, "outputs": [], "source": [ + "import tensorflow as tf\n", + "\n", "preprocessor = keras_nlp.models.BertPreprocessor.from_preset(\n", " \"bert_tiny_en_uncased\",\n", " sequence_length=512,\n", ")\n", + "\n", "# Apply the preprocessor to every sample of train and test data using `map()`.\n", "# `tf.data.AUTOTUNE` and `prefetch()` are options to tune performance, see\n", "# https://www.tensorflow.org/guide/data_performance for details.\n", + "\n", + "# Note: only call `cache()` if you training data fits in CPU memory!\n", "imdb_train_cached = (\n", " imdb_train.map(preprocessor, tf.data.AUTOTUNE).cache().prefetch(tf.data.AUTOTUNE)\n", ")\n", @@ -395,7 +397,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -408,7 +409,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -421,12 +421,14 @@ "constructor to get the vocabulary matching pretraining.\n", "\n", "**Note:** `BertTokenizer` does not pad sequences by default, so the output is\n", - "a `tf.RaggedTensor`." + "ragged (each sequence has varying length). The `MultiSegmentPacker` below\n", + "handles padding these ragged sequences to dense tensor types (e.g. `tf.Tensor`\n", + "or `torch.Tensor`)." 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -470,7 +472,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -496,7 +497,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -527,8 +528,8 @@ "model = keras.Model(inputs, outputs)\n", "model.compile(\n", " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", - " optimizer=keras.optimizers.experimental.AdamW(5e-5),\n", - " metrics=keras.metrics.SparseCategoricalAccuracy(),\n", + " optimizer=keras.optimizers.AdamW(5e-5),\n", + " metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", " jit_compile=True,\n", ")\n", "model.summary()\n", @@ -540,7 +541,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -552,7 +552,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -582,7 +581,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -593,7 +591,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -648,7 +646,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -659,7 +656,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -680,10 +677,10 @@ ")\n", "\n", "inputs = {\n", - " \"token_ids\": keras.Input(shape=(None,), dtype=tf.int32),\n", - " \"segment_ids\": keras.Input(shape=(None,), dtype=tf.int32),\n", - " \"padding_mask\": keras.Input(shape=(None,), dtype=tf.int32),\n", - " \"mask_positions\": keras.Input(shape=(None,), dtype=tf.int32),\n", + " \"token_ids\": keras.Input(shape=(None,), dtype=tf.int32, name=\"token_ids\"),\n", + " \"segment_ids\": keras.Input(shape=(None,), dtype=tf.int32, name=\"segment_ids\"),\n", + " \"padding_mask\": keras.Input(shape=(None,), dtype=tf.int32, name=\"padding_mask\"),\n", + " \"mask_positions\": keras.Input(shape=(None,), dtype=tf.int32, name=\"mask_positions\"),\n", "}\n", "\n", "# Encoded token sequence\n", @@ -692,15 +689,15 @@ "# Predict an output word for each masked input token.\n", "# We use the input token embedding to project from our encoded vectors to\n", "# vocabulary logits, which has been shown to improve training efficiency.\n", - "outputs = mlm_head(sequence, mask_positions=inputs[\"mask_positions\"])\n", + "outputs = mlm_head(sequence, masked_positions=inputs[\"mask_positions\"])\n", "\n", "# Define and compile our pretraining model.\n", "pretraining_model = keras.Model(inputs, outputs)\n", "pretraining_model.summary()\n", "pretraining_model.compile(\n", " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", - " optimizer=keras.optimizers.experimental.AdamW(learning_rate=5e-4),\n", - " weighted_metrics=keras.metrics.SparseCategoricalAccuracy(),\n", + " optimizer=keras.optimizers.AdamW(learning_rate=5e-4),\n", + " weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", " jit_compile=True,\n", ")\n", "\n", @@ -713,7 +710,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -723,7 +719,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -745,7 +740,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -756,7 +750,7 @@ }, { 
"cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -778,7 +772,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -789,7 +782,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -819,7 +812,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -830,7 +822,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -862,7 +854,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -873,7 +864,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 0, "metadata": { "colab_type": "code" }, @@ -881,8 +872,8 @@ "source": [ "model.compile(\n", " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", - " optimizer=keras.optimizers.experimental.AdamW(5e-5),\n", - " metrics=keras.metrics.SparseCategoricalAccuracy(),\n", + " optimizer=keras.optimizers.AdamW(5e-5),\n", + " metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", " jit_compile=True,\n", ")\n", "model.fit(\n", @@ -893,7 +884,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text" @@ -934,4 +924,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file diff --git a/guides/keras_nlp/getting_started.py b/guides/keras_nlp/getting_started.py index b75cb85e6e..22dd256179 100644 --- a/guides/keras_nlp/getting_started.py +++ b/guides/keras_nlp/getting_started.py @@ -12,14 +12,19 @@ KerasNLP is a natural language processing library that supports users through their entire development cycle. Our workflows are built from modular components that have state-of-the-art preset weights and architectures when used -out-of-the-box and are easily customizable when more control is needed. We -emphasize in-graph computation for all workflows so that developers can expect -easy productionization using the TensorFlow ecosystem. +out-of-the-box and are easily customizable when more control is needed. This library is an extension of the core Keras API; all high-level modules are [`Layers`](/api/layers/) or [`Models`](/api/models/). If you are familiar with Keras, congratulations! You already understand most of KerasNLP. +KerasNLP uses the [Keras Core](https://keras.io/keras_core/) library to work +with any of TensorFlow, Pytorch and Jax. In the guide below, we will use the +`jax` backend for training our models, and [tf.data](https://www.tensorflow.org/guide/data) +for efficiently running our input preprocessing. But feel free to mix things up! +This guide runs in TensorFlow or PyTorch backends with zero changes, simply update +the `KERAS_BACKEND` below. + This guide demonstrates our modular approach using a sentiment analysis example at six levels of complexity: @@ -37,15 +42,15 @@ """ """shell -pip install -q --upgrade keras-nlp tensorflow +pip install -q --upgrade keras-nlp """ -import keras_nlp -import tensorflow as tf -from tensorflow import keras +import os -# Use mixed precision for optimal performance -keras.mixed_precision.set_global_policy("mixed_float16") +os.environ["KERAS_BACKEND"] = "jax" # or "tensorflow" or "torch" + +import keras_nlp +import keras_core as keras """ ## API quickstart @@ -56,7 +61,7 @@ modules: * **Tokenizer**: `keras_nlp.models.XXTokenizer` - * **What it does**: Converts strings to `tf.RaggedTensor`s of token ids. 
+ * **What it does**: Converts strings to sequences of token ids. * **Why it's important**: The raw bytes of a string are too high dimensional to be useful features so we first map them to a small number of tokens, for example `"The quick brown fox"` to `["the", "qu", "##ick", "br", "##own", "fox"]`. @@ -115,11 +120,11 @@ """ BATCH_SIZE = 16 -imdb_train = tf.keras.utils.text_dataset_from_directory( +imdb_train = keras.utils.text_dataset_from_directory( "aclImdb/train", batch_size=BATCH_SIZE, ) -imdb_test = tf.keras.utils.text_dataset_from_directory( +imdb_test = keras.utils.text_dataset_from_directory( "aclImdb/test", batch_size=BATCH_SIZE, ) @@ -231,17 +236,24 @@ In this workflow we train the model over three epochs using `tf.data.Dataset.cache()`, which computes the preprocessing once and caches the result before fitting begins. -**Note:** this code only works if your data fits in memory. If not, pass a `filename` to -`cache()`. +**Note:** we can use `tf.data` for preprocessing while running on the +Jax or PyTorch backend. The input dataset will automatically be converted to +backend native tensor types during fit. In fact, given the efficiency of `tf.data` +for running preprocessing, this is good practice on all backends. """ +import tensorflow as tf + preprocessor = keras_nlp.models.BertPreprocessor.from_preset( "bert_tiny_en_uncased", sequence_length=512, ) + # Apply the preprocessor to every sample of train and test data using `map()`. # `tf.data.AUTOTUNE` and `prefetch()` are options to tune performance, see # https://www.tensorflow.org/guide/data_performance for details. + +# Note: only call `cache()` if you training data fits in CPU memory! imdb_train_cached = ( imdb_train.map(preprocessor, tf.data.AUTOTUNE).cache().prefetch(tf.data.AUTOTUNE) ) @@ -273,7 +285,9 @@ constructor to get the vocabulary matching pretraining. **Note:** `BertTokenizer` does not pad sequences by default, so the output is -a `tf.RaggedTensor`. +ragged (each sequence has varying length). The `MultiSegmentPacker` below +handles padding these ragged sequences to dense tensor types (e.g. `tf.Tensor` +or `torch.Tensor`). 
""" tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased") @@ -356,8 +370,8 @@ def preprocessor(x, y): model = keras.Model(inputs, outputs) model.compile( loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=keras.optimizers.experimental.AdamW(5e-5), - metrics=keras.metrics.SparseCategoricalAccuracy(), + optimizer=keras.optimizers.AdamW(5e-5), + metrics=[keras.metrics.SparseCategoricalAccuracy()], jit_compile=True, ) model.summary() @@ -467,10 +481,10 @@ def preprocess(inputs, label): ) inputs = { - "token_ids": keras.Input(shape=(None,), dtype=tf.int32), - "segment_ids": keras.Input(shape=(None,), dtype=tf.int32), - "padding_mask": keras.Input(shape=(None,), dtype=tf.int32), - "mask_positions": keras.Input(shape=(None,), dtype=tf.int32), + "token_ids": keras.Input(shape=(None,), dtype=tf.int32, name="token_ids"), + "segment_ids": keras.Input(shape=(None,), dtype=tf.int32, name="segment_ids"), + "padding_mask": keras.Input(shape=(None,), dtype=tf.int32, name="padding_mask"), + "mask_positions": keras.Input(shape=(None,), dtype=tf.int32, name="mask_positions"), } # Encoded token sequence @@ -486,8 +500,8 @@ def preprocess(inputs, label): pretraining_model.summary() pretraining_model.compile( loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=keras.optimizers.experimental.AdamW(learning_rate=5e-4), - weighted_metrics=keras.metrics.SparseCategoricalAccuracy(), + optimizer=keras.optimizers.AdamW(learning_rate=5e-4), + weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()], jit_compile=True, ) @@ -597,8 +611,8 @@ def preprocess(x, y): model.compile( loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=keras.optimizers.experimental.AdamW(5e-5), - metrics=keras.metrics.SparseCategoricalAccuracy(), + optimizer=keras.optimizers.AdamW(5e-5), + metrics=[keras.metrics.SparseCategoricalAccuracy()], jit_compile=True, ) model.fit( diff --git a/guides/md/keras_nlp/getting_started.md b/guides/md/keras_nlp/getting_started.md index 9fe3a896d9..9d1d4c01a6 100644 --- a/guides/md/keras_nlp/getting_started.md +++ b/guides/md/keras_nlp/getting_started.md @@ -16,14 +16,19 @@ KerasNLP is a natural language processing library that supports users through their entire development cycle. Our workflows are built from modular components that have state-of-the-art preset weights and architectures when used -out-of-the-box and are easily customizable when more control is needed. We -emphasize in-graph computation for all workflows so that developers can expect -easy productionization using the TensorFlow ecosystem. +out-of-the-box and are easily customizable when more control is needed. This library is an extension of the core Keras API; all high-level modules are [`Layers`](/api/layers/) or [`Models`](/api/models/). If you are familiar with Keras, congratulations! You already understand most of KerasNLP. +KerasNLP uses the [Keras Core](https://keras.io/keras_core/) library to work +with any of TensorFlow, Pytorch and Jax. In the guide below, we will use the +`jax` backend for training our models, and [tf.data](https://www.tensorflow.org/guide/data) +for efficiently running our input preprocessing. But feel free to mix things up! +This guide runs in TensorFlow or PyTorch backends with zero changes, simply update +the `KERAS_BACKEND` below. 
+ This guide demonstrates our modular approach using a sentiment analysis example at six levels of complexity: @@ -41,23 +46,22 @@ reference for the complexity of the material: ```python -!pip install -q --upgrade keras-nlp tensorflow +!pip install -q --upgrade keras-nlp ``` + ```python -import keras_nlp -import tensorflow as tf -from tensorflow import keras +import os + +os.environ["KERAS_BACKEND"] = "jax" # or "tensorflow" or "torch" -# Use mixed precision for optimal performance -keras.mixed_precision.set_global_policy("mixed_float16") +import keras_nlp +import keras_core as keras ``` +
``` -/bin/bash: /home/haifengj/miniconda3/lib/libtinfo.so.6: no version information available (required by /bin/bash) - -INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK -Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: Tesla V100-SXM2-16GB, compute capability 7.0 +Using JAX backend. ```
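The backend switch described above is just an environment variable. As a minimal sketch using only the imports already shown in the guide, running the same code on PyTorch instead of JAX only requires changing that variable before Keras is imported:

```python
import os

# Must be set before keras_core / keras_nlp are imported; "jax", "tensorflow"
# and "torch" are the supported values.
os.environ["KERAS_BACKEND"] = "torch"

import keras_nlp
import keras_core as keras

print(keras.backend.backend())  # -> "torch"
```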
@@ -70,7 +74,7 @@ task-specific output. For each `XX` architecture (e.g., `Bert`), we offer the fo modules: * **Tokenizer**: `keras_nlp.models.XXTokenizer` - * **What it does**: Converts strings to `tf.RaggedTensor`s of token ids. + * **What it does**: Converts strings to sequences of token ids. * **Why it's important**: The raw bytes of a string are too high dimensional to be useful features so we first map them to a small number of tokens, for example `"The quick brown fox"` to `["the", "qu", "##ick", "br", "##own", "fox"]`. @@ -129,11 +133,11 @@ powerful `tf.data.Dataset` format for examples. ```python BATCH_SIZE = 16 -imdb_train = tf.keras.utils.text_dataset_from_directory( +imdb_train = keras.utils.text_dataset_from_directory( "aclImdb/train", batch_size=BATCH_SIZE, ) -imdb_test = tf.keras.utils.text_dataset_from_directory( +imdb_test = keras.utils.text_dataset_from_directory( "aclImdb/test", batch_size=BATCH_SIZE, ) @@ -141,20 +145,17 @@ imdb_test = tf.keras.utils.text_dataset_from_directory( # Inspect first review # Format is (review text tensor, label tensor) print(imdb_train.unbatch().take(1).get_single_element()) + ```
``` -/bin/bash: /home/haifengj/miniconda3/lib/libtinfo.so.6: no version information available (required by /bin/bash) % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed -100 80.2M 100 80.2M 0 0 56.7M 0 0:00:01 0:00:01 --:--:-- 56.7M -/bin/bash: /home/haifengj/miniconda3/lib/libtinfo.so.6: no version information available (required by /bin/bash) -/bin/bash: /home/haifengj/miniconda3/lib/libtinfo.so.6: no version information available (required by /bin/bash) -/bin/bash: /home/haifengj/miniconda3/lib/libtinfo.so.6: no version information available (required by /bin/bash) +100 80.2M 100 80.2M 0 0 3709k 0 0:00:22 0:00:22 --:--:-- 4677k Found 25000 files belonging to 2 classes. Found 25000 files belonging to 2 classes. -(
I truly think this is the best part of this stream of "educational cartoons". I do remember you can find little books and a plastic body in several parts: skin, skeleton, and of course: organs.<br /><br />In the same stream, you\'ll find: "Il \xc3\xa9tait une fois l\'homme" which relate the human History from the big bang to the 20th century. There is: "Il \xc3\xa9tait une fois l\'espace" as well (about the space and its exploration) but that one is more a fiction than a description of the reality since it takes place in the future.'>, )
+(Yet this heart of flint is about to melt. London children are evacuated in advance of the blitz. Young William (Willie) Beech is billeted with the protesting Tom. Willie is played to good effect by Nick Robinson.<br /><br />This boy is in need of care with a capital C. Behind in school, still wetting the bed, and unable to read are the smallest of his problems. He comes from a horrific background in London, with a mother who cannot cope, to put it mildly.<br /><br />Slowly, yet steadily, man and boy warm to each other. Tom discovers again his ability to love and care. And the boy learns to accept this love and caring. See Tom and Willie building a bomb shelter at the end of their garden. See Willie\'s joy at what is probably his first ever birthday party thrown by Tom.<br /><br />Not to give away the ending, but Willie is adopted by Tom after much struggle, and the pair begin a new life much richer for their mutual love.<br /><br />In this movie, Thaw and Robinson are following in a long line of movies where man meets boy and develop a mutual love. See the late Dirk Bogarde and Jon Whiteley in "Spanish Gardener". Or Clark Gable and Carlo Angeletti in "It Started in Naples". Or Robert Ulrich and Kenny Vadas in "Captains Courageous". Or Mel Gibson and Nick Stahl in "Man Without a Face".<br /><br />Two points of interest. This is the only appearance of Thaw that I know of where he sings. Only a verse of a hymn, New Jerusalem, but he does sing.<br /><br />Second, young Robinson also starred in a second movie featuring "Tom" in the title, "Tom\'s Midnight Garden", which is based on a classic children\'s novel.'>, )
```
@@ -179,12 +180,9 @@ classifier.predict(["I love modular workflows in keras-nlp!"])
``` -WARNING:tensorflow:From /home/haifengj/miniconda3/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23. -Instructions for updating: -Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089 -1/1 [==============================] - 3s 3s/step + 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 882ms/step -array([[-1.539, 1.542]], dtype=float16) +array([[-1.5376465, 1.5407037]], dtype=float32) ```
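The scores above are raw two-class logits. A minimal sketch for turning them into probabilities, assuming the `classifier` object defined in the guide and the `keras_core` ops API:

```python
import keras_core as keras

# `classifier` is the BertClassifier loaded above; predict() returns logits.
logits = classifier.predict(["I love modular workflows in keras-nlp!"])
probabilities = keras.ops.softmax(logits, axis=-1)  # roughly [[0.04, 0.96]] here
print(probabilities)
```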
@@ -212,9 +210,9 @@ classifier.evaluate(imdb_test)
``` -1563/1563 [==============================] - 42s 25ms/step - loss: 0.4630 - sparse_categorical_accuracy: 0.7835 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - loss: 0.4566 - sparse_categorical_accuracy: 0.7885 -[0.4629528820514679, 0.7834799885749817] +[0.46291637420654297, 0.7834799885749817] ```
@@ -256,9 +254,9 @@ classifier.fit(
``` -1563/1563 [==============================] - 294s 179ms/step - loss: 0.4203 - sparse_categorical_accuracy: 0.8024 - val_loss: 0.3077 - val_sparse_categorical_accuracy: 0.8700 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 19s 11ms/step - loss: 0.5128 - sparse_categorical_accuracy: 0.7350 - val_loss: 0.2974 - val_sparse_categorical_accuracy: 0.8746 - + ```
@@ -286,18 +284,25 @@ matching **preprocessor** as the **task**. In this workflow we train the model over three epochs using `tf.data.Dataset.cache()`, which computes the preprocessing once and caches the result before fitting begins. -**Note:** this code only works if your data fits in memory. If not, pass a `filename` to -`cache()`. +**Note:** we can use `tf.data` for preprocessing while running on the +Jax or PyTorch backend. The input dataset will automatically be converted to +backend native tensor types during fit. In fact, given the efficiency of `tf.data` +for running preprocessing, this is good practice on all backends. ```python +import tensorflow as tf + preprocessor = keras_nlp.models.BertPreprocessor.from_preset( "bert_tiny_en_uncased", sequence_length=512, ) + # Apply the preprocessor to every sample of train and test data using `map()`. # `tf.data.AUTOTUNE` and `prefetch()` are options to tune performance, see # https://www.tensorflow.org/guide/data_performance for details. + +# Note: only call `cache()` if your training data fits in CPU memory! imdb_train_cached = ( imdb_train.map(preprocessor, tf.data.AUTOTUNE).cache().prefetch(tf.data.AUTOTUNE) ) @@ -306,9 +311,7 @@ imdb_test_cached = ( ) classifier = keras_nlp.models.BertClassifier.from_preset( - "bert_tiny_en_uncased", - preprocessor=None, - num_classes=2 + "bert_tiny_en_uncased", preprocessor=None, num_classes=2 ) classifier.fit( imdb_train_cached,
``` Epoch 1/3 -1563/1563 [==============================] - 262s 159ms/step - loss: 0.4221 - sparse_categorical_accuracy: 0.8002 - val_loss: 0.3077 - val_sparse_categorical_accuracy: 0.8699 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 18s 11ms/step - loss: 0.5338 - sparse_categorical_accuracy: 0.7117 - val_loss: 0.3015 - val_sparse_categorical_accuracy: 0.8737 Epoch 2/3 -1563/1563 [==============================] - 225s 144ms/step - loss: 0.2673 - sparse_categorical_accuracy: 0.8923 - val_loss: 0.2935 - val_sparse_categorical_accuracy: 0.8783 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 9ms/step - loss: 0.2855 - sparse_categorical_accuracy: 0.8829 - val_loss: 0.3053 - val_sparse_categorical_accuracy: 0.8771 Epoch 3/3 -1563/1563 [==============================] - 225s 144ms/step - loss: 0.1974 - sparse_categorical_accuracy: 0.9271 - val_loss: 0.3418 - val_sparse_categorical_accuracy: 0.8686 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 9ms/step - loss: 0.2094 - sparse_categorical_accuracy: 0.9215 - val_loss: 0.3238 - val_sparse_categorical_accuracy: 0.8756 - + ```
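If the preprocessed features do not fit in CPU memory, `tf.data` can spill the cache to disk instead. A sketch of that variant, reusing the `preprocessor` and `imdb_train` objects from above (the cache path is illustrative):

```python
import tensorflow as tf

# Passing a file path to `cache()` writes the preprocessed examples to disk
# rather than holding them in memory.
imdb_train_cached = (
    imdb_train.map(preprocessor, tf.data.AUTOTUNE)
    .cache("/tmp/imdb_train_cache")
    .prefetch(tf.data.AUTOTUNE)
)
```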
@@ -342,7 +345,9 @@ In cases where custom preprocessing is required, we offer direct access to the constructor to get the vocabulary matching pretraining. **Note:** `BertTokenizer` does not pad sequences by default, so the output is -a `tf.RaggedTensor`. +ragged (each sequence has varying length). The `MultiSegmentPacker` below +handles padding these ragged sequences to dense tensor types (e.g. `tf.Tensor` +or `torch.Tensor`). ```python @@ -386,13 +391,13 @@ print(imdb_train_preprocessed.unbatch().take(1).get_single_element())
``` ({'token_ids': , 'segment_ids': Model: "functional_1" + + + + + +
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Layer (type)         Output Shape       Param #  Connected to         ┃
+┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
+│ padding_mask        │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ segment_ids         │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ token_ids           │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ bert_backbone_3     │ [(None, 128),     │ 4,385,… │ padding_mask[0][0],  │
+│ (BertBackbone)      │ (None, None,      │         │ segment_ids[0][0],   │
+│                     │ 128)]             │         │ token_ids[0][0]      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ transformer_encoder │ (None, None, 128) │ 198,272 │ bert_backbone_3[0][ │
+│ (TransformerEncode… │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ transformer_encode… │ (None, None, 128) │ 198,272 │ transformer_encoder… │
+│ (TransformerEncode… │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ get_item_4          │ (None, 128)       │       0 │ transformer_encoder… │
+│ (GetItem)           │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ dense_20 (Dense)    │ (None, 2)         │     258 │ get_item_4[0][0]     │
+└─────────────────────┴───────────────────┴─────────┴──────────────────────┘
+
+ + + + +
 Total params: 4,782,722 (145.96 MB)
+
+ + + + +
 Trainable params: 396,802 (12.11 MB)
+
+ + + + +
 Non-trainable params: 4,385,920 (133.85 MB)
+
+ + +
``` -Model: "model" -__________________________________________________________________________________________________ - Layer (type) Output Shape Param # Connected to -================================================================================================== - padding_mask (InputLayer) [(None, None)] 0 [] - - segment_ids (InputLayer) [(None, None)] 0 [] - - token_ids (InputLayer) [(None, None)] 0 [] - - bert_backbone_3 (BertBackbone) {'sequence_output': 4385920 ['padding_mask[0][0]', - (None, None, 128), 'segment_ids[0][0]', - 'pooled_output': ( 'token_ids[0][0]'] - None, 128)} - - transformer_encoder (Transform (None, None, 128) 198272 ['bert_backbone_3[0][1]'] - erEncoder) - - transformer_encoder_1 (Transfo (None, None, 128) 198272 ['transformer_encoder[0][0]'] - rmerEncoder) - - tf.__operators__.getitem_4 (Sl (None, 128) 0 ['transformer_encoder_1[0][0]'] - icingOpLambda) - - dense (Dense) (None, 2) 258 ['tf.__operators__.getitem_4[0][0 - ]'] - -================================================================================================== -Total params: 4,782,722 -Trainable params: 396,802 -Non-trainable params: 4,385,920 -__________________________________________________________________________________________________ Epoch 1/3 -1563/1563 [==============================] - 50s 23ms/step - loss: 0.5825 - sparse_categorical_accuracy: 0.6916 - val_loss: 0.5144 - val_sparse_categorical_accuracy: 0.7460 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 23s 14ms/step - loss: 0.6078 - sparse_categorical_accuracy: 0.6726 - val_loss: 0.5193 - val_sparse_categorical_accuracy: 0.7432 Epoch 2/3 -1563/1563 [==============================] - 15s 10ms/step - loss: 0.4842 - sparse_categorical_accuracy: 0.7655 - val_loss: 0.4286 - val_sparse_categorical_accuracy: 0.8025 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 19s 12ms/step - loss: 0.5087 - sparse_categorical_accuracy: 0.7498 - val_loss: 0.4267 - val_sparse_categorical_accuracy: 0.8032 Epoch 3/3 -1563/1563 [==============================] - 15s 10ms/step - loss: 0.4409 - sparse_categorical_accuracy: 0.7968 - val_loss: 0.4084 - val_sparse_categorical_accuracy: 0.8145 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 19s 12ms/step - loss: 0.4424 - sparse_categorical_accuracy: 0.7942 - val_loss: 0.3937 - val_sparse_categorical_accuracy: 0.8229 - + ```
@@ -593,31 +622,31 @@ print(pretrain_ds.unbatch().take(1).get_single_element())
``` ({'token_ids': , 'mask_positions': }, , }, , ) + 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., + 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., + 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>) ```
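Features like the `mask_positions` and sample weights shown above are typically produced with KerasNLP's `MaskedLMMaskGenerator` layer. A hedged sketch with illustrative parameter values (the guide's exact settings may differ):

```python
import keras_nlp

tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased")

# Randomly select a fraction of tokens, replace them with "[MASK]", and record
# their positions and original ids so they can serve as labels.
masker = keras_nlp.layers.MaskedLMMaskGenerator(
    vocabulary_size=tokenizer.vocabulary_size(),
    mask_selection_rate=0.25,
    mask_selection_length=64,
    mask_token_id=tokenizer.token_to_id("[MASK]"),
)
masked = masker(tokenizer(["The quick brown fox tripped."]))
print(masked)  # dict of masked token ids, mask positions and related features
```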
@@ -702,10 +728,10 @@ mlm_head = keras_nlp.layers.MaskedLMHead( ) inputs = { - "token_ids": keras.Input(shape=(None,), dtype=tf.int32), - "segment_ids": keras.Input(shape=(None,), dtype=tf.int32), - "padding_mask": keras.Input(shape=(None,), dtype=tf.int32), - "mask_positions": keras.Input(shape=(None,), dtype=tf.int32), + "token_ids": keras.Input(shape=(None,), dtype=tf.int32, name="token_ids"), + "segment_ids": keras.Input(shape=(None,), dtype=tf.int32, name="segment_ids"), + "padding_mask": keras.Input(shape=(None,), dtype=tf.int32, name="padding_mask"), + "mask_positions": keras.Input(shape=(None,), dtype=tf.int32, name="mask_positions"), } # Encoded token sequence @@ -721,8 +747,8 @@ pretraining_model = keras.Model(inputs, outputs) pretraining_model.summary() pretraining_model.compile( loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=keras.optimizers.experimental.AdamW(learning_rate=5e-4), - weighted_metrics=keras.metrics.SparseCategoricalAccuracy(), + optimizer=keras.optimizers.AdamW(learning_rate=5e-4), + weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()], jit_compile=True, ) @@ -734,46 +760,68 @@ pretraining_model.fit( ) ``` + +
Model: "functional_3"
+
+ + + + +
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Layer (type)         Output Shape       Param #  Connected to         ┃
+┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
+│ mask_positions      │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ padding_mask        │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ segment_ids         │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ token_ids           │ (None, None)      │       0 │ -                    │
+│ (InputLayer)        │                   │         │                      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ bert_backbone_4     │ [(None, 128),     │ 4,385,… │ mask_positions[0][0… │
+│ (BertBackbone)      │ (None, None,      │         │ padding_mask[0][0],  │
+│                     │ 128)]             │         │ segment_ids[0][0],   │
+│                     │                   │         │ token_ids[0][0]      │
+├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
+│ masked_lm_head      │ (None, 30522)     │ 3,954,… │ bert_backbone_4[0][ │
+│ (MaskedLMHead)      │                   │         │ mask_positions[0][0] │
+└─────────────────────┴───────────────────┴─────────┴──────────────────────┘
+
+ + + + +
 Total params: 4,433,210 (135.29 MB)
+
+ + + + +
 Trainable params: 4,433,210 (135.29 MB)
+
+ + + + +
 Non-trainable params: 0 (0.00 B)
+
+ + +
``` -/home/haifengj/miniconda3/lib/python3.10/site-packages/keras/engine/functional.py:638: UserWarning: Input dict contained keys ['mask_positions'] which did not match any model input. They will be ignored by the model. - inputs = self._flatten_to_reference_inputs(inputs) - -Model: "model_1" -__________________________________________________________________________________________________ - Layer (type) Output Shape Param # Connected to -================================================================================================== - input_4 (InputLayer) [(None, None)] 0 [] - - input_3 (InputLayer) [(None, None)] 0 [] - - input_2 (InputLayer) [(None, None)] 0 [] - - input_1 (InputLayer) [(None, None)] 0 [] - - bert_backbone_4 (BertBackbone) {'sequence_output': 4385920 ['input_4[0][0]', - (None, None, 128), 'input_3[0][0]', - 'pooled_output': ( 'input_2[0][0]', - None, 128)} 'input_1[0][0]'] - - masked_lm_head (MaskedLMHead) (None, None, 30522) 3954106 ['bert_backbone_4[0][1]', - 'input_4[0][0]'] - -================================================================================================== -Total params: 4,433,210 -Trainable params: 4,433,210 -Non-trainable params: 0 -__________________________________________________________________________________________________ Epoch 1/3 -WARNING:tensorflow:Gradients do not exist for variables ['pooled_dense/kernel:0', 'pooled_dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument? -WARNING:tensorflow:Gradients do not exist for variables ['pooled_dense/kernel:0', 'pooled_dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument? -1563/1563 [==============================] - 103s 57ms/step - loss: 5.2620 - sparse_categorical_accuracy: 0.0866 - val_loss: 4.9799 - val_sparse_categorical_accuracy: 0.1172 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 21s 12ms/step - loss: 5.6220 - sparse_categorical_accuracy: 0.0615 - val_loss: 4.9762 - val_sparse_categorical_accuracy: 0.1155 Epoch 2/3 -1563/1563 [==============================] - 77s 49ms/step - loss: 4.9584 - sparse_categorical_accuracy: 0.1241 - val_loss: 4.8639 - val_sparse_categorical_accuracy: 0.1327 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - loss: 4.9844 - sparse_categorical_accuracy: 0.1214 - val_loss: 4.8706 - val_sparse_categorical_accuracy: 0.1321 Epoch 3/3 -1563/1563 [==============================] - 77s 49ms/step - loss: 4.7992 - sparse_categorical_accuracy: 0.1480 - val_loss: 4.5584 - val_sparse_categorical_accuracy: 0.1919 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - loss: 4.8614 - sparse_categorical_accuracy: 0.1385 - val_loss: 4.4897 - val_sparse_categorical_accuracy: 0.2069 - + ```
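Once pretraining finishes, the `backbone` holds the learned weights and can be reused for the downstream task. A sketch of attaching a classification head, assuming the functional-model pattern used earlier in this guide (`backbone.input` and the `"pooled_output"` key of `BertBackbone`):

```python
import keras_core as keras

# Reuse the pretrained backbone; "pooled_output" is a fixed-size encoding of
# the whole sequence, suitable for classification.
inputs = backbone.input
pooled = backbone(inputs)["pooled_output"]
outputs = keras.layers.Dense(2)(pooled)
finetune_model = keras.Model(inputs, outputs)
```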
@@ -843,63 +891,63 @@ print(imdb_preproc_train_ds.unbatch().take(1).get_single_element())
``` (, ) ``` @@ -933,42 +981,58 @@ model = keras.Model( model.summary() ``` -
-``` -Model: "model_2" -_________________________________________________________________ - Layer (type) Output Shape Param # -================================================================= - token_ids (InputLayer) [(None, None)] 0 - - token_and_position_embeddin (None, None, 64) 1259648 - g (TokenAndPositionEmbeddin - g) - - transformer_encoder_2 (Tran (None, None, 64) 33472 - sformerEncoder) - - tf.__operators__.getitem_6 (None, 64) 0 - (SlicingOpLambda) - - dense_1 (Dense) (None, 2) 130 - -================================================================= -Total params: 1,293,250 -Trainable params: 1,293,250 -Non-trainable params: 0 -_________________________________________________________________ -``` -
+
Model: "functional_5"
+
+ + + + +
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
+┃ Layer (type)                     Output Shape                  Param # ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
+│ token_ids (InputLayer)          │ (None, None)              │          0 │
+├─────────────────────────────────┼───────────────────────────┼────────────┤
+│ token_and_position_embedding    │ (None, None, 64)          │  1,259,648 │
+│ (TokenAndPositionEmbedding)     │                           │            │
+├─────────────────────────────────┼───────────────────────────┼────────────┤
+│ transformer_encoder_2           │ (None, None, 64)          │     33,472 │
+│ (TransformerEncoder)            │                           │            │
+├─────────────────────────────────┼───────────────────────────┼────────────┤
+│ get_item_6 (GetItem)            │ (None, 64)                │          0 │
+├─────────────────────────────────┼───────────────────────────┼────────────┤
+│ dense_28 (Dense)                │ (None, 2)                 │        130 │
+└─────────────────────────────────┴───────────────────────────┴────────────┘
+
+ + + + +
 Total params: 1,293,250 (39.47 MB)
+
+ + + + +
 Trainable params: 1,293,250 (39.47 MB)
+
+ + + + +
 Non-trainable params: 0 (0.00 B)
+
+ + + ### Train the transformer directly on the classification objective ```python model.compile( loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=keras.optimizers.experimental.AdamW(5e-5), - metrics=keras.metrics.SparseCategoricalAccuracy(), + optimizer=keras.optimizers.AdamW(5e-5), + metrics=[keras.metrics.SparseCategoricalAccuracy()], jit_compile=True, ) model.fit( @@ -981,13 +1045,13 @@ model.fit(
``` Epoch 1/3 -1563/1563 [==============================] - 128s 77ms/step - loss: 0.6113 - sparse_categorical_accuracy: 0.6411 - val_loss: 0.4020 - val_sparse_categorical_accuracy: 0.8279 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - loss: 0.6688 - sparse_categorical_accuracy: 0.5758 - val_loss: 0.3674 - val_sparse_categorical_accuracy: 0.8507 Epoch 2/3 -1563/1563 [==============================] - 117s 75ms/step - loss: 0.3117 - sparse_categorical_accuracy: 0.8729 - val_loss: 0.3062 - val_sparse_categorical_accuracy: 0.8786 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - loss: 0.3126 - sparse_categorical_accuracy: 0.8725 - val_loss: 0.3138 - val_sparse_categorical_accuracy: 0.8729 Epoch 3/3 -1563/1563 [==============================] - 135s 87ms/step - loss: 0.2381 - sparse_categorical_accuracy: 0.9066 - val_loss: 0.3113 - val_sparse_categorical_accuracy: 0.8734 + 1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - loss: 0.2226 - sparse_categorical_accuracy: 0.9151 - val_loss: 0.4513 - val_sparse_categorical_accuracy: 0.8125 - + ```
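The custom vocabulary used in this last section can itself be learned from the raw training text. A hedged sketch using KerasNLP's WordPiece utilities (the vocabulary size and reserved tokens are illustrative, and `imdb_train` is the `tf.data` dataset loaded earlier):

```python
import keras_nlp

# Learn a WordPiece vocabulary from the review text only (drop the labels).
vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(
    imdb_train.map(lambda x, y: x),
    vocabulary_size=20000,
    lowercase=True,
    reserved_tokens=["[PAD]", "[UNK]", "[START]", "[END]"],
)
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    lowercase=True,
    sequence_length=512,
)
print(tokenizer(["What an amazing movie!"]))
```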