Use hermetic python for tf-text.
FUTURE_COPYBARA_INTEGRATE_REVIEW=#1273 from jiya-zhang:master 1d507c9
PiperOrigin-RevId: 634914802
cantonios authored and tf-text-github-robot committed May 18, 2024
1 parent f0f675c commit 082c257
Showing 11 changed files with 743 additions and 98 deletions.
58 changes: 53 additions & 5 deletions WORKSPACE
@@ -2,6 +2,24 @@ workspace(name = "org_tensorflow_text")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
name = "bazel_skylib",
sha256 = "74d544d96f4a5bb630d465ca8bbcfe231e3594e5aae57e1edbf17a6eb3ca2506",
urls = [
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
"https://github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
],
)

http_archive(
name = "rules_python",
sha256 = "29a801171f7ca190c543406f9894abf2d483c206e14d6acbd695623662320097",
strip_prefix = "rules_python-0.18.1",
url = "https://github.com/bazelbuild/rules_python/releases/download/0.18.1/rules_python-0.18.1.tar.gz",
)

# load("@rules_python//python:repositories.bzl", "python_register_toolchains")

http_archive(
name = "icu",
strip_prefix = "icu-release-64-2",
@@ -56,12 +74,10 @@ http_archive(

http_archive(
name = "org_tensorflow",
patch_args = ["-p1"],
patches = ["//third_party/tensorflow:tf.patch"],
strip_prefix = "tensorflow-d17c801006947b240ec4b8caf232c39b6a24718a",
sha256 = "1a32ed7b5ea090db114008ea382c1e1beda622ffd4c62582f2f906cb10ee6290",
strip_prefix = "tensorflow-f6b72954734f8304bfb83228bd8406a3ba3394f4",
sha256 = "15df197aace44fe2c67e6e22f930cf76f45d9e6ac1291e7c9ce8dd0dcc26e9a5",
urls = [
"https://github.com/tensorflow/tensorflow/archive/d17c801006947b240ec4b8caf232c39b6a24718a.zip"
"https://github.com/tensorflow/tensorflow/archive/f6b72954734f8304bfb83228bd8406a3ba3394f4.zip"
],
)

@@ -85,6 +101,38 @@
build_file = "//third_party/pybind11:BUILD.bzl",
)

# We must initialize hermetic Python first.
load("@org_tensorflow//third_party/py:python_init_rules.bzl", "python_init_rules")
python_init_rules()

load("@org_tensorflow//third_party/py:python_init_repositories.bzl", "python_init_repositories")
python_init_repositories(
requirements = {
"3.11": "//oss_scripts/requirements:python_requirements.txt",
},
)

load("@org_tensorflow//third_party/py:python_init_toolchains.bzl", "python_init_toolchains")
python_init_toolchains()

load("@org_tensorflow//third_party/py:python_init_pip.bzl", "python_init_pip")
python_init_pip()

# load("@pypi//:requirements.bzl", "install_deps")
# install_deps()

# Read the Python package dependencies of the build environment. To modify
# them, see //third_party:python_requirements.in.
load("@rules_python//python:pip.bzl", "pip_parse")
pip_parse(
name = "tensorflow_text_pip_deps",
requirements_lock = "//oss_scripts/requirements:python_requirements.txt",
)

# Create repositories for each Python package dependency.
load("@tensorflow_text_pip_deps//:requirements.bzl", "install_deps")
install_deps()

# Initialize TensorFlow dependencies.
load("@org_tensorflow//tensorflow:workspace3.bzl", "tf_workspace3")
tf_workspace3()
38 changes: 19 additions & 19 deletions docs/tutorials/transformer.ipynb
@@ -93,7 +93,7 @@
"source": [
"This tutorial demonstrates how to create and train a [sequence-to-sequence](https://developers.google.com/machine-learning/glossary#sequence-to-sequence-task) [Transformer](https://developers.google.com/machine-learning/glossary#Transformer) model to translate [Portuguese into English](https://www.tensorflow.org/datasets/catalog/ted_hrlr_translate#ted_hrlr_translatept_to_en). The Transformer was originally proposed in [\"Attention is all you need\"](https://arxiv.org/abs/1706.03762) by Vaswani et al. (2017).\n",
"\n",
"Transformers are deep neural networks that replace CNNs and RNNs with [self-attention](https://developers.google.com/machine-learning/glossary#self-attention). Self attention allows Transformers to easily transmit information across the input sequences.\n",
"Transformers are deep neural networks that replace CNNs and RNNs with [self-attention](https://developers.google.com/machine-learning/glossary#self-attention). Self-attention allows Transformers to easily transmit information across the input sequences.\n",
"\n",
"As explained in the [Google AI Blog post](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html):\n",
"\n",
@@ -138,7 +138,7 @@
"To get the most out of this tutorial, it helps if you know about [the basics of text generation](./text_generation.ipynb) and attention mechanisms. \n",
"\n",
"A Transformer is a sequence-to-sequence encoder-decoder model similar to the model in the [NMT with attention tutorial](https://www.tensorflow.org/text/tutorials/nmt_with_attention).\n",
"A single-layer Transformer takes a little more code to write, but is almost identical to that encoder-decoder RNN model. The only difference is that the RNN layers are replaced with self attention layers.\n",
"A single-layer Transformer takes a little more code to write, but is almost identical to that encoder-decoder RNN model. The only difference is that the RNN layers are replaced with self-attention layers.\n",
"This tutorial builds a 4-layer Transformer which is larger and more powerful, but not fundamentally more complex."
]
},
@@ -186,8 +186,8 @@
"## Why Transformers are significant\n",
"\n",
"- Transformers excel at modeling sequential data, such as natural language.\n",
"- Unlike the [recurrent neural networks (RNNs)](./text_generation.ipynb), Transformers are parallelizable. This makes them efficient on hardware like GPUs and TPUs. The main reasons is that Transformers replaced recurrence with attention, and computations can happen simultaneously. Layer outputs can be computed in parallel, instead of a series like an RNN.\n",
"- Unlike [RNNs](https://www.tensorflow.org/guide/keras/rnn) (like [seq2seq, 2014](https://arxiv.org/abs/1409.3215)) or [convolutional neural networks (CNNs)](https://www.tensorflow.org/tutorials/images/cnn) (for example, [ByteNet](https://arxiv.org/abs/1610.10099)), Transformers are able to capture distant or long-range contexts and dependencies in the data between distant positions in the input or output sequences. Thus, longer connections can be learned. Attention allows each location to have access to the entire input at each layer, while in RNNs and CNNs, the information needs to pass through many processing steps to move a long distance, which makes it harder to learn.\n",
"- Unlike [recurrent neural networks (RNNs)](./text_generation.ipynb), Transformers are parallelizable. This makes them efficient on hardware like GPUs and TPUs. The main reasons is that Transformers replaced recurrence with attention, and computations can happen simultaneously. Layer outputs can be computed in parallel, instead of a series like an RNN.\n",
"- Unlike [RNNs](https://www.tensorflow.org/guide/keras/rnn) (such as [seq2seq, 2014](https://arxiv.org/abs/1409.3215)) or [convolutional neural networks (CNNs)](https://www.tensorflow.org/tutorials/images/cnn) (for example, [ByteNet](https://arxiv.org/abs/1610.10099)), Transformers are able to capture distant or long-range contexts and dependencies in the data between distant positions in the input or output sequences. Thus, longer connections can be learned. Attention allows each location to have access to the entire input at each layer, while in RNNs and CNNs, the information needs to pass through many processing steps to move a long distance, which makes it harder to learn.\n",
"- Transformers make no assumptions about the temporal/spatial relationships across the data. This is ideal for processing a set of objects (for example, [StarCraft units](https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii)).\n",
"\n",
"\u003cimg src=\"https://www.tensorflow.org/images/tutorials/transformer/encoder_self_attention_distribution.png\" width=\"800\" alt=\"Encoder self-attention distribution for the word it from the 5th to the 6th layer of a Transformer trained on English-to-French translation\"\u003e\n",
@@ -1007,8 +1007,8 @@
},
"outputs": [],
"source": [
"embed_pt = PositionalEmbedding(vocab_size=tokenizers.pt.get_vocab_size(), d_model=512)\n",
"embed_en = PositionalEmbedding(vocab_size=tokenizers.en.get_vocab_size(), d_model=512)\n",
"embed_pt = PositionalEmbedding(vocab_size=tokenizers.pt.get_vocab_size().numpy(), d_model=512)\n",
"embed_en = PositionalEmbedding(vocab_size=tokenizers.en.get_vocab_size().numpy(), d_model=512)\n",
"\n",
"pt_emb = embed_pt(pt)\n",
"en_emb = embed_en(en)"
@@ -1340,7 +1340,7 @@
"id": "J6qrQxSpv34R"
},
"source": [
"### The global self attention layer"
"### The global self-attention layer"
]
},
{
@@ -1360,7 +1360,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003ctr\u003e\n",
" \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
@@ -1378,7 +1378,7 @@
"source": [
"Since the context sequence is fixed while the translation is being generated, information is allowed to flow in both directions. \n",
"\n",
"Before Transformers and self attention, models commonly used RNNs or CNNs to do this task:"
"Before Transformers and self-attention, models commonly used RNNs or CNNs to do this task:"
]
},
{
@@ -1415,7 +1415,7 @@
"- The RNN allows information to flow all the way across the sequence, but it passes through many processing steps to get there (limiting gradient flow). These RNN steps have to be run sequentially and so the RNN is less able to take advantage of modern parallel devices.\n",
"- In the CNN each location can be processed in parallel, but it only provides a limited receptive field. The receptive field only grows linearly with the number of CNN layers, You need to stack a number of Convolution layers to transmit information across the sequence ([Wavenet](https://arxiv.org/abs/1609.03499) reduces this problem by using dilated convolutions).\n",
"\n",
"The global self attention layer on the other hand lets every sequence element directly access every other sequence element, with only a few operations, and all the outputs can be computed in parallel. \n",
"The global self-attention layer on the other hand lets every sequence element directly access every other sequence element, with only a few operations, and all the outputs can be computed in parallel. \n",
"\n",
"To implement this layer you just need to pass the target sequence, `x`, as both the `query`, and `value` arguments to the `mha` layer: "
]
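For illustration (not part of the diff itself): a minimal sketch of the call described in the cell above, assuming a Keras `MultiHeadAttention` layer named `mha` and random placeholder inputs rather than the tutorial's actual tensors:

```python
import tensorflow as tf

# Global self-attention: every position in x attends to every other position.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

x = tf.random.normal([1, 10, 64])  # (batch, sequence, features)
out = mha(query=x, value=x)        # key defaults to value
print(out.shape)                   # (1, 10, 64)
```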
@@ -1470,7 +1470,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003ctr\u003e\n",
" \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
@@ -1499,7 +1499,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003ctr\u003e\n",
" \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
@@ -1515,7 +1515,7 @@
"id": "Yq4NtLymD99-"
},
"source": [
"### The causal self attention layer"
"### The causal self-attention layer"
]
},
{
@@ -1524,7 +1524,7 @@
"id": "VufkgF7caLze"
},
"source": [
"This layer does a similar job as the global self attention layer, for the output sequence:"
"This layer does a similar job as the global self-attention layer, for the output sequence:"
]
},
{
@@ -1535,7 +1535,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003ctr\u003e\n",
" \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
@@ -1551,7 +1551,7 @@
"id": "0AtF1HYFEOYf"
},
"source": [
"This needs to be handled differently from the encoder's global self attention layer. \n",
"This needs to be handled differently from the encoder's global self-attention layer. \n",
"\n",
"Like the [text generation tutorial](https://www.tensorflow.org/text/tutorials/text_generation), and the [NMT with attention](https://www.tensorflow.org/text/tutorials/nmt_with_attention) tutorial, Transformers are an \"autoregressive\" model: They generate the text one token at a time and feed that output back to the input. To make this _efficient_, these models ensure that the output for each sequence element only depends on the previous sequence elements; the models are \"causal\"."
]
@@ -1608,7 +1608,7 @@
"id": "WLYfIa8eiYgk"
},
"source": [
"To build a causal self attention layer, you need to use an appropriate mask when computing the attention scores and summing the attention `value`s.\n",
"To build a causal self-attention layer, you need to use an appropriate mask when computing the attention scores and summing the attention `value`s.\n",
"\n",
"This is taken care of automatically if you pass `use_causal_mask = True` to the `MultiHeadAttention` layer when you call it:"
]
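For reference (outside the diff): a minimal sketch of that call, assuming TensorFlow 2.10 or later, where `MultiHeadAttention` accepts `use_causal_mask`:

```python
import tensorflow as tf

# Causal self-attention: each position may attend only to itself and to
# earlier positions, which is what an autoregressive decoder requires.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

x = tf.random.normal([1, 10, 64])  # (batch, target sequence, features)
out = mha(query=x, value=x, use_causal_mask=True)
print(out.shape)                   # (1, 10, 64)
```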
@@ -1650,7 +1650,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003ctr\u003e\n",
" \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
@@ -1679,7 +1679,7 @@
"source": [
"\u003ctable\u003e\n",
"\u003c/tr\u003e\n",
" \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
" \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
"\u003ctr\u003e\n",
"\u003ctr\u003e\n",
" \u003ctd\u003e\n",
1 change: 1 addition & 0 deletions oss_scripts/BUILD.oss
@@ -0,0 +1 @@
licenses(["notice"])
20 changes: 3 additions & 17 deletions oss_scripts/configure.sh
@@ -37,23 +37,9 @@ function is_macos() {
# Remove .bazelrc if it already exist
[ -e .bazelrc ] && rm .bazelrc

if [[ $(pip show tensorflow) == *tensorflow* ]] ||
[[ $(pip show tensorflow-macos) == *tensorflow-macos* ]] ||
[[ $(pip show tf-nightly) == *tf-nightly* ]]; then
echo 'Using installed tensorflow.'
else
echo 'Installing tensorflow.'
if is_macos; then
# Only Apple Silicon will be installed with tensorflow-macos.
if [[ x"$(arch)" == x"arm64" ]]; then
pip install tensorflow-macos==2.13.0
else
pip install tensorflow==2.13.0
fi
else
pip install tensorflow==2.13.0
fi
fi
echo 'Installing tensorflow.'
# For main branch, install nightly. CHANGE THIS ON RELEASE BRANCHES.
pip install tf-nightly

if is_windows; then
# ICU must be built as a static library, so the external data must be built in
30 changes: 30 additions & 0 deletions oss_scripts/requirements/BUILD.oss
@@ -0,0 +1,30 @@
load("@rules_python//python:pip.bzl", "compile_pip_requirements")

licenses(["notice"])

compile_pip_requirements(
# Defines targets which use pip-compile to keep the Python locked
# requirements up-to-date:
#
# :python_requirements.update bazel run this target to update
# ./python_requirements.txt by recursively following
# and locking the dependencies seeded by
# ./python_requirements.in
#
# :python_requirements_test bazel test target which fails if
# ./python_requirements.txt does not match
# that generated from ./python_requirements.in
name = "python_requirements",
extra_args = [
"--allow-unsafe",
# ^ lets pip-compile include setuptools, recommended by
# `pip-compile -h` as future default behavior
],
requirements_in = "python_requirements.in",
requirements_txt = "python_requirements.txt",
tags = [
"manual",
# ^ exclude .update and _test targets from wildcards in,
# e.g., `bazel test ...`
],
)
29 changes: 29 additions & 0 deletions oss_scripts/requirements/python_requirements.in
@@ -0,0 +1,29 @@
# Specify the Python packages available as dependencies to targets in Bazel.
#
# When modifying this list, always run
# the //oss_scripts/requirements:python_requirements.update target:
#
# bazel run //oss_scripts/requirements:python_requirements.update
#
# to compile (using pip-compile) this list of direct dependencies into a pinned
# requirements file---a complete list of direct and transitive dependencies,
# pinned by version and cryptographic hash. The pinned requirements file is
# used in @rules_python's pip_parse() in the WORKSPACE file to create the
# external repositories available as dependencies to py_binary() and
# py_library() targets.
#
# To upgrade dependencies to their latest version, run the update target with
# the option --upgrade:
#
# bazel run //oss_scripts/requirements:python_requirements.update -- --upgrade
#
# Without the --upgrade option, the underlying pip-compile only adds or removes
# dependencies without upgrading them to the latest versions available in PyPI.
#
# Both this input file and the pinned requirements file should be committed to
# git. Avoid committing changes that break other developers by using an
# environment that meets the project's recommendations. Dependency resolution
# is sensitive to the Python environment (interpreter version, etc.) in which
# it is run.

tf-nightly
(5 more changed files not shown.)
