Add unit test for fixed bounds check in IsWhitespace
PiperOrigin-RevId: 657343489
tf-text-github-robot committed Jul 29, 2024
1 parent 518f1a5 commit 748c121
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions tensorflow_text/core/kernels/whitespace_tokenizer_test.cc
@@ -74,6 +74,17 @@ TEST(WhitespaceTokenizerTest, InvalidCodepoint) {
  EXPECT_THAT(output_end_offsets, ElementsAre(1));
}

TEST(WhitespaceTokenizerTest, MaxCodepoint) {
  // Create an artificially-small config so that we can test behavior with
  // codepoints at the upper edge of its range. This bitmap marks 0x00-0x3f as
  // whitespace.
  std::string config(8, '\xff');
  // Verify that reading one bit off the end of the bitmap returns
  // not-whitespace.
  WhitespaceTokenizerConfig cfg(config);
  EXPECT_FALSE(cfg.IsWhitespace(0x40));
}

} // namespace
} // namespace text
} // namespace tensorflow
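
For readers who want to see the kind of bounds check this test exercises, the following is a minimal sketch only. The class name `BitmapWhitespaceConfig` is hypothetical, and the layout (one bit per codepoint, least-significant bit first within each byte) is an assumption made for illustration; it is not taken from the tensorflow_text sources. With an 8-byte bitmap of `0xff`, codepoints 0x00-0x3f map to set bits, and 0x40 would index byte 8, which is one past the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a bitmap-backed whitespace config.
// Assumption: one bit per codepoint, LSB-first within each byte.
class BitmapWhitespaceConfig {
 public:
  explicit BitmapWhitespaceConfig(std::string bitmap)
      : bitmap_(std::move(bitmap)) {}

  bool IsWhitespace(std::uint32_t codepoint) const {
    // Each byte covers 8 codepoints, so the byte index is codepoint / 8.
    const std::size_t byte_index = codepoint >> 3;
    // Bounds check: anything past the end of the bitmap is not whitespace.
    if (byte_index >= bitmap_.size()) return false;
    const unsigned char byte =
        static_cast<unsigned char>(bitmap_[byte_index]);
    // Select the bit for this codepoint within its byte.
    return ((byte >> (codepoint & 7u)) & 1u) != 0;
  }

 private:
  std::string bitmap_;
};
```

Comparing the byte index against the bitmap size with `>=` is what makes codepoint 0x40 return false for an 8-byte bitmap instead of reading past the buffer.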
