
v0.2.0

Released by @YLGH on 28 Jun 06:37

Changelog

PyPI Installation

The recommended install location is now PyPI. Additionally, TorchRec's binaries will no longer contain fbgemm_gpu; instead, fbgemm_gpu will be installed as a dependency. See the README for details.

Planner Improvements

We added several features and fixed a number of bugs:

  • Variable batch size per feature, to support request-only features
  • Better storage calculations for quantized UVM caching
  • A bug fix for shard storage fitting on device

Single process Batched + Fused Embeddings

Previously, TorchRec’s abstractions (EmbeddingBagCollection/EmbeddingCollection) over FBGEMM kernels, which provide benefits such as table batching, optimizer fusion, and UVM placement, could only be used in conjunction with DistributedModelParallel. We’ve decoupled these optimizations from sharding and introduced FusedEmbeddingBagCollection, which can be used as a standalone module, with all of the above features, and can also be sharded.
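Here is a minimal sketch of standalone usage. The module path and constructor arguments (`tables`, `optimizer_type`, `optimizer_kwargs`, `device`) reflect our reading of the new API; consult the TorchRec docs for the authoritative signature.

```python
import torch

from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.fused_embedding_modules import FusedEmbeddingBagCollection
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# Two tables batched into fused FBGEMM kernels with a fused SGD optimizer.
# No DistributedModelParallel required.
ebc = FusedEmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(name="t1", embedding_dim=16, num_embeddings=100, feature_names=["f1"]),
        EmbeddingBagConfig(name="t2", embedding_dim=16, num_embeddings=100, feature_names=["f2"]),
    ],
    optimizer_type=torch.optim.SGD,
    optimizer_kwargs={"lr": 0.02},
    device=torch.device("cuda"),
)

# Sparse input: feature "f1" has bags [0] and [1]; feature "f2" has bags [2] and [3, 4].
features = KeyedJaggedTensor.from_lengths_sync(
    keys=["f1", "f2"],
    values=torch.tensor([0, 1, 2, 3, 4], device="cuda"),
    lengths=torch.tensor([1, 1, 1, 2], device="cuda"),
)

pooled = ebc(features)  # KeyedTensor with one pooled embedding per feature
```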

Sharder

We enabled embedding sharding support for variable batch sizes across GPUs.

Benchmarking and Examples

We introduce:

  • A set of benchmarking tests, showing the performance characteristics of TorchRec’s base modules and of research models built with TorchRec.
  • An example demonstrating training of a distributed TwoTower (i.e. User-Item) retrieval model that is sharded using TorchRec. The projected item embeddings are added to an IVFPQ FAISS index for candidate generation (a sketch of the indexing step follows this list). The retrieval model and KNN lookup are bundled in a PyTorch model for efficient end-to-end retrieval.
  • An inference example with Torch Deploy, for both single- and multi-GPU setups.
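The FAISS step of the retrieval example looks roughly like the following; the dimensions and index parameters here are illustrative, not the ones used in the example.

```python
import faiss
import numpy as np

d = 64               # item embedding dimension (illustrative)
nlist, m = 1024, 8   # number of IVF cells and PQ sub-quantizers (illustrative)

# Build an IVFPQ index over the projected item embeddings.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 bits per PQ code

# Stand-in for the item tower's projected embeddings.
item_embeddings = np.random.rand(100_000, d).astype("float32")
index.train(item_embeddings)
index.add(item_embeddings)

# Candidate generation: retrieve the 100 nearest items for a batch of user embeddings.
user_embeddings = np.random.rand(32, d).astype("float32")
distances, item_ids = index.search(user_embeddings, 100)
```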

Integrations

We demonstrate that TorchRec works out of the box with many components commonly used alongside PyTorch models in production-like systems, such as:

  • Training a TorchRec model on Ray Clusters using the TorchX Ray scheduler
  • Preprocessing and data loading with NVTabular on DLRM
  • Training a TorchRec model with on-the-fly preprocessing using TorchArrow, showcasing RecSys domain UDFs

Scriptable Unsharded Modules

The unsharded embedding modules (EmbeddingBagCollection/EmbeddingCollection and their variants) can now be compiled with TorchScript (torch.jit.script).
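For example (a minimal sketch; the table configuration is illustrative):

```python
import torch

from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(name="t1", embedding_dim=8, num_embeddings=100, feature_names=["f1"]),
    ],
)

# The unsharded module can now be compiled with TorchScript.
scripted = torch.jit.script(ebc)
```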

EmbeddingCollection Column-Wise Sharding

We now support column-wise sharding for EmbeddingCollection, enabling sequence embeddings to be sharded column-wise.
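Here is a sketch of how column-wise sharding might be requested through the planner's per-table constraints; the exact constraint plumbing is our assumption based on the planner API, so check the docs for your setup.

```python
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Restrict table "t1" of an EmbeddingCollection to column-wise sharding.
constraints = {
    "t1": ParameterConstraints(sharding_types=[ShardingType.COLUMN_WISE.value]),
}

planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=2, compute_device="cuda"),
    constraints=constraints,
)
# The resulting plan is then passed to DistributedModelParallel, e.g.:
# plan = planner.collective_plan(model, sharders, process_group)
```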

JaggedTensor

We boosted the performance of the to_padded_dense function by reimplementing it with FBGEMM.
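A small example of the function in question; the argument names follow our reading of the API.

```python
import torch

from torchrec.sparse.jagged_tensor import JaggedTensor

# Two jagged rows of lengths 1 and 3: [[1.0], [2.0, 3.0, 4.0]].
jt = JaggedTensor(
    values=torch.tensor([1.0, 2.0, 3.0, 4.0]),
    lengths=torch.tensor([1, 3]),
)

# Pad every row to the desired length; now backed by an FBGEMM kernel.
dense = jt.to_padded_dense(desired_length=3, padding_value=0.0)
# tensor([[1., 0., 0.],
#         [2., 3., 4.]])
```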

Linting

We added lintrunner, allowing contributors to quickly lint and format their changes to match our internal formatter.