- [metal] (experimental) introduce partial support for Apple Metal
- [core] Potential internal API breaking changes (operator names, comparison ops refactored)
- [data] (experimental) Smarter TDim simplification, handling of Min and Max. TDim assertions for simplifications.
- [data] (experimental) WIP around multiple scenarios (modes) for LLM inference
- Extra examples
- [linalg] (experimental) kernels targeting LLM block-quantized tasks (inc. intel 32x1 q40f32)
- [data] Rework tdim and symbols, introduce inequalities assertions, min and max operators
- [data] Generalize Blob usage in Tensor
- [linalg] Rework reduce implementation, introduce more generic binary ops support (wip)
- [linalg] Introduce multithreaded matrix multiplication runner
- [linalg] Introduce Q4_0 block quantization for weights (wip)
- [linalg] Introduce AMX f16 kernels, Neon Q40F16 kernel (experimental)
- [linalg] wasm f32 4x4 kernel
- [core] Introduce Opaque and OpaqueFact to escape Tensor and TValue formalism
- [core] generalize/improve float precision translator, with translation filter
- [core] Introduce garbage collecting in patch application, new compact algo, and rework constant propagation to spare memory
- [core] Rework packed format and packing metadata
- [linalg/core] Introduce multiple packing format for matmul kernels
- [core] Work-in-progress refactoring of binary ops, towards more optimized execution strategies
- [nnef] inequalities assertions extension, q4_0 extension
- [tflite] plug in tanh and sigmoid
- [TFLite] fixes for fully connected and max pool layers
- Allow opting out of new memory friendly execution order optimisation
- More memory/cache friendly execution order
- Several fixes around symbolic dimensions computation (some should help with attention models)
- [AMX] Put AMX for iOS behind a feature gate ("tract-linalg/apple-amx-ios").
- [ONNX] Support for external storage of tensors with offset and length
- [ONNX] Lots of fixes around binary quantized operators (add, mul, etc)
- [PY] Fix python source distribution
- [AMX] Activate AMX on iOS
- [API] Introduce transforms in external api
- [BLAS] Introduce a simple BLAS transform for Matrix multiplication
- [F16] Introduce a Reduce that solves many L2 normalization errors in f16
This version has been yanked to revert the systematic activation of AMX on iOS. AMX is a private API and Apple may reject an app that executes AMX instructions.
- [ONNX] Support for external storage of tensors with offset and length
- MSRV is now 1.75.0
- [internal] ConvUnary and MatmulUnary are replaced by binary, potentially dynamic, equivalents
- [ONNX] LayerNormalization support
- [ONNX] ignoring output shapes is now the default
- [intel] fix in AVX512F matrix vector product
- [tflite] alpha, embryonic support; some convolutional models working
- [kaldi] remove abandoned kaldi experimental support
- [refactoring] Runtime abstraction and runtime-targetting tests
- [refactoring] Refactoring Python and C API around a possible tract-api. Introducing dylib support.
- [pytorch compat] fixes around node names starting with / (bug triggered by recent pytorch versions)
0.20.7 to 0.20.17 are misfires
- Bug fixes, fix display of If operator
- Various bugfix around Einsum
- Einsum now has functions to translate to MatMul and other axes manipulations
- [optim] 32x32 f32 AMX kernel (for Apple Silicon M family)
- [optim] bunch of AMX512F kernels (square, skinny, vector)
- [ONNX] introduce Trilu, TopK
- [NNEF/OPL] submodel loader
- [ONNX] support alternative layout for LSTM (layout=1, batch becomes first axis)
- [ONNX] If operators with dynamic condition (very basic optimisations, no nnef support yet).
- HardSwish ONNX, tract_core_hard_swish in NNEF/OPL
- introducing tract_core_submodel in NNEF/OPL
- JSON resource loader in NNEF/OPL
- Profiling API tweaks
- --folded view for model command line dump (hides Scan loops)
- Various bug fixes
- more bug fixes
- WIP on Python doc auto-deploy
- 0.19.3 and 0.19.4 are release misfires
- lots of bugfixes following 0.19 big changes
- introducing the JSON NNEF resource
- [NNEF/OPL] introduce json resource loader
- extend Complex number support (under a feature flag)
- [nnef] new identifier syntax is now opt-in for serialization (both accepted at loading)
- alpha-level C interface. Still figuring out how to deploy it (where to put the .h, whether or not to build and ship dylibs)
- alpha-level Python interface, deployed on PyPI as "tract". At this stage, the API is undocumented and may still change significantly.
- [BREAKING] TValue is now used in run() instead of the previous mix of Tensor and Arc<Tensor>
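  For illustration, a minimal sketch of the new calling convention (assuming the usual ONNX loading chain; the path and shape are placeholders):

  ```rust
  use tract_onnx::prelude::*;

  fn main() -> TractResult<()> {
      // load and optimize a model (illustrative path and input shape)
      let model = tract_onnx::onnx()
          .model_for_path("model.onnx")?
          .into_optimized()?
          .into_runnable()?;

      // run() now takes a TVec<TValue>; a Tensor converts into a TValue with .into()
      let input = Tensor::zero::<f32>(&[1, 3, 224, 224])?;
      let outputs = model.run(tvec!(input.into()))?;
      println!("{:?}", outputs[0]);
      Ok(())
  }
  ```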
- internal API breaking changes: no more op_families, libcli split away. State is no longer Send (but can be "frozen" to a Send counterpart).
- Symbols can now be String instead of char. They are not shared globally anymore, but scoped in the Model instead.
- [pulse] S symbol is no longer magic. The time dimension symbol must be provided at pulsification time.
- [pulse] In most cases, we can now pulsify without an explicit pulse len (pulse len can be expression).
- [cli] deprecated "x" syntax for shape is removed
- [nnef/opl] new i"..." syntax for escaping identifiers: i"some arbitrary string". Allow serialization of any ONNX model with any kind of string as node names.
- [ONNX] Signal processing operators (DFT, STFT, MelWeightMatrix, BlackmanWindow, HammingWindow, HannWindow)
- [ONNX] bitwise operations
- [ONNX] Compatibility target raised to operator set 18
- [NNEF] Introduce a "resource" extension for loading values from a separate source (as a config file)
- Workaround for cpu detection failure on FreeBSD / arm64
- Various bug fixes
- [pulse] improve convolution (and others) pulsification to avoid some unnecessary buffering delay
- [cli] support multiple streaming inputs and outputs
- [ONNX] more relaxed Clip operator rules
- prepare NNEF for further tract-opl extension (resource support)
- more generic matmul
- optimise some EinSum cases as matmul
- [ONNX Breaking] Several changes to move towards supporting ONNX symbolic dimensions (actual fixes, but they may break things that were working more or less by accident). It may be required to erase output shapes explicitly when the input shape is overridden on models that were working before.
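  A hedged sketch of the kind of adjustment this may require, assuming the with_input_fact / with_output_fact builder methods and InferenceFact::default(); the path and shapes are illustrative:

  ```rust
  use tract_onnx::prelude::*;

  fn main() -> TractResult<()> {
      // when overriding the input shape, also erase the declared output shape
      // so it gets re-inferred from the new input
      let model = tract_onnx::onnx()
          .model_for_path("model.onnx")?
          .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 128)))?
          .with_output_fact(0, InferenceFact::default())?
          .into_optimized()?
          .into_runnable()?;
      let _ = model;
      Ok(())
  }
  ```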
- [CLI breaking] ONNX symbolic dimensions have some impact here too. --input-bundle is deprecated, as it was overloaded and ambiguous. Instead, there is a --input-facts-from-bundle global option, and an --input-from-bundle option in the run, profile and dump subcommands. --allow-random-input also moves to the subcommands. We think all previously supported behaviours are still there; please open issues if not.
- clippy up all tract code
- various fixes
- 0.17.5 and 0.17.6 are misfires
- [cli] global --set (as a somewhat cleaner successor to --concretize) allows setting a symbol value after decluttering
- [cli] run --save-outputs output.npz to save execution outputs
- dozens of fixes and code cleanups (clippy-fication in progress)
- [License] Allowing https://spdx.org/licenses/Unicode-DFS-2016.html (no tldr yet, but pretty similar to BSD-2)
- [Breaking] CLI --json option reports costs as strings instead of numbers (in order to allow symbolic values).
- Sigmoid/Tanh f32 reimpl, plus new f16 impl.
- Sanitiser=address in the CI. Fixed a couple of overflow/memleaks. (Nothing looked too awful.)
- ONNX NonMaxSuppression
- [Breaking] [ONNX-ML] TreeEnsembleClassifier with binary output (single class) now mimics scikit-learn output layout.
- bump ONNX protobuf file and support external tensors format
- new "skinny" kernels for avx2/fma f32 multiplication (positive impact on low, non 1 batch size for DNN-heavy loads)
- Softmax is now an operator in core, coming with a direct quantized implementation
- new TypedFact constructor API: f32::fact(&[1, 4, 12]), f32::fact(shape!(Symbol::from('N'), 12))
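  For context, a small sketch using the constructors quoted above (assuming they are exposed through the tract-core prelude; the symbolic case uses this release's char-based symbols):

  ```rust
  use tract_core::prelude::*;

  fn main() {
      // a fully static f32 fact, and one with a symbolic first dimension,
      // written with the constructors mentioned in the entry above
      let static_fact: TypedFact = f32::fact(&[1, 4, 12]);
      let symbolic_fact: TypedFact = f32::fact(shape!(Symbol::from('N'), 12));
      println!("{:?} {:?}", static_fact, symbolic_fact);
  }
  ```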
- fixes and optimisation of re-quantization pipeline
- fixes around symbols in NNEF/OPL
- Various changes around quantization support (qi32 appearance)
- Intel optimisations are back
- Range is now more flexible, should unlock some BERT models with symbolic dimensions.
- some optimisations in depthwise convolutions
- various bugfixes
- [Breaking] Fixed nnef "tile" operator definition ("repeats" is plural). As a consequence models using "tile" serialized with tract with prior versions can not be loaded anymore (and vice-versa).
- [Breaking] tract-opl models Scan syntax changed a bit. Models exported by <0.16.2 are loadable in >=0.16.2, but not the other way around.
- Optimisation in deconv
- [Breaking] Minimum Rust Supported Version is now 1.59.0
- [Breaking] Small API changes in the model API: .compact(), .optimize(), .declutter() now take &mut self and work in place.
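  A hedged sketch of the new in-place style (the ONNX loading chain and path are only for illustration):

  ```rust
  use tract_onnx::prelude::*;

  fn main() -> TractResult<()> {
      let mut model = tract_onnx::onnx()
          .model_for_path("model.onnx")?   // illustrative path
          .into_typed()?;
      // declutter() and optimize() now take &mut self and work in place
      model.declutter()?;
      model.optimize()?;
      let runnable = model.into_runnable()?;
      let _ = runnable;
      Ok(())
  }
  ```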
- [LICENSE] Only the licensing for dependencies of the top-level library crates (tensorflow, onnx, kaldi, pulse) will now be monitored. The command line tool (tract crate in the cli folder) is for developers (tract developers or tract integrators), is not meant to be shipped to end-users, and it concentrates most of the license and dependency complexity.
- [LICENSE] BSD-3-Clause is now accepted in tract.
- Optimisations around convolutions and deconvolution
- Optimisation on Cortex-A53, first round of Cortex-A55 optimisation too.
- Fix brand new ArrayFeatureExtractor inference
- ONNX ArrayFeatureExtractor
- ConvTranspose/deconv optimisation
- just a release script failure
- hold half at 1.7.x for compat with rust 1.50
- ConvTranspose/deconv pulse support
- ONNX SpaceToDepth/DepthToSpace
- optimise i8u8, u8i8 and u8*u8 matrix products (and convolutions)
- bump prost dep
- some optimisations for arm32 (cortex-a7 and a9)
- Switched the order of item_type and item_type_vendor in the NNEF tensor format to be consistent with NNEF-tools, and changed the item_type of integers due to an error in the specification. Breaking for tensor files containing integers or strings.
- Scan output batching optimisation
- Concat pulsification over a secondary axis
- new aarch64 16x4 f32 kernel
- better handling of errors in ONNX parser
- fix/workaround some performance regressions bubbling up from recent ndarray changes
- ONNX ConvTranspose, Gather, GatherND, GatherElements, Scatter, ScatterND, ScatterElements support (and NNEF deconv)
- Fixes around integer serialisation in NNEF
- workaround subtle breaking changes in ndarray (between 0.15.1 and 0.15.2)
- low-level functions in linalg are now version tagged: two versions of tract can now co-exist in the same binary
- rustc minimal version is now 1.50
- dependencies version bumps (ndarray, itertools, and others)
- fix sigmoid and tanh variability on intel
- temporarily disable binary unicast add fusing (too many bugs)
- Releases are now "in sync": all tract crate versions in a build must be aligned
- optimisations, with a focus on aarch64
- Dependency bumps
- 0.12.3 is a misfire
- hotfixes on 0.12.2 new tree classifier
- fix X compilation from macos/aarch64 to macos/intel
- ONNX-ML: CategoryMapper and TreeEnsembleClassifier (partial, SoftmaxZero and Probits are missing). With NNEF support.
- cargo-deny enforces licence choices
- 0.12.0 is a misfire.
- API BREAKING: TypedFact::dt_shape & friends cannot fail anymore and no longer return a Result (remove the ?)
- Breaking: Rust minimal version bumped to 1.42
- Early, basic, correct but slow support for i8 by u8 matrix multiplication.
- Support for Apple Silicon, aka M1, aka aarch64 darwin (but not in CI yet)
- dynamic quantization convolution support
- releases now ship CLI musl builds for Linux
- optimizations targeting small Cortex-A cores (like A7, A8 and A9)
- command line dump --profile --cost now computes flops
- ONNX: OneHot op support
- ONNX: new op: DynamicQuantizeLinear
- tract-data crate split from core, containing tensor, dim, and datum types.
- switch from error_chain to anyhow
- simplify trivial gathers to a slice
- generalize symbolic dimension a bit: support "2S" and the like
- deprecate "x" syntax in CLI, please use
,
instead
- NNEF: tract-nnef no longer performs gunzipping, but expects an uncompressed tar stream. We found out it is counter-productive (weights matrices are more or less random, they do not compress easily, and decompression is expensive). NNEF networks in the wild are .tgz files. Using flate2, decompression is a one-liner, but it must be done by the client code now.
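  For example, a hedged sketch of doing the decompression on the client side with flate2 before handing the tar stream to tract-nnef (path is illustrative, model_for_read assumed available through the Framework trait):

  ```rust
  use std::fs::File;
  use flate2::read::GzDecoder;
  use tract_nnef::prelude::*;

  fn main() -> TractResult<()> {
      // tract-nnef now expects an uncompressed tar stream, so a gzipped archive
      // must be decompressed by the caller; flate2 makes that a one-liner
      let compressed = File::open("model.nnef.tgz")?;   // illustrative path
      let mut tar_stream = GzDecoder::new(compressed);
      let model = tract_nnef::nnef().model_for_read(&mut tar_stream)?;
      println!("loaded {} nodes", model.nodes().len());
      Ok(())
  }
  ```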
- bumped extended nnef compat version (unchecked at this stage) to "alpha1"
- move pulse operators and translation to their own crate and nnef registry
- generalize TDim to support an arbitrary number of symbols
- concretize_stream_dim is superseded by concretize_dims
- new crates, building on tract-opl introduction:
- tract-pulse-opl: pulse runtime (handful of ops, including Delay) is now separated from core
- tract-onnx-opl: onnx runtime (4 ops not belonging in core)
- tract-pulse: pulsification of models (model-translation time)
- tract-onnx is now limited to onnx model loading and conversion
- load a NNEF as a TypedModel using tract_nnef, and from the CLI
- dump a tract TypedModel to NNEF (with extensions for ops not nnef compatible)
- not a full coverage of nnef, but enough for most CNN (image categorizers zoo working)
- 80% of onnx tests are surviving a NNEF dump and reload at this stage
- covered operators compatible with Operator Sets 9, 10, 11 (new) and 12 (new)
- Tensor::l1 method is gone
- Support for -gnu targets (non-msvc).
- --cost now gives the number of parameters in the model
- SimpleState is clonable again (actually useful!)
- introduce TypedModel::concretize_stream_dim
- various pulsification bugfixes
- fix Reshape with TDim
Still no shortage of version numbers...
- NormalizedModel (and friends) are gone. They were only useful as a pre-pulse transformation pre-requisite that the current TypedModel (& co) meets.
- TypedModel::into_optimized() is gone. InferenceModel::into_optimized() stays as an end-to-end shortcut for simple cases. It does .into_typed()?.declutter()?.optimize().
- TypedModel::codegen() is now ::optimize()
I wish I had seen these issues yesterday. Anyway, version numbers are cheap.
- Bumping minimum rust to 1.41
- CLI refactoring (hopefully stabilizing a bit?)
  - profile --bench is now bench
  - profile is now dump --profile
  - cost is now dump --cost
  - profiling is now done during a full net instead of per op
  - new "compact" graph dumper, profile visual hints
  - dump --cost --profile --json output profiling and cost information
  - show logical names for ops instead of the Op struct names (not 100% sure it's right)
- criterion integration
- WASM support for tract-onnx and tract-tensorflow targets (CI)
- Convenience methods added to Models to allow model building in fluent style, up to Plan instantiation (SimplePlan now nicknamed RunnableModel). Non breaking.
- Support for ONNX bidi LSTM (CI), GRU and RNN (untested, consider alpha)
- Fixes around nets with a non trivial batch size (axis simplification code, matmul op fusion)
- Lock ndarray version to dodge rustc/llvm issue (rust-lang/rust#71506)
- Use http://github.com/kali/readings for instrumentation.
- New jupyter/keras/tf example
- ARMv8 tanh / sigmoid optimisation
- refactor exports and dependencies
  - preferred way to use tract is now use tract_tensorflow::prelude::*;
  - singleton framework is built by let tensorflow = tensorflow(). The Framework trait is in the prelude too.
  - the prelude contains a reexport of tract_core, and of ndarray as tract_ndarray
  - no more need to declare dependency on tract-core and/or tract-linalg in Cargo.toml
  - same goes for tract_onnx
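  To illustrate, a sketch against that era's API (model path is illustrative, and details may have shifted in later releases):

  ```rust
  use tract_tensorflow::prelude::*;

  fn main() -> TractResult<()> {
      // the singleton framework mentioned above; the Framework trait comes
      // from the prelude, so loading methods are available directly
      let tensorflow = tensorflow();
      let model = tensorflow.model_for_path("model.pb")?;   // illustrative path
      println!("loaded {} nodes", model.nodes().len());
      Ok(())
  }
  ```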
- Rustc minimum version is now 1.39
- Support for MatMulInteger, ConvInteger
- Support for QuantizeLinear DequantizeLinear
- Basic support for QLinearMatMul, QLinearConv
- Initial support for GatherV2
- Fix PReLu normalization
- Initial support for AddV2, Mean, Min, Prod, Sum
- Make the ONNX loader operator-set aware, and add Slice-10 support.
- Cost now reports Delay ops buffer size
- Bump dependencies (protobuf) and fix codegen
- Windows CI now performs a top-level "cargo check"
- remove the no_panic checks, as too fragile (breaking non-lto builds)
- Change tensor facts names for consistency: TensorFact is now InferenceFact.
- Introduce Windows support, including CI coverage for linalg
- Switch from Travis to GitHub Actions
- Internal refactoring around tract-core canonic opset
- Tract CLI can now compute a FLOP number for networks ("cost" subcommand). Furthermore the CI asserts its value for a few networks to prevent optimisation regressions.
- Fix: handling of -1 in ONNX Reshape op
- Fix release script after 0.4.1 release disaster.
- Fix for OS where CARGO_CFG_TARGET_FAMILY is undefined
- Linear Algebra package refactor
- tract-core canonic operator set introduction
- significant performance boost (up to 20% on some real-life networks)
- Start Kaldi networks support (LSTM, Renorm, Affine, downsample)
This Changelog started way too late. But better late than never.