Releases: pytorch/vision

TorchVision 0.19.1 Release

04 Sep 20:08
6194369

This is a patch release, which is compatible with PyTorch 2.4.1. There are no new features added.

TorchVision 0.19 Release

24 Jul 19:00
48b1edf

Highlights

Encoding / Decoding images

Torchvision is extending its encoding/decoding capabilities. In this version, we added a GIF decoder, which is available as torchvision.io.decode_gif(raw_tensor), torchvision.io.decode_image(raw_tensor), and torchvision.io.read_image(path_to_image).

We also added support for jpeg GPU encoding in torchvision.io.encode_jpeg(). This is 10X faster than the existing CPU jpeg encoder.
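Below is a minimal sketch of these entry points; the file names are placeholders, and the GPU encoding path assumes a CUDA-enabled build.

```python
import torch
from torchvision.io import read_file, decode_gif, read_image, encode_jpeg

# Decode a GIF from its raw encoded bytes; decode_image()/read_image() also handle GIFs now.
raw = read_file("animation.gif")      # 1-D uint8 tensor holding the encoded bytes
frames = decode_gif(raw)              # uint8 tensor, (num_frames, 3, H, W) for animated GIFs

# GPU JPEG encoding: move a uint8 CHW image to a CUDA device before encoding.
img = read_image("photo.png")         # uint8 tensor of shape (3, H, W)
if torch.cuda.is_available():
    encoded = encode_jpeg(img.cuda(), quality=90)   # 1-D uint8 tensor with the JPEG bytes
```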

Read more on the docs!

Stay tuned for more improvements coming in the next versions. We plan to improve jpeg GPU decoding, and add more image decoders (webp in particular).

Resizing according to the longest edge of an image

It is now possible to resize images by setting torchvision.transforms.v2.Resize(size=None, max_size=N): this will resize the longest edge of the image to exactly max_size, making sure the image dimensions don't exceed this value. Read more on the docs!
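A minimal sketch, assuming a local image file (the file name and the 512 target are placeholders):

```python
from torchvision.io import read_image
from torchvision.transforms import v2

img = read_image("photo.jpg")                 # uint8 tensor of shape (3, H, W)
resize = v2.Resize(size=None, max_size=512)   # longest edge is resized to exactly 512
out = resize(img)
assert max(out.shape[-2:]) == 512
```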

Detailed changes

Bug Fixes

[datasets] SBDataset: Only download noval file when image_set='train_noval' (#8475)
[datasets] Update the download url in class EMNIST (#8350)
[io] Fix compilation error when there is no libjpeg (#8342)
[reference scripts] Fix use of cutmix_alpha in classification training references (#8448)
[utils] Allow K=1 in draw_keypoints (#8439)

New Features

[io] Add decoder for GIF images (decode_gif(), decode_image(), read_image()) (#8406, #8419)
[transforms] Add GaussianNoise transform (#8381)

Improvements

[transforms] Allow v2 Resize to resize longer edge exactly to max_size (#8459)
[transforms] Add min_area parameter to SanitizeBoundingBoxes (#7735)
[transforms] Make adjust_hue() work with numpy 2.0 (#8463)
[transforms] Enable one-hot-encoded labels in MixUp and CutMix (#8427)
[transforms] Create kernel on-device for transforms.functional.gaussian_blur (#8426)
[io] Adding GPU acceleration to encode_jpeg (10X faster than CPU encoder) (#8391)
[io] read_video: accept BytesIO objects on pyav backend (#8442)
[io] Add compatibility with FFMPEG 7.0 (#8408)
[datasets] Add extra to install gdown (#8430)
[datasets] Support encoded RLE format for COCO segmentations (#8387)
[datasets] Added binary cat vs dog classification target type to Oxford pet dataset (#8388)
[datasets] Return labels for FER2013 if possible (#8452)
[ops] Force use of torch.compile on deterministic roi_align implementation (#8436)
[utils] add float support to utils.draw_bounding_boxes() (#8328)
[feature_extraction] Add concrete_args to feature extraction tracing. (#8393)
[Docs] Various documentation improvements (#8429, #8467, #8469, #8332, #8262, #8341, #8392, #8386, #8385, #8411).
[Tests] Various testing improvements (#8454, #8418, #8480, #8455)
[Code quality] Various code quality improvements (#8404, #8402, #8345, #8335, #8481, #8334, #8384, #8451, #8470, #8413, #8414, #8416, #8412)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam J. Stewart, ahmadsharif1, AJS Payne, Andrew Lingg, Andrey Talman, Anner, Antoine Broyelle, cdzhan, deekay42, drhead, Edward Z. Yang, Emin Orhan, Fangjun Kuang, G, haarisr, Huy Do, Jack Newsom, JavaZero, Mahdi Lamb, Mantas, Nicolas Hug, nihui, Richard Barnes, Richard Zou, Richie Bendall, Robert-André Mauchin, Ross Wightman, Siddarth Ijju, vfdev

TorchVision 0.18.1 Release

05 Jun 19:23
126fc22

This is a patch release, which is compatible with PyTorch 2.3.1. There are no new features added.

TorchVision 0.18 Release

24 Apr 16:20
6043bc2

BC-Breaking changes

[datasets] gdown is now a required dependency for downloading datasets that are on Google Drive. This change was actually introduced in 0.17.1 (repeated here for visibility) (#8237)
[datasets] The StanfordCars dataset isn’t available for download anymore. Please follow these instructions to manually download it (#8309, #8324)
[transforms] to_grayscale and corresponding transform now always return 3 channels when num_output_channels=3 (#8229)

Bug Fixes

[datasets] Fix download URL of EMNIST dataset (#8350)
[datasets] Fix root path expansion in Kitti dataset (#8164)
[models] Fix default momentum value of BatchNorm2d in MaxViT from 0.99 to 0.01 (#8312)
[reference scripts] Fix CutMix and MixUp arguments (#8287)
[MPS, build] Link essential libraries in cmake (#8230)
[build] Fix build with ffmpeg 6.0 (#8096)

New Features

[transforms] New GrayscaleToRgb transform (#8247)
[transforms] New JPEG augmentation transform (#8316)

Improvements

[datasets, io] Added pathlib.Path support to datasets and io utilities. (#8196, #8200, #8314, #8321)
[datasets] Added allow_empty parameter to ImageFolder and related utils to support empty classes during image discovery (#8311)
[datasets] Raise proper error in CocoDetection when a slice is passed (#8227)
[io] Added support for EXIF orientation in JPEG and PNG decoders (#8303, #8279, #8342, #8302)
[io] Avoiding unnecessary copies on io.VideoReader with pyav backend (#8173)
[transforms] Allow SanitizeBoundingBoxes to sanitize more than labels (#8319)
[transforms] Add sanitize_bounding_boxes kernel/functional (#8308)
[transforms] Make perspective more numerically stable (#8249)
[transforms] Allow 2D numpy arrays as inputs for to_image (#8256)
[transforms] Speed-up rotate for 90, 180, 270 degrees (#8295)
[transforms] Enabled torch compile on affine transform (#8218)
[transforms] Avoid some graph breaks in transforms (#8171)
[utils] Add float support to draw_keypoints (#8276)
[utils] Add visibility parameter to draw_keypoints (#8225)
[utils] Add float support to draw_segmentation_masks (#8150)
[utils] Better show overlap section of masks in draw_segmentation_masks (#8213)
[Docs] Various documentation improvements (#8341, #8332, #8198, #8318, #8202, #8246, #8208, #8231, #8300, #8197)
[code quality] Various code quality improvements (#8273, #8335, #8234, #8345, #8334, #8119, #8251, #8329, #8217, #8180, #8105, #8280, #8161, #8313)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam Dangoor, Ahmad Sharif, ahmadsharif1, Andrey Talman, Anner, anthony-cabacungan, Arun Sathiya, Brizar, cdzhan, Danylo Baibak, Huy Do, Ivan Magazinnik, JavaZero, Johan Edstedt, Li-Huai (Allan) Lin, Mantas, Mark Harfouche, Mithra, Nicolas Hug, nihui, Philip Meier, RazaProdigy, Richard Barnes, Riza Velioglu, sam-watts, Santiago Castro, Sergii Dymchenko, Syed Raza, talcs, Thien Tran, TilmannR, Tobias Fischer, vfdev, Zhu Lin Ch'ng, Zoltán Böszörményi.

TorchVision 0.17.2 Release

28 Mar 15:37
c1d70fe

This is a patch release, which is compatible with PyTorch 2.2.2. There are no new features added.

TorchVision 0.17.1 Release

22 Feb 21:54
4fd856b

This is a patch release, which is compatible with PyTorch 2.2.1.

Bug Fixes

  • Add gdown dependency to support downloading datasets from Google Drive (#8237)
  • Fix silent correctness with convert_bounding_box_format when passing string parameters (#8258)

TorchVision 0.17 Release

30 Jan 18:31
b2383d4

Highlights

The V2 transforms are now stable!

The torchvision.transforms.v2 namespace was still in BETA stage until now. It is now stable! Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with Getting started with transforms v2 in order to learn more about what can be done with the new v2 transforms.

Browse our main docs for general information and performance tips. The available transforms and functionals are listed in the API reference. Additional information and tutorials can also be found in our example gallery, e.g. Transforms v2: End-to-end object detection/segmentation example or How to write your own v2 transforms.
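For reference, a typical v2 classification pipeline looks roughly like the sketch below; the random tensor stands in for a real image, and the mean/std values are the usual ImageNet statistics.

```python
import torch
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.RandomResizedCrop(size=(224, 224), antialias=True),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ToDtype(torch.float32, scale=True),   # uint8 [0, 255] -> float32 [0.0, 1.0]
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = torch.randint(0, 256, (3, 256, 256), dtype=torch.uint8)  # stand-in for a decoded image
out = transforms(img)
```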

Towards torch.compile() support

We are progressively adding support for torch.compile() to torchvision interfaces, reducing graph breaks and allowing dynamic shapes.

The torchvision ops (nms, [ps_]roi_align, [ps_]roi_pool and deform_conv_2d) are now compatible with torch.compile and dynamic shapes.

On the transforms side, the majority of low-level kernels (like resize_image() or crop_image()) should compile properly without graph breaks and with dynamic shapes. We are still addressing the remaining edge cases and working towards full support for the functionals and transform classes, so you should expect more progress on that front with the next release.
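As a hedged sketch of what this enables, assuming a recent PyTorch with torch.compile available (the boxes and scores are random placeholders):

```python
import torch
from torchvision.ops import nms

# Wrap the op with torch.compile; dynamic=True requests dynamic-shape support.
compiled_nms = torch.compile(nms, dynamic=True)

boxes = torch.rand(100, 4) * 100
boxes[:, 2:] += boxes[:, :2]        # make boxes valid xyxy (x2 > x1, y2 > y1)
scores = torch.rand(100)
keep = compiled_nms(boxes, scores, iou_threshold=0.5)
```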


Detailed Changes

Breaking changes / Finalizing deprecations

  • [transforms] We changed the default of the antialias parameter from None to True in all transforms that perform resizing. This change of default has been communicated in previous versions and should drastically reduce the number of bugs and surprises, as it aligns the tensor backend with the PIL backend. Simply put: from now on, antialias is always applied when resizing (with bilinear or bicubic modes), whether you're using tensors or PIL images. This change only affects the tensor backend, as PIL always applies antialias anyway; a short sketch follows this list. (#7949)
  • [transforms] We removed the torchvision.transforms.functional_tensor.py and torchvision.transforms.functional_pil.py modules, as these had been deprecated for a while. Use the public functionals from torchvision.transforms.v2.functional instead. (#7953)
  • [video] Removed the deprecated path parameter to VideoReader and made src mandatory (#8125)
  • [transforms] to_pil_image now provides the same output for equivalent numpy arrays and tensor inputs (#8097)
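As a quick illustration of the new antialias default mentioned in the first item above, here is a minimal sketch (the random tensor stands in for a real image):

```python
import torch
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 300, 300), dtype=torch.uint8)  # stand-in tensor image

resized = v2.Resize(size=224)(img)                    # antialias is now applied by default
no_aa   = v2.Resize(size=224, antialias=False)(img)   # opt out explicitly to keep the old tensor behavior
```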

Bug Fixes

[datasets] Fix root path expansion in datasets.Kitti (#8165)
[transforms] allow sequence fill for v2 AA scripted (#7919)
[reference scripts] Fix quantized references (#8073)
[reference scripts] Fix IoUs reported in segmentation references (#7916)

New Features

[datasets] add Imagenette dataset (#8139)

Improvements

[transforms] The v2 transforms are now officially stable and out of BETA stage (#8111)
[ops] The ops ([ps_]roi_align, [ps_]roi_pool, deform_conv_2d) are now compatible with torch.compile and dynamic shapes (#8061, #8049, #8062, #8063, #7942, #7944)
[models] Allow custom atrous_rates for deeplabv3_mobilenet_v3_large (#8019)
[transforms] allow float fill for integer images in F.pad (#7950)
[transforms] allow len 1 sequences for fill with PIL (#7928)
[transforms] allow size to be generic Sequence in Resize (#7999)
[transforms] Make the root parameter optional for VisionDataset (#8124)
[transforms] Added support for TVTensors in torch.compile for functional ops (#8110)
[transforms] Reduced number of graphs for compiled resize (#8108)
[misc] Various fixes for S390x support (#8149)
[Docs] Various Documentation enhancements (#8007, #8014, #7940, #7989, #7993, #8114, #8117, #8121, #7978, #8002, #7957, #7907, #8000, #7963)
[Tests] Various test enhancements (#8032, #7927, #7933, #7934, #7935, #7939, #7946, #7943, #7968, #7967, #8033, #7975, #7954, #8001, #7962, #8003, #8011, #8012, #8013, #8023, #7973, #7970, #7976, #8037, #8052, #7982, #8145, #8148, #8144, #8058, #8057, #7961, #8132, #8133, #8160)
[Code Quality] Various code quality improvements (#8077, #8070, #8004, #8113, …)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aleksei Nikiforov, Alex Wei, Andrey Talman, Chunyuan WU, CptCaptain, Edward Z. Yang, Gu Wang, Haochen Yu, Huy Do, Jeff Daily, Josh Levy-Kramer, moto, Nicolas Hug, NVS Abhilash, Omkar Salpekar, Philip Meier, Sergii Dymchenko, Siddharth Singh, Thiago Crepaldi, Thomas Fritz, TilmannR, vfdev-5, Zeeshan Khan Suri.

TorchVision 0.16.2 Release

15 Dec 02:04
c6f3977

This is a patch release, which is compatible with PyTorch 2.1.2. There are no new features added.

TorchVision 0.16.1 Release

15 Nov 22:18
fdea156

This is a minor release that only contains bug fixes.

Bug Fixes

  • [models] Fix download of efficientnet weights (#8036)
  • [transforms] Fix v2 transforms in spawn multi-processing context (#8067)

TorchVision 0.16 - Transforms speedups, CutMix/MixUp, and MPS support!

04 Oct 17:32
fbb4cc5

Highlights

[BETA] Transforms and augmentations

Major speedups

The new transforms in torchvision.transforms.v2 support image classification, segmentation, detection, and video tasks. They are now 10%-40% faster than before! This is mostly achieved thanks to 2X-4X improvements made to v2.Resize(), which now supports native uint8 tensors for bilinear and bicubic modes. Output results are also now closer to PIL's! Check out our performance recommendations to learn more.
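To benefit from the fast path, keep images as uint8 tensors when resizing, roughly as in the sketch below (the random tensor stands in for a decoded image):

```python
import torch
from torchvision.transforms import v2, InterpolationMode

img_uint8 = torch.randint(0, 256, (3, 500, 500), dtype=torch.uint8)  # stand-in image

resize = v2.Resize((224, 224), interpolation=InterpolationMode.BILINEAR, antialias=True)
out = resize(img_uint8)   # stays uint8; the native uint8 kernel is used, no float round-trip
```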

Additionally, torchvision now ships with libjpeg-turbo instead of libjpeg, which should significantly speed up the jpeg decoding utilities (read_image, decode_jpeg) and avoid compatibility issues with PIL.

CutMix and MixUp

Long-awaited support for the CutMix and MixUp augmentations is now here! Check our tutorial to learn how to use them.
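Typical usage, as a minimal sketch with a made-up batch (batch size, image size, and num_classes are arbitrary):

```python
import torch
from torchvision.transforms import v2

NUM_CLASSES = 10
cutmix_or_mixup = v2.RandomChoice([
    v2.CutMix(num_classes=NUM_CLASSES),
    v2.MixUp(num_classes=NUM_CLASSES),
])

# A made-up batch: float images in [0, 1] and integer class labels.
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
images, labels = cutmix_or_mixup(images, labels)   # labels become soft, shape (8, NUM_CLASSES)
```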

Towards stable V2 transforms

In the previous 0.15 release, we BETA-released a new set of transforms in torchvision.transforms.v2 with native support for tasks like segmentation, detection, and videos. We have now stabilized the design decisions of these transforms and made further improvements in terms of speed, usability, and support for new transforms.

We're keeping the torchvision.transforms.v2 and torchvision.tv_tensors namespaces as BETA until 0.17 as a precaution, but we do not expect disruptive API changes in the future.

Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with Getting started with transforms v2 in order to learn more about what can be done with the new v2 transforms.

Browse our main docs for general information and performance tips. The available transforms and functionals are listed in the API reference. Additional information and tutorials can also be found in our example gallery, e.g. Transforms v2: End-to-end object detection/segmentation example or How to write your own v2 transforms.

[BETA] MPS support

The nms and roi-align kernels (roi_align, roi_pool, ps_roi_align, ps_roi_pool) now support MPS. Thanks to Li-Huai (Allan) Lin for this contribution!
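A short, hedged sketch of using the MPS backend (boxes and scores are random placeholders; this only runs on an Apple-silicon build of PyTorch):

```python
import torch
from torchvision.ops import nms

if torch.backends.mps.is_available():
    device = torch.device("mps")
    boxes = torch.rand(50, 4, device=device) * 100
    boxes[:, 2:] += boxes[:, :2]          # valid xyxy boxes (x2 > x1, y2 > y1)
    scores = torch.rand(50, device=device)
    keep = nms(boxes, scores, iou_threshold=0.5)
```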


Detailed Changes

Deprecations / Breaking changes

All changes below happened in the transforms.v2 and datapoints namespaces, which were BETA and protected with a warning. We do not expect other disruptive changes to these APIs moving forward!

[transforms.v2] to_grayscale() is not deprecated anymore (#7707)
[transforms.v2] Renaming: torchvision.datapoints.Datapoint -> torchvision.tv_tensors.TVTensor (#7904, #7894)
[transforms.v2] Renaming: BoundingBox -> BoundingBoxes (#7778)
[transforms.v2] Renaming: BoundingBoxes.spatial_size -> BoundingBoxes.canvas_size (#7734)
[transforms.v2] All public methods on TVTensor classes (previously: Datapoint classes) were removed
[transforms.v2] transforms.v2.utils is now private. (#7863)
[transforms.v2] Remove wrap_like class method and add tv_tensors.wrap() function (#7832)

New Features

[transforms.v2] Add support for MixUp and CutMix (#7731, #7784)
[transforms.v2] Add PermuteChannels transform (#7624)
[transforms.v2] Add ToPureTensor transform (#7823)
[ops] Add MPS kernels for nms and roi ops (#7643)

Improvements

[io] Added support for CMYK images in decode_jpeg (#7741)
[io] Package torchvision with libjpeg-turbo instead of libjpeg (#7672, #7840)
[models] Downloaded weights are now sha256-validated (#7219)
[transforms.v2] Massive Resize speed-up by adding native uint8 support for bilinear and bicubic modes (#7557, #7668)
[transforms.v2] Enforce pickleability for v2 transforms and wrapped datasets (#7860)
[transforms.v2] Allow catch-all "others" key in fill dicts. (#7779)
[transforms.v2] Allow passthrough for Resize (#7521)
[transforms.v2] Add scale option to ToDtype. Remove ConvertDtype. (#7759, #7862)
[transforms.v2] Improve UX for Compose (#7758)
[transforms.v2] Allow users to choose whether to return TVTensor subclasses or pure Tensor (#7825)
[transforms.v2] Remove import-time warning for v2 namespaces (#7853, #7897)
[transforms.v2] Speedup hsv2rgb (#7754)
[models] Add filter parameters to list_models() (#7718)
[models] Assert RAFT input resolution is 128 x 128 or higher (#7339)
[ops] Replaced gpuAtomicAdd by fastAtomicAdd (#7596)
[utils] Add GPU support for draw_segmentation_masks (#7684)
[ops] Add deterministic, pure-Python roi_align implementation (#7587)
[tv_tensors] Make TVTensors deepcopyable (#7701)
[datasets] Only return small set of targets by default from dataset wrapper (#7488)
[references] Added support for v2 transforms and tensors / tv_tensors backends (#7732, #7511, #7869, #7665, #7629, #7743, #7724, #7742)
[doc] A lot of documentation improvements (#7503, #7843, #7845, #7836, #7830, #7826, #7484, #7795, #7480, #7772, #7847, #7695, #7655, #7906, #7889, #7883, #7881, #7867, #7755, #7870, #7849, #7854, #7858, #7621, #7857, #7864, #7487, #7859, #7877, #7536, #7886, #7679, #7793, #7514, #7789, #7688, #7576, #7600, #7580, #7567, #7459, #7516, #7851, #7730, #7565, #7777)

Bug Fixes

[datasets] Fix split=None in MovingMNIST (#7449)
[io] Fix heap buffer overflow in decode_png (#7691)
[io] Fix blurry screen in video decoder (#7552)
[models] Fix weight download URLs for some models (#7898)
[models] Fix ShuffleNet ONNX export (#7686)
[models] Fix detection models with pytorch 2.0 (#7592, #7448)
[ops] Fix segfault in DeformConv2d when mask is None (#7632)
[transforms.v2] Stricter SanitizeBoundingBoxes labels_getter heuristic (#7880)
[transforms.v2] Make sure RandomPhotometricDistort transforms all images the same (#7442)
[transforms.v2] Fix v2.Lambda’s transformed types (#7566)
[transforms.v2] Don't call round() on float images for Resize (#7669)
[transforms.v2] Let SanitizeBoundingBoxes preserve output type (#7446)
[transforms.v2] Fixed int type support for sigma in GaussianBlur (#7887)
[transforms.v2] Fixed issue with jitted AutoAugment transforms (#7839)
[transforms] Fix Resize pass-through logic (#7519)
[utils] Fix color in draw_segmentation_masks (#7520)

Others

[tests] Various test improvements / fixes (#7693, #7816, #7477, #7783, #7716, #7355, #7879, #7874, #7882, #7447, #7856, #7892, #7902, #7884, #7562, #7713, #7708, #7712, #7703, #7641, #7855, #7842, #7717, #7905, #7553, #7678, #7908, #7812, #7646, #7841, #7768, #7828, #7820, #7550, #7546, #7833, #7583, #7810, #7625, #7651)
[CI] Various CI improvements (#7485, #7417, #7526, #7834, #7622, #7611, #7872, #7628, #7499, #7616, #7475, #7639, #7498, #7467, #7466, #7441, #7524, #7648, #7640, #7551, #7479, #7634, #7645, #7578, #7572, #7571, #7591, #7470, #7574, #7569, #7435, #7635, #7590, #7589, #7582, #7656, #7900, #7815, #7555, #7694, #7558, #7533, #7547, #7505, #7502, #7540, #7573)
[Code Quality] Various code quality improvements (#7559, #7673, #7677, #7771, #7770, #7710, #7709, #7687, #7454, #7464, #7527, #7462, #7662, #7593, #7797, #7805, #7786, #7831, #7829, #7846, #7806, #7814, #7606, #7613, #7608, #7597, #7792, #7781, #7685, #7702, #7500, #7804, #7747, #7835, #7726, #7796)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:
Adam J. Stewart, Aditya Oke, Andrey Talman, Camilo De La Torre, Christoph Reich, Danylo Baibak, David Chiu, David Garcia, Dennis M. Pöpperl, Dhuige, Duc Mguyen, Edward Z. Yang, Eric Sauser, Fansure Grin, Huy Do, Illia Vysochyn, Johannes, Kai Wana, Kobrin Eli, kurtamohler, Li-Huai (Allan) Lin, Liron Ilouz, Masahiro Hiramori, Mateusz Guzek, Max Chuprov, Minh-Long Luu (刘明龙), Minliang Lin, mpearce25, Nicolas Granger, Nicolas Hug, Nikita Shulga, Omkar Salpekar, Paul Mulders, Philip Meier, ptrblck, puhuk, Radek Bartoň, Richard Barnes, Riza Velioglu, Sahil Goyal, Shu, Sim Sun, SvenDS9, Tommaso Bianconcini, Vadim Zubov, vfdev-5