Releases: pytorch/vision

Update dependency on wheels to match version in PyPI

21 Oct 16:58
fa347eb

Users were reporting issues installing torchvision from PyPI; this release updates the wheel dependencies to point directly to torch==1.10.0

RegNet, EfficientNet, FX Feature Extraction and more

21 Oct 15:46
58a60b2

This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and more.

Highlights

New Models

RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)

The accuracies obtained by the pre-trained models on ImageNet val are shown below (see #4403, #4530 and #4293 for more details):

Model Acc@1 Acc@5
regnet_x_400mf 72.834 90.95
regnet_x_800mf 75.212 92.348
regnet_x_1_6gf 77.04 93.44
regnet_x_3_2gf 78.364 93.992
regnet_x_8gf 79.344 94.686
regnet_x_16gf 80.058 94.944
regnet_x_32gf 80.622 95.248
regnet_y_400mf 74.046 91.716
regnet_y_800mf 76.42 93.136
regnet_y_1_6gf 77.95 93.966
regnet_y_3_2gf 78.948 94.576
regnet_y_8gf 80.032 95.048
regnet_y_16gf 80.424 95.24
regnet_y_32gf 80.878 95.34
EfficientNet-B0 77.692 93.532
EfficientNet-B1 78.642 94.186
EfficientNet-B2 80.608 95.31
EfficientNet-B3 82.008 96.054
EfficientNet-B4 83.384 96.594
EfficientNet-B5 83.444 96.628
EfficientNet-B6 84.008 96.916
EfficientNet-B7 84.122 96.908

We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.

FX-based Feature Extraction

A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:

import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor


x = torch.rand(1, 3, 224, 224)

model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)

print(intermediate_outputs['layer4'].shape)

We would like to thank Alexander Soare for developing this utility.

New Data Augmentations

Two new Automatic Augmentation techniques were added: RandAugment and TrivialAugment. Both methods can be used as drop-in replacements for AutoAugment, as seen below:

from torchvision import transforms

t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
    transforms.ToTensor()])

We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.

Updated Training Recipes

We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. These enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
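
For reference, the corresponding building blocks available in stock PyTorch 1.10 can be combined along these lines (a minimal sketch with illustrative hyper-parameters; the reference scripts wire these up with their own utilities):

import torch
from torchvision import models

model = models.resnet18()
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # Label Smoothing
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)

# Learning-Rate Warmup for 5 epochs, followed by cosine annealing
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])

# Exponential Moving Average of the weights; call ema.update_parameters(model)
# after each optimizer step
ema = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, new, num: 0.99 * avg + 0.01 * new)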

Backward-incompatible changes

[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)

Deprecations

[models] Deprecate the C++ vision::models namespace (#4375)

New Features

[datasets] Add iNaturalist dataset (#4123)
[datasets] Add the Kinetics 400/600/700 datasets with download support (#3680)
[datasets] Added LFW Dataset (#4255)
[models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418)
[models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550)
[ops] Add new masks_to_boxes op (#4290) (#4469); see the sketch after this list
[ops] Add StochasticDepth implementation (#4301)
[reference scripts] Adding Mixup and Cutmix (#4379)
[transforms] Integration of TrivialAugment with the current AutoAugment Code (#4221)
[transforms] Adding RandAugment implementation (#4348)
[models] Add EfficientNet Architecture in TorchVision (#4293)
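
The new masks_to_boxes op computes the bounding boxes around the provided instance masks. A minimal sketch (assuming boolean masks of shape [N, H, W]):

import torch
from torchvision.ops import masks_to_boxes

# two instance masks on a 100x100 canvas
masks = torch.zeros(2, 100, 100, dtype=torch.bool)
masks[0, 10:20, 30:40] = True
masks[1, 50:80, 50:70] = True

boxes = masks_to_boxes(masks)  # Tensor[2, 4] in (xmin, ymin, xmax, ymax) format
print(boxes)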

Improvements

Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132)
[build] Add CUDA-11.3 builds to torchvision (#4248)
[ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062)
[ci] New issue templates (#4299)
[ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925)
[ci] ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501)
[datasets] Add bzip2 file compression support to datasets (#4097)
[datasets] Faster dataset indexing (#3939)
[datasets] Enable logging of internal dataset instantiations (#4319) (#4090)
[datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184)
[io] Add warning for files with corrupt containers (#3961)
[models, tests] Add test to check that classification models are FX-compatible (#3662)
[tests] Speedup various tests (#3929) (#3933) (#3936)
[models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448)
[models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327)
[ops, tests] Add JIT tests (#4472)
[ops] Make StochasticDepth FX-compatible (#4373)
[ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211)
[ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095)
[reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407)
[reference scripts] Adding label smoothing on classification reference (#4335)
[reference scripts] Further enhance Classification Reference (#4444)
[reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452)
[reference scripts] Update the metrics output on reference scripts (#4408)
[reference scripts] Warmup schedulers in References (#4411)
[tests] Add check for fx compatibility on segmentation and video models (#4131)
[tests] Mock redirection logic for tests (#4197)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Skip building torchvision with ffmpeg when python==3.9 (#4417)
[tests] [jit] Make operation call accept Stack& instead of Stack* (#63414) (#4380)
[tests] make tests that involve GDrive more robust (#4454)
[tests] remove dependency for dtype getters (#4291)
[transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494)
[transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573)
[transforms] Make get_image_size and get_image_num_channels public. (#4321)
[transforms] adding grayscale image support for adjust_contrast and adjust_saturation (#4477) (#4480)
[utils] Support single color in utils.draw_bounding_boxes (#4075)
[video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241)
[video, io, tests] Added check for invalid input file (#3932)
[video, io] remove deprecated function call (#3861) (#3989)
[video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050)
[video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)

Bug Fixes

[build] Conda: Add numpy dependency (#4442)
[build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148)
[build] More robust version check (#4285)
[ci] Fix broken clang format test. (#4320)
[ci] Remove mentions of conda-forge (#4082)
[ci] fixup '' -> '/./' for CI filter (#4059)
[datasets] Fix download from google drive which was downloading empty files in some cases (#4109)
[datasets] Fix splitting CelebA dataset (#4377)
[datasets] Add support for files with periods in name (#4099)
[io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167)
[io] Fix size_t issues across JPEG versions and platforms (#4439)
[io] Raise proper error when decoding 16-bits jpegs (#4101)
[io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288)
[io] deinterlacing PNG images with read_image (#4268)
[io] More robust ffmpeg version query in setup.py (#4254)
[io] Fixed read_image bug (#3948)
[models] Don't download backbone weights if pretrained=True (#4283)
[onnx, tests] Do not disable profiling executor in ...

Minor bugfix release

27 Sep 04:40
ca1a620

This release depends on PyTorch 1.9.1.
No functional changes other than minor updates to CI rules.

iOS support, GPU image decoding, SSDlite and more

15 Jun 14:55
300a8a4

This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in cocoapods and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, and more.

Highlights

[BETA] New models for detection

SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:

import torch
import torchvision

# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):

Model mAP mAP@50 mAP@75
SSD300 VGG16 25.1 41.5 26.2
SSDlite320 MobileNetV3-Large 21.3 34.3 22.1

[STABLE] Quantized kernels for object detection

The forward pass of the nms and roi_align operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly in mobile environments.
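
For illustration, a minimal sketch of running NMS on quantized inputs (assuming the standard torchvision.ops.nms entry point dispatches to the quantized kernel; the scales and zero points below are arbitrary):

import torch
from torchvision.ops import nms

boxes = torch.rand(100, 4) * 200
boxes[:, 2:] += boxes[:, :2]  # ensure xmax >= xmin and ymax >= ymin
scores = torch.rand(100)

# quantize the inputs; scale/zero_point are chosen to cover the value ranges above
qboxes = torch.quantize_per_tensor(boxes, scale=2.0, zero_point=0, dtype=torch.quint8)
qscores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)

keep = nms(qboxes, qscores, iou_threshold=0.5)  # indices of the kept boxes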

[BETA] JPEG decoding on the GPU

Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:

from torchvision.io.image import read_file, decode_jpeg

data = read_file('path_to_image.jpg')  # raw data is on CPU
img = decode_jpeg(data, device='cuda')  # decoded image is on the GPU

[BETA] iOS support

TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found here.

[STABLE] Speed optimizations for Tensor transforms

The resize and flip transforms have been optimized, and their runtime improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in pytorch/pytorch#51653, pytorch/pytorch#54500 and pytorch/pytorch#56713.

[STABLE] Documentation improvements

Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version is not released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.

The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.

Backwards Incompatible Changes

  • [transforms] Ensure input type of normalize is float. (#3621)
  • [models] Use PyTorch smooth_l1_loss and remove private custom implementation (#3539)

New Features

  • Added iOS binaries and test app (#3582) (#3629) (#3806)
  • [datasets] Added KITTI dataset (#3640)
  • [utils] Added utility to draw segmentation masks (#3330, #3824)
  • [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
  • [transforms] Added antialias option to transforms.functional.resize (#3761, #3810, #3842); see the sketch after this list
  • [transforms] Add new max_size parameter to Resize (#3494)
  • [io] Support for decoding jpegs on GPU with nvjpeg (#3792)
  • [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
  • [ops, models.quantization] Add quantized version of NMS (#3601)
  • [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)
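
As a sketch of the new antialias and max_size options of transforms.functional.resize (antialias is released as beta; the example assumes a float tensor input and the default bilinear interpolation):

import torch
from torchvision.transforms import functional as F

img = torch.rand(3, 512, 512)  # CxHxW float image

# downsample with anti-aliasing, producing results much closer to PIL's
small = F.resize(img, size=[128, 128], antialias=True)

# resize the smaller edge to 256 while capping the longer edge at 300
capped = F.resize(img, size=256, max_size=300)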

Improvements

Code quality

  • Remove inconsistent FB copyright headers (#3741)
  • Keep consistency in classes ConvBNActivation (#3750)
  • Removed unused imports (#3738, #3740, #3639)
  • Fixed floor_divide deprecation warnings seen in pytest output (#3672)
  • Unify onnx and JIT resize implementations (#3654)
  • Cleaned-up imports in test files related to datasets (#3720)
  • [documentation] Remove old css file (#3839)
  • [ci] Fix inconsistent version pinning across yaml files (#3790)
  • [datasets] Remove redundant path.join in Places365 (#3545)
  • [datasets] Remove imprecise error handling in PhotoTour dataset (#3488)
  • [datasets, tests] Remove obsolete test_datasets_transforms.py (#3867)
  • [models] Making protected params of MobileNetV3 public (#3828)
  • [models] Make target argument in transform.py truly optional (#3866)
  • [models] Adding some references on MobileNetV3 implementation. (#3850)
  • [models] Refactored set_cell_anchors() in AnchorGenerator (#3755)
  • [ops] Minor cleanup of roi_align_forward_kernel_impl (#3619)
  • [ops] Replace deprecated AutoNonVariableTypeMode with AutoDispatchBelowADInplaceOrView. (#3786, #3897)
  • [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
  • [ops, tests] simplify get_script_fn (#3541)
  • [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
  • [tests] Clean up test accept behaviour (#3759)
  • [tests] Remove unused masks variable in test_image.py (#3910)
  • [transforms] use ternary if in resize (#3533)
  • [transforms] replaced deprecated call to ByteTensor with from_numpy (#3813)
  • [transforms] Remove unnecessary casting in adjust_gamma (#3472)

Bugfixes

  • [ci] set empty cxx flags as default (#3474)
  • [android][test_app] Cleanup duplicate dependency (#3428)
  • Remove leftover exception (#3717)
  • Corrected spelling in a TypeError (#3659)
  • Add missing device info. (#3651)
  • Moving tensors to the right device (#3870)
  • Proper error message (#3725)
  • [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
  • [datasets] Make LSUN OS agnostic (#3455)
  • [datasets] Update squeezenet urls (#3581)
  • [datasets] Add .item() to the target variable in fakedataset.py (#3587)
  • [datasets] Fix VOC da...

Dataset bugfixes

25 Mar 17:51
8fb5838

Highlights

This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.

Bugfixes

  • fix VOC datasets for 2007 (#3572)
  • Update EMNIST url (#3567)
  • Fix redirect behavior of datasets.utils.download_url (#3564)
  • Fix MNIST download for minor release (#3559)

Mobile support, AutoAugment, improved IO and more

04 Mar 20:54
01dfa8e

This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.

Highlights

Better mobile support

torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks.
It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application showing how to use the torchvision ops in an Android app can be found here.

Classification

We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.

import torch
import torchvision

# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)

# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)

The pre-trained models have the following accuracies on ImageNet 2012 val:

Model Top-1 Acc Top-5 Acc
MobileNetV3 Large 74.042 91.340
MobileNetV3 Large (Quantized) 73.004 90.858
MobileNetV3 Small 67.620 87.404

Object Detection

We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:

import torch
import torchvision

# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

They yield the following accuracies on COCO val2017 (full results available in #3265):

Model mAP mAP@50 mAP@75
Faster R-CNN MobileNetV3-Large 320 FPN 22.8 38.0 23.2
Faster R-CNN MobileNetV3-Large FPN 32.8 52.5 34.3

Semantic Segmentation

We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.

import torch
import torchvision

# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)

# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)

The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC (full results in #3276):

Model mean IoU global pixelwise accuracy
Lite R-ASPP with Dilated MobileNetV3 Large Backbone 57.9 91.2
DeepLabV3 with Dilated MobileNetV3 Large Backbone 60.3 91.2

Addition of the AutoAugment method

AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to the dataset they were trained on, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.

In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed and matched with existing transforms:

from torchvision import transforms

t = transforms.AutoAugment()
transformed = t(image)

transform=transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])

Improved Image IO and on-the-fly image type conversions

All the read and decode methods of the io.image package have been updated to:

  • Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
  • Allow on-the-fly conversion of images from one type to another during read.
from torchvision.io.image import read_image, ImageReadMode

# keeps original type, channels unchanged
x1 = read_image("image.png")

# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)

# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)

# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)

# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)

Python 3.9 and CUDA 11.1

This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)

Backwards Incompatible Changes

  • [Ops] Change default eps value of FrozenBN to better align with nn.BatchNorm (#2933)
  • [Ops] Remove deprecated _new_empty_tensor. (#3156)
  • [Transforms] ColorJitter gets its random params by calling get_params() (#3001)
  • [Transforms] Change rounding of transforms on integer tensors (#2964)
  • [Utils] Remove normalize from save_image (#3324)

New Features

Improvements

Datasets

  • Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
  • Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
  • Check if dataset file is located on Google Drive before downloading it (#3245)
  • Improve Coco implementation (#3417)
  • Make download_url follow redirects (#3236)
  • make_dataset as staticmethod of DatasetFolder (#3215)
  • Add a warning if any clip can't be obtained from a video in VideoClips. (#2513)

Models

  • Improve error message in AnchorGenerator (#2960)
  • Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
  • Support for image with no annotations in RetinaNet (#3032)
  • Change RoIHeads reshape to support empty batches. (#3031)
  • Fixed typing exception throwing issues with JIT (#3029)
  • Replace deprecated functional.sigmoid with torch.sigmoid in RetinaNet (#3307)
  • Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
  • Speedup RetinaNet's postprocessing (#2828)

Ops

  • Added eps in the __repr__ of FrozenBN (#2852)
  • Added __repr__ to MultiScaleRoIAlign (#2840)
  • Exposing LevelMapper params in MultiScaleRoIAlign (#3151)
  • Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)

Transforms

  • adjust_hue now accepts tensors with one channel (#3222)
  • Add fill color support for tensor affine transforms (#2904)
  • Remove torchscript workaround for center_crop (#3118)
  • Improved error message for RandomCrop (#2816)

IO

  • Enable importing read_file and the other methods from torchvision.io (#2918)
  • accept python bytes in _read_video_from_memory() (#3347)
  • Enable rtmp timeout in decoder (#3076)
  • Specify tls cert file to decoder through config (#3289, #3374)
  • Add UUID in LOG() in decoder (#3080)

References

  • Add weight averaging and storing methods in references utils (#3352)
  • Adding Preset Transforms in reference scripts (#3317)
  • Load variables when --resume /path/to/checkpoint --test-only (#3285)
  • Updated video classification ref example with new transforms (#2935)

Misc

Python 3.9 support and bugfixes

10 Dec 17:10
2f40a48

This minor release bumps the pinned PyTorch version to v1.7.1, and contains some minor improvements.

Highlights

Python 3.9 support

This release adds native binaries for Python 3.9 (#3063).

Bugfixes

  • Make read_file and write_file accept unicode strings on Windows #2949
  • Replaced tuple creation by one acceptable by majority of compilers #2937
  • Add docs for focal_loss #2979

Added version suffix back to package

27 Oct 21:22
45f960c

Issues resolved:

  • Cannot pip install torchvision==0.8.0+cu110 - #2912

Improved transforms, native image IO, new video API and more

27 Oct 16:17
291f7e2

This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than torchvision.io.read_video.

Highlights

Transforms now support Tensor, batch computation, GPU and TorchScript

torchvision transforms now inherit from nn.Module and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with a batch dimension and work seamlessly on CPU/GPU devices:

import torch
import torchvision.transforms as T

# to fix random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and 
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())
# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)
# and has torchscript support
out_image2 = scripted_transforms(tensor_image)

These improvements enable the following new features:

  • support for GPU acceleration
  • batched transformations e.g. as needed for videos
  • transform multi-band torch tensor images (with more than 3-4 channels)
  • torchscript transforms together with your model for deployment

Note: Exceptions for TorchScript support include Compose, RandomChoice, RandomOrder, Lambda and transforms applied on PIL images, such as ToPILImage.

Native image IO for JPEG and PNG formats

torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript and return CxHxW tensors in uint8 format, so they can now be part of your model for deployment in C++ environments.

import torch
from torchvision.io import read_image

# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')

# or equivalently
from torchvision.io.image import read_file, decode_image
# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)

# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)

New detection model

This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
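
The model can be used like the other detection models (a minimal usage sketch):

import torch
import torchvision

x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)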

[BETA] New Video Reader API

This release introduces a new video reading abstraction, which gives more fine-grained control on how to iterate over the videos. It supports image and audio, and implements an iterator interface so that it can be combined with the rest of the python ecosystem, such as itertools.

from torchvision.io import VideoReader

# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')
# can change the stream after construction
# via reader.set_current_stream

# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])

# because reader is an iterator you can combine it with
# itertools
from itertools import takewhile, islice
# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass
    
# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass

Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that ffmpeg is installed on your system.
Note: the VideoReader API is currently released as beta and its API can change following user feedback.

Backwards Incompatible Changes

  • [Transforms] Random seed now should be set with torch.manual_seed instead of random.seed (#2292)
  • [Transforms] RandomErasing.get_params function’s argument was previously value=0 and is now value=None which is interpreted as Gaussian random noise (#2386)
  • [Transforms] RandomPerspective and F.perspective changed the default value of interpolation to be BILINEAR instead of BICUBIC (#2558, #2561)
  • [Transforms] Fixes incoherence in affine transformation when center is defined as half image size + 0.5 (#2468)

New Features

Improvements

Datasets

Models

  • Removed hard coded value in DeepLabV3 (#2793)
  • Changed the anchor generator default argument to an equivalent one (#2722)
  • Moved model construction in resnet_fpn_backbone to after its docstring (#2482)
  • Partially enabled type hints for models (#2668)

Ops

  • Moved RoIs shape check to C++ (#2794)
  • Use autocast built-in cast-helper functions (#2646)
  • Added type annotations for torchvision.ops (#2331, #2462)

References

  • [References] Removed redundant target send to device in detection evaluation (#2503)
  • [References] Removed obsolete import in segmentation. (#2399)

Misc

  • [Transforms] Added support for negative padding in pad (#2744)
  • [IO] Added type hints for torchvision.io (#2543)
  • [ONNX] Export ROIAlign with aligned=True (#2613)

Internal

Bug Fixes

  • [Ops] Fixed crash in deformable convolutions (#2604)
  • [Ops] Added empty batch support for DeformConv2d (#2782)
  • [Transforms] Enforced contiguous output in to_tensor (#2483)
  • [Transforms] Fixed fill parameter for PIL pad (#2515)
  • [Models] Fixed deprecation warning in nonzero for R-CNN models (#2705)
  • [IO] Explicitly cast to size_t in video decoder (#2389)
  • [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
  • [C++ API] Fixed function signatures for torch::nn::Functional (#2463)

Deprecations

  • [Transforms] Deprecated dedicated implementations functional_tensor of F_t.center_crop, F_t.five_crop, `F_t.te...

Mixed precision training, new models and improvements

28 Jul 15:04
78ed10c

Highlights

Mixed precision support for all models

torchvision models now support mixed-precision training via the new torch.cuda.amp package. Using mixed precision support is easy: just wrap the model and the loss inside a torch.cuda.amp.autocast context manager. Here is an example with Faster R-CNN:

import torch, torchvision

device = torch.device('cuda')

model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)

input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
          "labels": torch.zeros(5, dtype=torch.int64, device=device),
          "image_id": 4,
          "area": torch.zeros(5, dtype=torch.float32, device=device),
          "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]

# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
losses = sum(loss for loss in loss_dict.values())
# perform backward outside of autocast context manager
losses.backward()

New pre-trained segmentation models

This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3.
They are available under torchvision.models.segmentation, and can be obtained as follows:

torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)

They obtain the following accuracies:

Network mean IoU global pixelwise acc
FCN ResNet50 60.5 91.4
DeepLabV3 ResNet50 66.4 92.4

Improved ONNX support for Faster / Mask / Keypoint R-CNN

This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models.
Notable improvements include support for dynamic input shape exports, including images with no detections.
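
For example, an export along the following lines should now handle variable-sized inputs (a sketch; the opset version and export flags may need adjusting for your setup):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

x = [torch.rand(3, 300, 400)]
torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version=11)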

Backwards Incompatible Changes

  • [Transforms] Fix for integer fill value in constant padding (#2284)
  • [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
  • [Transforms] Use torch.rand instead of random.random() for random transforms (#2520)

New Features

  • [Models] Add mixed-precision support (#2366, #2384)
  • [Models] Add fcn_resnet50 and deeplabv3_resnet50 pretrained models. (#2086, #2091)
  • [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
  • [Transforms] Add convert_image_dtype to functionals (#2078)
  • [Transforms] Add pil_to_tensor to functionals (#2092); see the sketch after this list
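
A minimal sketch combining the two new functionals (the image path is illustrative); together they are a scriptable alternative to to_tensor, with the float conversion and rescaling made explicit:

import torch
from PIL import Image
from torchvision.transforms import functional as F

img = Image.open("path_to_image.jpg")

t = F.pil_to_tensor(img)                     # CxHxW uint8 tensor, values in [0, 255]
t = F.convert_image_dtype(t, torch.float32)  # float tensor, values rescaled to [0, 1]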

Bug Fixes

  • [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
  • [IO] Fix write_video when floating point FPS is passed (#2334)
  • [IO] Fix missing compilation files for video-reader (#2183)
  • [IO] Fix missing include for OSX in video decoder (#2224)
  • [IO] Fix overflow error for large buffers. (#2303)
  • [Ops] Fix wrong clamping in RoIAlign with aligned=True (#2438)
  • [Ops] Fix corner case in interpolate (#2146)
  • [Ops] Fix the use of contiguous() in C++ kernels (#2131)
  • [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
  • [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
  • [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
  • [Models] Fix DenseNet issue with gradient checkpoints (#2236)
  • [ONNX] Fix ONNX implementation of heatmaps_to_keypoints in KeypointRCNN (#2312)
  • [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)

Deprecations

  • [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
  • [Ops] Deprecate interpolate in favor of PyTorch's implementation (#2252)

Improvements

Datasets

  • Fix DatasetFolder error message (#2143)
  • Change range(len) to enumerate in DatasetFolder (#2153)
  • [DOC] Fix link URL to Flickr8k (#2178)
  • [DOC] Add CelebA to docs (#2107)
  • [DOC] Improve documentation of DatasetFolder and ImageFolder (#2112)

TorchHub

  • Fix torchhub tests due to numerical changes in torch.sum (#2361)
  • Add all the latest models to hubconf (#2189)

Transforms

  • Add fill argument to __repr__ of RandomRotation (#2340)
  • Add tensor support for adjust_hue (#2300, #2355)
  • Make ColorJitter torchscriptable (#2298)
  • Make RandomHorizontalFlip and RandomVerticalFlip torchscriptable (#2282)
  • [DOC] Use consistent symbols in the doc of Normalize to avoid confusion (#2181)
  • [DOC] Fix typo in hflip in functional.py (#2177)
  • [DOC] Fix spelling errors in functional.py (#2333)

IO

  • Refactor video.py to improve clarity (#2335)
  • Save memory by not storing full frames in read_video_timestamps (#2202, #2268)
  • Improve warning when video_reader backend is not available (#2225)
  • Set should_buffer to True by default in _read_from_stream (#2201)
  • [Test] Temporarily disable one PyAV test (#2150)

Models

  • Improve target checks in GeneralizedRCNN (#2207, #2258)
  • Use Module objects instead of functions for some layers of Inception3 (#2287)
  • Add support for other normalizations in MobileNetV2 (#2267)
  • Expose layer freezing option to detection models (#2160, #2242)
  • Make ASPP-Layer in DeepLab more generic (#2174)
  • Faster initialization for Inception family of models (#2170, #2211)
  • Make norm_layer a parameter in models/detection/backbone_utils.py (#2081)
  • Updates integer division to use floor division operator (#2234, #2243)
  • [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
  • [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
  • [DOC] Fix roi_heads argument name in doctstring of GeneralizedRCNN (#2093)
  • [DOC] Fix type annotation in RPN docstring (#2149)
  • [DOC] add clarifications to Object detection reference documentation (#2241)
  • [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)

Reference scripts

  • Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
  • Fix training resuming in references/segmentation (#2142)
  • Rename image to images in references/detection/engine.py (#2187)

ONNX

  • Add support for dynamic input shape export in R-CNN models (#2087)

Ops

  • Added number of features in FrozenBatchNorm2d __repr__ (#2168)
  • improve consistency among box IoU CPU / GPU calculations (#2072)
  • Avoid using-declarations in header files (#2257)
  • Make ceil_div __host__ __device__ (#2217)
  • Don't include CUDAApplyUtils.cuh (#2127)
  • Add namespace to avoid conflict with ATen version of channel_shuffle() (#2206)
  • [DOC] Update the statement of supporting torchscript ops (#2343)
  • [DOC] Update torchvision ops in doc (#2341)
  • [DOC] Improve documentation for NMS (#2159)
  • [Test] Add more tests to NMS (#2279)

Misc

  • Add PyTorch version compatibility table to README (#2260)
  • Fix lint (#2182, #2226, #2070)
  • Update version to 0.6.0 in CMake (#2140)
  • Remove mock (#2096)
  • Remove warning about deprecated (#2064)
  • Cleanup unused import (#2067)
  • Type annotations for torchvision/utils.py (#2034)

CI

  • Add version suffix to build version
  • Add backslash to escape
  • Add workflows to run on tag
  • Bump version to 0.7.0, pin PyTorch to 1.6.0
  • Update link for cudnn 10.2 (#2277)
  • Fix binary builds with CUDA 9.2 on Windows (#2273)
  • Remove Python 3.5 from CI (#2158)
  • Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
  • Master version bump 0.6 -> 0.7 (#2102)
  • Add test channels for pytorch version functions (#2208)
  • Add static type check with mypy (#2195, #1696, #2247)