Releases: pytorch/vision
Update dependency on wheels to match version in PyPI
Users were reporting issues installing torchvision from PyPI. This release updates the dependencies of the wheels to point directly to torch==1.10.0.
RegNet, EfficientNet, FX Feature Extraction and more
This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and more.
Highlights
New Models
RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:
```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
```
The accuracies of the pre-trained models obtained on the ImageNet val set are shown below (see #4403, #4530 and #4293 for more details):
Model | Acc@1 | Acc@5 |
---|---|---|
regnet_x_400mf | 72.834 | 90.95 |
regnet_x_800mf | 75.212 | 92.348 |
regnet_x_1_6gf | 77.04 | 93.44 |
regnet_x_3_2gf | 78.364 | 93.992 |
regnet_x_8gf | 79.344 | 94.686 |
regnet_x_16gf | 80.058 | 94.944 |
regnet_x_32gf | 80.622 | 95.248 |
regnet_y_400mf | 74.046 | 91.716 |
regnet_y_800mf | 76.42 | 93.136 |
regnet_y_1_6gf | 77.95 | 93.966 |
regnet_y_3_2gf | 78.948 | 94.576 |
regnet_y_8gf | 80.032 | 95.048 |
regnet_y_16gf | 80.424 | 95.24 |
regnet_y_32gf | 80.878 | 95.34 |
EfficientNet-B0 | 77.692 | 93.532 |
EfficientNet-B1 | 78.642 | 94.186 |
EfficientNet-B2 | 80.608 | 95.31 |
EfficientNet-B3 | 82.008 | 96.054 |
EfficientNet-B4 | 83.384 | 96.594 |
EfficientNet-B5 | 83.444 | 96.628 |
EfficientNet-B6 | 84.008 | 96.916 |
EfficientNet-B7 | 84.122 | 96.908 |
We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.
FX-based Feature Extraction
A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:
```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

x = torch.rand(1, 3, 224, 224)
model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)
print(intermediate_outputs['layer4'].shape)
```
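To discover which node names are available for `return_nodes`, the package also exposes a helper. A minimal sketch, reusing the `model` from above (note that the train- and eval-mode graphs can differ):

```python
from torchvision.models.feature_extraction import get_graph_node_names

# node names for train and eval mode (the two graphs can differ)
train_nodes, eval_nodes = get_graph_node_names(model)
print(eval_nodes[-5:])
```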
We would like to thank Alexander Soare for developing this utility.
New Data Augmentations
Two new Automatic Augmentation techniques were added: RandAugment and TrivialAugment. Both methods can be used as drop-in replacements for the AutoAugment technique, as shown below:
```python
from torchvision import transforms

# image is a PIL Image (or, for most ops, a uint8 Tensor)
t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
    transforms.ToTensor()])
```
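Both transforms also expose their main knobs. As a small sketch, RandAugment's number of operations and magnitude can be tuned (the values below are the defaults):

```python
from torchvision import transforms

# apply 2 randomly chosen ops per image, each with magnitude 9 (out of 31 bins)
t = transforms.RandAugment(num_ops=2, magnitude=9)
```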
We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.
Updated Training Recipes
We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. These enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
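For illustration, here is a minimal sketch of two of these primitives expressed with core PyTorch APIs from the matching PyTorch release (label smoothing via `nn.CrossEntropyLoss` and a linear learning-rate warmup chained into a cosine schedule); the full recipes live in the `references/classification` scripts:

```python
import torch
from torch import nn

model = nn.Linear(10, 5)  # stand-in for a real classifier
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

# linear warmup for the first 5 epochs, cosine annealing afterwards
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5])
```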
Backward-incompatible changes
- [models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)
Deprecations
- [models] Deprecate the C++ vision::models namespace (#4375)
New Features
- [datasets] Add iNaturalist dataset (#4123)
- [datasets] Add download support for Kinetics 400/600/700 datasets (#3680)
- [datasets] Added LFW Dataset (#4255)
- [models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418)
- [models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550)
- [ops] Add new masks_to_boxes op (#4290) (#4469) (see the sketch after this list)
- [ops] Add StochasticDepth implementation (#4301)
- [reference scripts] Adding Mixup and Cutmix (#4379)
- [transforms] Integration of TrivialAugment with the current AutoAugment code (#4221)
- [transforms] Adding RandAugment implementation (#4348)
- [models] Add EfficientNet Architecture in TorchVision (#4293)
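As a quick illustration of the new `masks_to_boxes` op referenced above, a minimal sketch converting boolean instance masks into `(x1, y1, x2, y2)` boxes:

```python
import torch
from torchvision.ops import masks_to_boxes

# two 8x8 boolean masks, one bounding box per mask
masks = torch.zeros(2, 8, 8, dtype=torch.bool)
masks[0, 1:4, 2:5] = True
masks[1, 5:8, 0:3] = True
boxes = masks_to_boxes(masks)  # shape (2, 4), format (x1, y1, x2, y2)
```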
Improvements
- Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132)
- [build] Add CUDA-11.3 builds to torchvision (#4248)
- [ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062)
- [ci] New issue templates (#4299)
- [ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925)
- [ci] Ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501)
- [datasets] Add bzip2 file compression support to datasets (#4097)
- [datasets] Faster dataset indexing (#3939)
- [datasets] Enable logging of internal dataset instantiations (#4319) (#4090)
- [datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184)
- [io] Add warning for files with corrupt containers (#3961)
- [models, tests] Add test to check that classification models are FX-compatible (#3662)
- [tests] Speedup various tests (#3929) (#3933) (#3936)
- [models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448)
- [models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327)
- [ops, tests] Add JIT tests (#4472)
- [ops] Make StochasticDepth FX-compatible (#4373)
- [ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211)
- [ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095)
- [reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407)
- [reference scripts] Adding label smoothing on classification reference (#4335)
- [reference scripts] Further enhance Classification Reference (#4444)
- [reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452)
- [reference scripts] Update the metrics output on reference scripts (#4408)
- [reference scripts] Warmup schedulers in References (#4411)
- [tests] Add check for fx compatibility on segmentation and video models (#4131)
- [tests] Mock redirection logic for tests (#4197)
- [tests] Replace set_deterministic with non-deprecated spelling (#4212)
- [tests] Skip building torchvision with ffmpeg when python==3.9 (#4417)
- [tests] [jit] Make operation call accept Stack& instead of Stack* (#63414) (#4380)
- [tests] Make tests that involve GDrive more robust (#4454)
- [tests] Remove dependency for dtype getters (#4291)
- [transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494)
- [transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573)
- [transforms] Make get_image_size and get_image_num_channels public (#4321)
- [transforms] Adding grayscale image support for adjust_contrast and adjust_saturation (#4477) (#4480)
- [utils] Support single color in utils.draw_bounding_boxes (#4075)
- [video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241)
- [video, io, tests] Added check for invalid input file (#3932)
- [video, io] Remove deprecated function call (#3861) (#3989)
- [video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050)
- [video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)
Bug Fixes
- [build] Conda: Add numpy dependency (#4442)
- [build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148)
- [build] More robust version check (#4285)
- [ci] Fix broken clang format test (#4320)
- [ci] Remove mentions of conda-forge (#4082)
- [ci] Fixup '' -> '/./' for CI filter (#4059)
- [datasets] Fix download from google drive which was downloading empty files in some cases (#4109)
- [datasets] Fix splitting CelebA dataset (#4377)
- [datasets] Add support for files with periods in name (#4099)
- [io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167)
- [io] Fix size_t issues across JPEG versions and platforms (#4439)
- [io] Raise proper error when decoding 16-bits jpegs (#4101)
- [io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288)
- [io] Deinterlacing PNG images with read_image (#4268)
- [io] More robust ffmpeg version query in setup.py (#4254)
- [io] Fixed read_image bug (#3948)
- [models] Don't download backbone weights if pretrained=True (#4283)
- [onnx, tests] Do not disable profiling executor in ...
Minor bugfix release
This release depends on PyTorch 1.9.1.
No functional changes other than minor updates to CI rules.
iOS support, GPU image decoding, SSDlite and more
This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in cocoapods and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, and more.
Highlights
[BETA] New models for detection
SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:
```python
import torch
import torchvision

# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```
The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):
Model | mAP | mAP@50 | mAP@75 |
---|---|---|---|
SSD300 VGG16 | 25.1 | 41.5 | 26.2 |
SSDlite320 MobileNetV3-Large | 21.3 | 34.3 | 22.1 |
[STABLE] Quantized kernels for object detection
The forward pass of the nms and roi_align operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly on mobile environments.
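A minimal sketch of what this enables, assuming inputs quantized with `torch.quantize_per_tensor` (the scales and zero-points below are arbitrary):

```python
import torch
from torchvision.ops import nms

boxes = torch.rand(10, 4) * 100
boxes[:, 2:] += boxes[:, :2]  # ensure x2 > x1 and y2 > y1
scores = torch.rand(10)

# quantize the inputs; the forward pass of nms now accepts quantized dtypes
qboxes = torch.quantize_per_tensor(boxes, scale=1.0, zero_point=0, dtype=torch.quint8)
qscores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)
keep = nms(qboxes, qscores, iou_threshold=0.5)  # indices of the kept boxes
```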
[BETA] JPEG decoding on the GPU
Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:
```python
from torchvision.io.image import read_file, decode_jpeg

data = read_file('path_to_image.jpg')   # raw data is on CPU
img = decode_jpeg(data, device='cuda')  # decoded image is on GPU
```
[BETA] iOS support
TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found here.
[STABLE] Speed optimizations for Tensor transforms
The resize and flip transforms have been optimized and their runtime improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in pytorch/pytorch#51653, pytorch/pytorch#54500 and pytorch/pytorch#56713.
[STABLE] Documentation improvements
Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version was not released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.
The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.
Backwards Incompatible Changes
- [transforms] Ensure input type of `normalize` is float (#3621)
- [models] Use PyTorch `smooth_l1_loss` and remove private custom implementation (#3539)
New Features
- Added iOS binaries and test app (#3582)(#3629) (#3806)
- [datasets] Added KITTI dataset (#3640) (see the sketch after this list)
- [utils] Added utility to draw segmentation masks (#3330, #3824)
- [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
- [transforms] Added `antialias` option to `transforms.functional.resize` (#3761, #3810, #3842)
- [transforms] Add new `max_size` parameter to `Resize` (#3494)
- [io] Support for decoding jpegs on GPU with `nvjpeg` (#3792)
- [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
- [ops, models.quantization] Add quantized version of NMS (#3601)
- [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)
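As a small usage sketch for the new KITTI dataset mentioned above (the download is large; the paths and splits follow the usual `torchvision.datasets` conventions):

```python
from torchvision import datasets

# object-detection split of KITTI; downloads into "data/" if not present
ds = datasets.Kitti("data/", train=True, download=True)
img, target = ds[0]  # PIL image and a list of per-object annotation dicts
```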
Improvements
- [build] Various build improvements: (#3618) (#3622) (#3399) (#3794) (#3561)
- [ci] Various CI improvements (#3647) (#3609) (#3635) (#3599) (#3778) (#3636) (#3809) (#3625) (#3764) (#3679) (#3869) (#3871) (#3444) (#3445) (#3480) (#3768) (#3919) (#3641)(#3900)
- [datasets] Improve error handling in `make_dataset` (#3496)
- [datasets] Remove caching from MNIST and variants (#3420)
- [datasets] Make `DatasetFolder.find_classes` public (#3628)
- [datasets] Separate extraction and decompression logic in `datasets.utils.extract_archive` (#3443)
- [datasets, tests] Improve dataset test coverage and infrastructure (#3450) (#3457) (#3454) (#3447) (#3489) (#3661) (#3458) (#3705) (#3411) (#3461) (#3465) (#3543) (#3550) (#3665) (#3464) (#3595) (#3466) (#3468) (#3467) (#3486) (#3736) (#3730) (#3731) (#3477) (#3589) (#3503) (#3423) (#3492) (#3578) (#3605) (#3448) (#3864) (#3544)
- [datasets, tests] Fix lazy importing for dataset tests (#3481)
- [datasets, tests] Fix `test_extract(zip|tar|tar_xz|gzip)` on windows (#3542)
- [datasets, tests] Fix `kwargs` forwarding in fake data utility functions (#3459)
- [datasets, tests] Properly fix dataset test that passes by accident (#3434)
- [documentation] Improve the documentation infrastructure (#3868) (#3724) (#3834) (#3689) (#3700) (#3513) (#3671) (#3490) (#3660) (#3594)
- [documentation] Various documentation improvements (#3793) (#3715) (#3727) (#3838) (#3701) (#3923) (#3643) (#3537) (#3691) (#3453) (#3437) (#3732) (#3683) (#3853) (#3684) (#3576) (#3739) (#3530) (#3586) (#3744) (#3645) (#3694) (#3584) (#3615) (#3693) (#3706) (#3646) (#3780) (#3704) (#3774) (#3634)(#3591)(#3807)(#3663)
- [documentation, ci] Improve the CI infrastructure for documentation (#3734) (#3837) (#3796) (#3711)
- [io] remove deprecated function calls (#3859) (#3858)
- [documentation, io] Improve IO docs and expose `ImageReadMode` in `torchvision.io` (#3812)
- [onnx, models] Replace `reshape` with `flatten` in MobileNetV2 (#3462)
- [ops, tests] Added test for `aligned=True` (#3540)
- [ops, tests] Add onnx test for `batched_nms` (#3483)
- [tests] Various test improvements (#3548) (#3422) (#3435) (#3860) (#3479) (#3721) (#3872) (#3908) (#2916) (#3917) (#3920) (#3579)
- [transforms] Add `__repr__` for `transforms.RandomErasing` (#3491)
- [transforms, documentation] Add documentation for AutoAugment (#3529)
- [transforms, documentation] Add illustrations of transforms with sphinx-gallery (#3652)
- [datasets] Remove pandas dependency for CelebA dataset (#3656, #3698)
- [documentation] Add docs for missing datasets (#3536)
- [referencescripts] Make reference scripts compatible with `submitit` (#3785)
- [referencescripts] Updated `all_gather()` to make use of `all_gather_object()` from PyTorch (#3857)
- [datasets] Added dataset download support in fbcode (#3823) (#3826)
Code quality
- Remove inconsistent FB copyright headers (#3741)
- Keep consistency in classes `ConvBNActivation` (#3750)
- Removed unused imports (#3738, #3740, #3639)
- Fixed `floor_divide` deprecation warnings seen in pytest output (#3672)
- Unify onnx and JIT `resize` implementations (#3654)
- Cleaned-up imports in test files related to datasets (#3720)
- [documentation] Remove old css file (#3839)
- [ci] Fix inconsistent version pinning across yaml files (#3790)
- [datasets] Remove redundant `path.join` in `Places365` (#3545)
- [datasets] Remove imprecise error handling in `PhotoTour` dataset (#3488)
- [datasets, tests] Remove obsolete `test_datasets_transforms.py` (#3867)
- [models] Making protected params of MobileNetV3 public (#3828)
- [models] Make target argument in `transform.py` truly optional (#3866)
- [models] Adding some references on MobileNetV3 implementation (#3850)
- [models] Refactored `set_cell_anchors()` in `AnchorGenerator` (#3755)
- [ops] Minor cleanup of `roi_align_forward_kernel_impl` (#3619)
- [ops] Replace deprecated `AutoNonVariableTypeMode` with `AutoDispatchBelowADInplaceOrView` (#3786, #3897)
- [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
- [ops, tests] Simplify `get_script_fn` (#3541)
- [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
- [tests] Clean up test accept behaviour (#3759)
- [tests] Remove unused `masks` variable in `test_image.py` (#3910)
- [transforms] Use ternary if in `resize` (#3533)
- [transforms] Replaced deprecated call to `ByteTensor` with `from_numpy` (#3813)
- [transforms] Remove unnecessary casting in `adjust_gamma` (#3472)
Bugfixes
- [ci] set empty cxx flags as default (#3474)
- [android][test_app] Cleanup duplicate dependency (#3428)
- Remove leftover exception (#3717)
- Corrected spelling in a `TypeError` (#3659)
- Add missing device info (#3651)
- Moving tensors to the right device (#3870)
- Proper error message (#3725)
- [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
- [datasets] Make LSUN OS agnostic (#3455)
- [datasets] Update `squeezenet` urls (#3581)
- [datasets] Add `.item()` to the `target` variable in `fakedataset.py` (#3587)
- [datasets] Fix VOC da...
Dataset bugfixes
Highlights
This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.
Bugfixes
Mobile support, AutoAugment, improved IO and more
This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.
Highlights
Better mobile support
torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks.
It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application showing how to use the torchvision ops on an Android app can be found here.
Classification
We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.
```python
import torch
import torchvision

# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)

# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
```
The pre-trained models have the following accuracies on ImageNet 2012 val:
Model | Top-1 Acc | Top-5 Acc |
---|---|---|
MobileNetV3 Large | 74.042 | 91.340 |
MobileNetV3 Large (Quantized) | 73.004 | 90.858 |
MobileNetV3 Small | 67.620 | 87.404 |
Object Detection
We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:
```python
import torch
import torchvision

# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```
They yield the following accuracies on COCO val2017 (full results available in #3265):
Model | mAP | mAP@50 | mAP@75 |
---|---|---|---|
Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |
Semantic Segmentation
We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.
```python
import torch
import torchvision

# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)

# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
```
The pre-trained models give the following results on the subset of COCO val2017 that contains the same 20 categories as those present in Pascal VOC (full results in #3276):
Model | mean IoU | global pixelwise accuracy |
---|---|---|
Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |
Addition of the AutoAugment method
AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to the dataset on which they were trained, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed and matched with existing transforms:
```python
from torchvision import transforms

# image is a PIL Image
t = transforms.AutoAugment()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])
```
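The learned policy can also be selected explicitly; a short sketch (IMAGENET is the default):

```python
from torchvision import transforms
from torchvision.transforms import AutoAugmentPolicy

t = transforms.AutoAugment(policy=AutoAugmentPolicy.CIFAR10)
# t = transforms.AutoAugment(policy=AutoAugmentPolicy.SVHN)
```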
Improved Image IO and on-the-fly image type conversions
All the read and decode methods of the `io.image` package have been updated to:
- Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
- Allow the on-the-fly conversion of image from one type to the other during read.
```python
from torchvision.io.image import read_image, ImageReadMode

# keeps original type, channels unchanged
x1 = read_image("image.png")
# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)
# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)
# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)
# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
```
Python 3.9 and CUDA 11.1
This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)
Backwards Incompatible Changes
- [Ops] Change default `eps` value of `FrozenBN` to better align with `nn.BatchNorm` (#2933)
- [Ops] Remove deprecated `_new_empty_tensor` (#3156)
- [Transforms] `ColorJitter` gets its random params by calling `get_params()` (#3001)
- [Transforms] Change rounding of transforms on integer tensors (#2964)
- [Utils] Remove `normalize` from `save_image` (#3324)
New Features
- [Datasets] Add WiderFace dataset (#2883)
- [Models] Add MobileNetV3 architecture
- [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
- [Mobile] Add Android gradle project with demo test app (#2897)
- [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
- [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
- [Ops] Add modulation input for DeformConv2D (#2791) (see the sketch after this list)
- [IO] Improved `io.image` with on-the-fly image type conversions (#3193, #3069, #3024, #2988, #2984)
- [IO] Add option to write audio to video file (#2304)
- [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075)
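As a sketch of the new modulation ("mask") input for DeformConv2D mentioned above, with the expected shapes spelled out (a neutral zero offset and constant mask are used purely for illustration):

```python
import torch
from torchvision.ops import DeformConv2d

x = torch.rand(1, 3, 8, 8)
conv = DeformConv2d(3, 6, kernel_size=3, padding=1)

# 2 offset coordinates per 3x3 kernel sampling location
offset = torch.zeros(1, 2 * 3 * 3, 8, 8)
# new: one modulation scalar per sampling location
mask = torch.sigmoid(torch.zeros(1, 3 * 3, 8, 8))

out = conv(x, offset, mask)  # shape (1, 6, 8, 8)
```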
Improvements
Datasets
- Concatenate small tensors in video datasets to reduce the use of shared file descriptors (#1795)
- Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
- Check if dataset file is located on Google Drive before downloading it (#3245)
- Improve Coco implementation (#3417)
- Make download_url follow redirects (#3236)
- Make `make_dataset` a `staticmethod` of `DatasetFolder` (#3215)
- Add a warning if any clip can't be obtained from a video in `VideoClips` (#2513)
Models
- Improve error message in `AnchorGenerator` (#2960)
- Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
- Support for image with no annotations in RetinaNet (#3032)
- Change RoIHeads reshape to support empty batches. (#3031)
- Fixed typing exception throwing issues with JIT (#3029)
- Replace deprecated `functional.sigmoid` with `torch.sigmoid` in RetinaNet (#3307)
- Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
- Speedup RetinaNet's postprocessing (#2828)
Ops
- Added eps in the `__repr__` of FrozenBN (#2852)
- Added `__repr__` to `MultiScaleRoIAlign` (#2840)
- Exposing LevelMapper params in `MultiScaleRoIAlign` (#3151)
- Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)
Transforms
- `adjust_hue` now accepts tensors with one channel (#3222)
- Add `fill` color support for tensor affine transforms (#2904)
- Remove torchscript workaround for `center_crop` (#3118)
- Improved error message for `RandomCrop` (#2816)
IO
- Enable importing `read_file` and the other methods from torchvision.io (#2918)
- Accept python bytes in `_read_video_from_memory()` (#3347)
- Enable rtmp timeout in decoder (#3076)
- Specify tls cert file to decoder through config (#3289, #3374)
- Add UUID in LOG() in decoder (#3080)
References
- Add weight averaging and storing methods in references utils (#3352)
- Adding Preset Transforms in reference scripts (#3317)
- Load variables when `--resume /path/to/checkpoint --test-only` (#3285)
- Updated video classification ref example with new transforms (#2935)
Misc
- Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
- The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
- Avoid some deprecation warnings from PyTorch (#3348)
- Ensure operators are added in C++ (#2798, #3091, #3391)
- Fixed compilation warnings on C++ codebase (#3390)
- CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
- Installation improvements (#3302, #2969, #3113, #3202)
- CMake improvemen...
Python 3.9 support and bugfixes
This minor release bumps the pinned PyTorch version to v1.7.1, and contains some minor improvements.
Highlights
Python 3.9 support
This release adds native binaries for Python 3.9 (#3063).
Bugfixes
Added version suffix back to package
Issues resolved:
- Cannot pip install torchvision==0.8.0+cu110 (#2912)
Improved transforms, native image IO, new video API and more
This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than `torchvision.io.read_video`.
Highlights
Transforms now support Tensor, batch computation, GPU and TorchScript
torchvision transforms now inherit from `nn.Module` and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with a batch dimension and work seamlessly on CPU/GPU devices:
```python
import torch
import torchvision.transforms as T

# to fix random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)

# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())

# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)

# and has torchscript support
out_image2 = scripted_transforms(tensor_image)
```
These improvements enable the following new features:
- support for GPU acceleration
- batched transformations e.g. as needed for videos
- transform multi-band torch tensor images (with more than 3-4 channels)
- torchscript transforms together with your model for deployment
Note: Exceptions for TorchScript support include `Compose`, `RandomChoice`, `RandomOrder`, `Lambda` and those applied on PIL images, such as `ToPILImage`.
Native image IO for JPEG and PNG formats
torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript and return `CxHxW` tensors in `uint8` format, and can thus now be part of your model for deployment in C++ environments.
```python
import torch
from torchvision.io import read_image

# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')

# or equivalently
from torchvision.io.image import read_file, decode_image
# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)

# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)
```
New detection model
This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:
```
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
```
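A minimal usage sketch, following the same pattern as the other detection models:

```python
import torch
import torchvision

model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()
x = [torch.rand(3, 300, 400)]
predictions = model(x)  # list of dicts with "boxes", "labels" and "scores"
```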
[BETA] New Video Reader API
This release introduces a new video reading abstraction, which gives more fine-grained control on how to iterate over the videos. It supports image and audio, and implements an iterator interface so that it can be combined with the rest of the python ecosystem, such as `itertools`.
```python
from torchvision.io import VideoReader

# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')
# can change the stream after construction
# via reader.set_current_stream

# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])

# because reader is an iterator you can combine it with itertools
from itertools import takewhile, islice

# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass

# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass
```
Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed in your system.
Note: the VideoReader API is currently released as beta and its API can change following user feedback.
Backwards Incompatible Changes
- [Transforms] The random seed should now be set with `torch.manual_seed` instead of `random.seed` (#2292)
- [Transforms] `RandomErasing.get_params` function's argument was previously `value=0` and is now `value=None`, which is interpreted as Gaussian random noise (#2386)
- [Transforms] `RandomPerspective` and `F.perspective` changed the default value of interpolation to be `BILINEAR` instead of `BICUBIC` (#2558, #2561)
- [Transforms] Fixes incoherence in `affine` transformation when center is defined as half image size + 0.5 (#2468)
New Features
- [Ops] Added focal loss (#2784)
- [Ops] Added bounding boxes conversion function (#2710, #2737)
- [Ops] Added Generalized IoU (#2642) (see the sketch after this list)
- [Models] Added RetinaNet object detection model (#2784)
- [Datasets] Added Places365 dataset (#2610, #2625)
- [Transforms] Added GaussianBlur transform (#2658)
- [Transforms] Added torchscript, batch and GPU and tensor support for transforms (#2769, #2767, #2749, #2755, #2485, #2721, #2645, #2694, #2584, #2661, #2566, #2345, #2342, #2356, #2368, #2373, #2496, #2553, #2495, #2561, #2518, #2478, #2459, #2444, #2396, #2401, #2394, #2586, #2371, #2477, #2456, #2628, #2569, #2639, #2620, #2595, #2456, #2403, #2729)
- [Transforms] Added example notebook for tensor transforms (#2730)
- [IO] Added JPEG/PNG encoding / decoding ops
- [IO] Added file reading / writing ops (#2728, #2765, #2768)
- [IO] [BETA] Added new VideoReader API (#2683, #2781, #2778, #2802, #2596, #2612, #2734, #2770)
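A small sketch of the new box utilities mentioned above, covering conversion between box formats and the pairwise generalized IoU matrix:

```python
import torch
from torchvision.ops import box_convert, generalized_box_iou

# convert (cx, cy, w, h) boxes to (x1, y1, x2, y2)
boxes = torch.tensor([[50., 50., 20., 20.], [60., 60., 30., 30.]])
xyxy = box_convert(boxes, in_fmt="cxcywh", out_fmt="xyxy")

giou = generalized_box_iou(xyxy, xyxy)  # pairwise matrix of shape (2, 2)
```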
Improvements
Datasets
- Added error message if Google Drive download quota is exceeded (#2321)
- Optimized LSUN initialization time by only pulling keys from db (#2544)
- Use more precise return type for gzip.open() (#2792)
- Added UCF101 dataset tests (#2548)
- Added download tests on a schedule (#2665, #2675, #2699, #2706, #2747, #2731)
- Added typehints for datasets (#2487, #2521, #2522, #2523, #2524, #2526, #2528, #2529, #2525, #2527, #2530, #2533, #2534, #2535, #2536, #2532, #2538, #2537, #2539, #2531, #2540, #2667)
Models
- Removed hard coded value in DeepLabV3 (#2793)
- Changed the anchor generator default argument to an equivalent one (#2722)
- Moved model construction in `resnet_fpn_backbone` to after the docstring (#2482)
- Partially enabled type hints for models (#2668)
Ops
- Moved RoIs shape check to C++ (#2794)
- Use autocast built-in cast-helper functions (#2646)
- Added type annotations for `torchvision.ops` (#2331, #2462)
References
- [References] Removed redundant target send to device in detection evaluation (#2503)
- [References] Removed obsolete import in segmentation. (#2399)
Misc
- [Transforms] Added support for negative padding in `pad` (#2744)
- [IO] Added type hints for `torchvision.io` (#2543)
- [ONNX] Export `ROIAlign` with `aligned=True` (#2613)
Internal
- [Binaries] Added CUDA 11 binary builds (#2671)
- [Binaries] Added DEBUG=1 option to build torchvision (#2603)
- [Binaries] Unpin ninja version (#2358)
- Warn if torchvision imported from repo root (#2759)
- Added compatibility checks for C++ extensions (#2467)
- Added probot (#2448)
- Added ipynb to git attributes file (#2772)
- CI improvements (#2328, #2346, #2374, #2437, #2465, #2579, #2577, #2633, #2640, #2727, #2754, #2674, #2678)
- CMakeList improvements (#2739, #2684, #2626, #2585, #2587)
- Documentation improvements (#2659, #2615, #2614, #2542, #2685, #2507, #2760, #2550, #2656, #2723, #2601, #2654, #2757, #2592, #2606)
Bug Fixes
- [Ops] Fixed crash in deformable convolutions (#2604)
- [Ops] Added empty batch support for `DeformConv2d` (#2782)
- [Transforms] Enforced contiguous output in `to_tensor` (#2483)
- [Transforms] Fixed fill parameter for PIL pad (#2515)
- [Models] Fixed deprecation warning in `nonzero` for R-CNN models (#2705)
- [IO] Explicitly cast to `size_t` in video decoder (#2389)
- [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
- [C++ API] Fixed function signatures for `torch::nn::Functional` (#2463)
Deprecations
- [Transforms] Deprecated dedicated implementations `functional_tensor` of `F_t.center_crop`, `F_t.five_crop`, `F_t.te...
Mixed precision training, new models and improvements
Highlights
Mixed precision support for all models
torchvision models now support mixed-precision training via the new `torch.cuda.amp` package. Using mixed precision support is easy: just wrap the model and the loss inside a `torch.cuda.amp.autocast` context manager. Here is an example with Faster R-CNN:
```python
import torch, torchvision

device = torch.device('cuda')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)

input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
           "labels": torch.zeros(5, dtype=torch.int64, device=device),
           "image_id": 4,
           "area": torch.zeros(5, dtype=torch.float32, device=device),
           "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]

# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
losses = sum(loss for loss in loss_dict.values())
# perform backward outside of autocast context manager
losses.backward()
```
New pre-trained segmentation models
This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3. They are available under `torchvision.models.segmentation` and can be obtained as follows:
```python
torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
```
They obtain the following accuracies:
Network | mean IoU | global pixelwise acc |
---|---|---|
FCN ResNet50 | 60.5 | 91.4 |
DeepLabV3 ResNet50 | 66.4 | 92.4 |
Improved ONNX support for Faster / Mask / Keypoint R-CNN
This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models.
Notable improvements include support for dynamic input shape exports, including images with no detections.
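As a hedged sketch of what the restored export path looks like (opset 11 is required for the R-CNN models; the file name and input sizes below are placeholders):

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

x = [torch.rand(3, 300, 400)]
# R-CNN models need opset 11; dynamic input shapes are supported
torch.onnx.export(model, (x,), "mask_rcnn.onnx", opset_version=11,
                  do_constant_folding=True)
```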
Backwards Incompatible Changes
- [Transforms] Fix for integer fill value in constant padding (#2284)
- [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
- [Transforms] Use `torch.rand` instead of `random.random()` for random transforms (#2520)
New Features
- [Models] Add mixed-precision support (#2366, #2384)
- [Models] Add `fcn_resnet50` and `deeplabv3_resnet50` pretrained models (#2086, #2091)
- [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
- [Transforms] Add `convert_image_dtype` to functionals (#2078)
- [Transforms] Add `pil_to_tensor` to functionals (#2092)
Bug Fixes
- [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
- [IO] Fix `write_video` when floating point FPS is passed (#2334)
- [IO] Fix missing compilation files for video-reader (#2183)
- [IO] Fix missing include for OSX in video decoder (#2224)
- [IO] Fix overflow error for large buffers (#2303)
- [Ops] Fix wrong clamping in RoIAlign with `aligned=True` (#2438)
- [Ops] Fix corner case in `interpolate` (#2146)
- [Ops] Fix the use of `contiguous()` in C++ kernels (#2131)
- [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
- [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
- [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
- [Models] Fix DenseNet issue with gradient checkpoints (#2236)
- [ONNX] Fix ONNX implementation of `heatmaps_to_keypoints` in KeypointRCNN (#2312)
- [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)
Deprecations
- [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
- [Ops] Deprecate `interpolate` in favor of PyTorch's implementation (#2252)
Improvements
Datasets
- Fix DatasetFolder error message (#2143)
- Change `range(len)` to `enumerate` in `DatasetFolder` (#2153)
- [DOC] Fix link URL to Flickr8k (#2178)
- [DOC] Add CelebA to docs (#2107)
- [DOC] Improve documentation of `DatasetFolder` and `ImageFolder` (#2112)
TorchHub
- Fix torchhub tests due to numerical changes in torch.sum (#2361)
- Add all the latest models to hubconf (#2189)
Transforms
- Add `fill` argument to `__repr__` of `RandomRotation` (#2340)
- Add tensor support for `adjust_hue` (#2300, #2355)
- Make `ColorJitter` torchscriptable (#2298)
- Make `RandomHorizontalFlip` and `RandomVerticalFlip` torchscriptable (#2282)
- [DOC] Use consistent symbols in the doc of `Normalize` to avoid confusion (#2181)
- [DOC] Fix typo in `hflip` in `functional.py` (#2177)
- [DOC] Fix spelling errors in `functional.py` (#2333)
IO
- Refactor `video.py` to improve clarity (#2335)
- Save memory by not storing full frames in `read_video_timestamps` (#2202, #2268)
- Improve warning when `video_reader` backend is not available (#2225)
- Set `should_buffer` to True by default in `_read_from_stream` (#2201)
- [Test] Temporarily disable one PyAV test (#2150)
Models
- Improve target checks in GeneralizedRCNN (#2207, #2258)
- Use Module objects instead of functions for some layers of Inception3 (#2287)
- Add support for other normalizations in MobileNetV2 (#2267)
- Expose layer freezing option to detection models (#2160, #2242)
- Make ASPP-Layer in DeepLab more generic (#2174)
- Faster initialization for Inception family of models (#2170, #2211)
- Make `norm_layer` a parameter in `models/detection/backbone_utils.py` (#2081)
- Updates integer division to use floor division operator (#2234, #2243)
- [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
- [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
- [DOC] Fix roi_heads argument name in docstring of GeneralizedRCNN (#2093)
- [DOC] Fix type annotation in RPN docstring (#2149)
- [DOC] add clarifications to Object detection reference documentation (#2241)
- [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)
Reference scripts
- Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
- Fix training resuming in `references/segmentation` (#2142)
- Rename `image` to `images` in `references/detection/engine.py` (#2187)
ONNX
- Add support for dynamic input shape export in R-CNN models (#2087)
Ops
- Added number of features in FrozenBatchNorm2d `__repr__` (#2168)
- Improve consistency among box IoU CPU / GPU calculations (#2072)
- Avoid `using` in header files (#2257)
- Make `ceil_div` `__host__ __device__` (#2217)
- Don't include CUDAApplyUtils.cuh (#2127)
- Add namespace to avoid conflict with ATen version of `channel_shuffle()` (#2206)
- [DOC] Update the statement of supporting torchscript ops (#2343)
- [DOC] Update torchvision ops in doc (#2341)
- [DOC] Improve documentation for NMS (#2159)
- [Test] Add more tests to NMS (#2279)
Misc
- Add PyTorch version compatibility table to README (#2260)
- Fix lint (#2182, #2226, #2070)
- Update version to 0.6.0 in CMake (#2140)
- Remove mock (#2096)
- Remove warning about deprecated (#2064)
- Cleanup unused import (#2067)
- Type annotations for torchvision/utils.py (#2034)
CI
- Add version suffix to build version
- Add backslash to escape
- Add workflows to run on tag
- Bump version to 0.7.0, pin PyTorch to 1.6.0
- Update link for cudnn 10.2 (#2277)
- Fix binary builds with CUDA 9.2 on Windows (#2273)
- Remove Python 3.5 from CI (#2158)
- Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
- Master version bump 0.6 -> 0.7 (#2102)
- Add test channels for pytorch version functions (#2208)
- Add static type check with mypy (#2195, #1696, #2247)