[10] Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
paper | code
Explainer: Generalized Focal Loss V2 in plain language

[9] MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
paper

[8] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
paper | code

[7] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
paper

[6] General Instance Distillation for Object Detection
paper

[5] Instance Localization for Self-supervised Detection Pretraining
paper | code

[4] Multiple Instance Active Learning for Object Detection
paper | code

[3] Towards Open World Object Detection
paper | code

[2] Positive-Unlabeled Data Purification in the Wild for Object Detection

[1] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
paper | code
Explainer: unsupervised pre-training for detectors

Video Object Detection

[3] Depth from Camera Motion and Object Detection
paper

[2] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
paper | video | project

[1] Dogfight: Detecting Drones from Drone Videos

3D Object Detection

[4] ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection
paper | code

[3] Center-based 3D Object Detection and Tracking
paper | code

[2] 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
paper | code | project | video

[1] Categorical Depth Distribution Network for Monocular 3D Object Detection
paper

Human-Object Interaction (HOI) Detection

[4] Detecting Human-Object Interaction via Fabricated Compositional Learning
paper | code

[3] Reformulating HOI Detection as Adaptive Set Prediction
paper | code

[2] QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
paper | code

[1] End-to-End Human Object Interaction Detection with HOI Transformer
paper | code

Camouflaged Object Detection

[1] Simultaneously Localize, Segment and Rank the Camouflaged Objects
paper | code

Rotation Object Detection

[2] ReDet: A Rotation-equivariant Detector for Aerial Object Detection
paper | code

[1] Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
paper | code | Explainer: DCL, a new method for rotated object detection

Anomaly Detection

[1] Multiresolution Knowledge Distillation for Anomaly Detection
paper

Keypoint Detection

[1] Skeleton Merger: an Unsupervised Aligned Keypoint Detector
paper | code

Segmentation

Image Segmentation

[4] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
paper

[3] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
paper | code

[2] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?
paper | code

[1] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

Panoptic Segmentation

[2] Cross-View Regularization for Domain Adaptive Panoptic Segmentation
paper

[1] 4D Panoptic LiDAR Segmentation
paper

Semantic Segmentation

[10] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
paper

[9] Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
paper

[8] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
paper

[7] Capturing Omni-Range Context for Omnidirectional Segmentation
paper

[6] MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
paper

[5] Learning Statistical Texture for Semantic Segmentation
paper

[4] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
paper

[3] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
paper

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[1] PLOP: Learning without Forgetting for Continual Semantic Segmentation
paper

Instance Segmentation

[2] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
paper

[1] End-to-End Video Instance Segmentation with Transformers
paper | code

Superpixel

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner
paper

Video Object Segmentation

[2] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
paper | code

[1] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
paper | project

Matting

[1] Real-Time High Resolution Background Matting
paper | code | project | video

Dense Prediction

[3] Generic Perceptual Loss for Modeling Structured Output Dependencies
paper

[2] Densely connected multidilated convolutional networks for dense prediction tasks
paper

[1] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
paper | code

Estimation

Human Pose Estimation

[4] DCPose: Deep Dual Consecutive Network for Human Pose Estimation
paper | code

[3] Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
paper | code

[2] CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
paper

Gesture Estimation

[2] Skeleton Based Sign Language Recognition Using Whole-body Keypoints
paper | code

[1] Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
paper | code

Optical Flow/Pose/Motion Estimation

[4] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
paper

[3] GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
paper | code

[2] Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
paper | project

[1] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

Depth Estimation

[2] Beyond Image to Depth: Improving Depth Prediction using Echoes
paper | code

[1] PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
paper

Image Processing

Super Resolution

[4] ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
paper | Explainer: ClassSR accelerates image super-resolution, cutting computation by 50% without lowering SR performance

[3] Learning Continuous Image Representation with Local Implicit Image Function
paper | code | video | project

[2] Data-Free Knowledge Distillation For Image Super-Resolution (the SR version of the DAFL algorithm)

[1] AdderSR: Towards Energy Efficient Image Super-Resolution (applies adder networks to image super-resolution)
paper | code
Explainer: Huawei open-sources adder neural networks

Image Restoration/Image Enhancement

[2] NeX: Real-time View Synthesis with Neural Basis Expansion
paper | code

[1] Multi-Stage Progressive Image Restoration
paper | code

Image Shadow Removal/Image Reflection Removal

[2] Robust Reflection Removal with Reflection-free Flash-only Cues
paper | code

[1] Auto-Exposure Fusion for Single-Image Shadow Removal
paper | code

Image Denoising/Deblurring/Deraining/Dehazing

[3] Semi-Supervised Video Deraining with Dynamic Rain Generator
paper

[2] ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
paper

[1] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
paper | code | video

Image Editing/Inpainting

[6] Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE
paper | code

[5] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[4] DeFLOCNet: Deep Image Editing via Flexible Low level Controls

[3] PD-GAN: Probabilistic Diverse GAN for Image Inpainting

[2] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

[1] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Image Translation

[4] CoMoGAN: continuous model-guided image-to-image translation
paper | code

[3] Spatially-Adaptive Pixelwise Networks for Fast Image Translation
paper | project

[2] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code | Explainer: hierarchical style disentanglement finally makes multi-attribute face manipulation controllable (CVPR2021 Oral)

[1] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
paper | code | project

Image Quality Assessment

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
paper

Face

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
paper

Facial Recognition/Detection

[5] Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
paper

[4] MagFace: A Universal Representation for Face Recognition and Quality Assessment
paper | code

[3] CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
paper

[2] A 3D GAN for Improved Large-pose Facial Recognition
paper

[1] WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
paper | benchmark

Face Generation/Face Synthesis/Face Forgery

[7] Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
paper

[6] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[5] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
paper | code

[4] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code

[3] When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
paper | code

[2] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[1] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
paper | code | project

Face Anti-Spoofing

[2] Multi-attentional Deepfake Detection
paper

[1] Cross Modal Focal Loss for RGBD Face Anti-Spoofing
paper

Object Tracking

[7] Track to Detect and Segment: An Online Multi-Object Tracker
paper | code

[6] Learning a Proposal Classifier for Multiple Object Tracking
paper | code

[5] Center-based 3D Object Detection and Tracking
paper | code

[4] HPS: localizing and tracking people in large 3D scenes from wearable sensors

[3] Track to Detect and Segment: An Online Multi-Object Tracker
project | video

[2] Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
paper

[1] Rotation Equivariant Siamese Networks for Tracking
paper

Image/Video Retrieval

[2] On Semantic Similarity in Video Retrieval
paper | code

[1] QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
paper

Action/Activity Recognition, Detection and Segmentation

[8] Coarse-Fine Networks for Temporal Activity Detection in Videos
paper

[7] Learning Discriminative Prototypes with Dynamic Time Warping
paper

[6] Temporal Action Segmentation from Timestamp Supervision
paper

[5] ACTION-Net: Multipath Excitation for Action Recognition
paper | code

[4] BASAR: Black-box Attack on Skeletal Action Recognition
paper

[3] Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
paper

[2] Temporal Difference Networks for Efficient Action Recognition
paper | code

[1] Behavior-Driven Synthesis of Human Dynamics
paper | code

Re-Identification

[3] Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification
paper

[2] Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
paper

[1] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
paper

Image/Video Captioning

[4] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
paper

[3] Open-book Video Captioning with Retrieve-Copy-Generate Network
paper

[2] VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
paper

[1] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
paper | code | project | video

Scene Graph Generation

[2] Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
paper

[1] Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
paper

Medical Imaging

[9] XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations
paper

[8] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
paper | code

[7] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
paper

[6] Discovering Hidden Physics Behind Transport Dynamics
paper

[5] DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
paper

[4] Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
paper | code

[3] 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management

[2] Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies
paper

[1] Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization
paper

Text Detection/Recognition

[2] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
paper | code

[1] What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
paper | code

Remote Sensing Image

[2] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
paper

[1] Deep Gradient Projection Networks for Pan-sharpening (topic: super-resolution)
paper | code

Neural Architecture Search (NAS)

[6] Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
paper | code

[5] Contrastive Neural Architecture Search with Neural Architecture Comparators
paper | code

[4] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
paper | code

[3] AttentiveNAS: Improving Neural Architecture Search via Attentive
paper

[2] ReNAS: Relativistic Evaluation of Neural Architecture Search (on the importance of ranking loss in NAS predictors)
paper

[1] HourNAS: Extremely Fast Neural Architecture (reduces the cost of NAS)
paper

GAN/Generative/Adversarial

[15] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
paper

[14] Diverse Semantic Image Synthesis via Probability Distribution Modeling
paper | code

[13] HumanGAN: A Generative Model of Humans Images
paper

[12] MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks
paper | code

[11] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
paper | code | project

[10] LOHO: Latent Optimization of Hairstyles via Orthogonalization
paper

[9] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[8] Closed-Form Factorization of Latent Semantics in GANs
paper | code

[7] PD-GAN: Probabilistic Diverse GAN for Image Inpainting

[6] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

[5] Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
paper | code

[4] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

[3] Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
paper

[2] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
paper | code | project

[1] A 3D GAN for Improved Large-pose Facial Recognition
paper

Image Generation/Image Synthesis

[8] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
paper

[7] HumanGAN: A Generative Model of Humans Images
paper

[6] PISE: Person Image Synthesis and Editing with Decoupled GAN
paper | code

[5] SMPLicit: Topology-aware Generative Model for Clothed People
paper | code

[4] Diversifying Sample Generation for Data-Free Quantization
paper

[3] Diverse Semantic Image Synthesis via Probability Distribution Modeling
paper | code

[2] When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
paper | code

[1] Anycost GANs for Interactive Image Synthesis and Editing
paper | code

3D Vision

[2] A Deep Emulator for Secondary Motion of 3D Characters
paper

[1] 3D CNNs with Adaptive Temporal Feature Resolutions
paper

Point Cloud

[14] Skeleton Merger: an Unsupervised Aligned Keypoint Detector
paper | code

[13] Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding
paper

[12] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
paper

[11] How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines
paper | code

[10] PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
paper | code

[9] Robust Point Cloud Registration Framework Based on Deep Graph Matching
paper | code

[8] TPCN: Temporal Point Cloud Networks for Motion Forecasting
paper | code

[7] PointGuard: Provably Robust 3D Point Cloud Classification
paper

[6] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[5] SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration
paper | code

[4] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

[3] Diffusion Probabilistic Models for 3D Point Cloud Generation
paper | code

[2] Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
paper

[1] PREDATOR: Registration of 3D Point Clouds with Low Overlap
paper | code | project

3D Reconstruction

[4] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[3] Learning Compositional Representation for 4D Captures with Neural ODE
paper

[2] SMPLicit: Topology-aware Generative Model for Clothed People
paper | code

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
paper

Model Compression

[2] Manifold Regularized Dynamic Network Pruning (dynamic pruning constrained by both sample complexity and network complexity)

[1] Learning Student Networks in the Wild (a model compression and acceleration technique that does not require the original training data)
paper | code
Explainer: Huawei Noah's Ark Lab proposes a data-free network compression technique

Knowledge Distillation

[7] Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
paper | code

[6] Knowledge Evolution in Neural Networks
paper | code

[5] Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning
paper

[4] Teachers Do More Than Teach: Compressing Image-to-Image Models (https://arxiv.org/abs/2103.03467)
paper | code

[3] General Instance Distillation for Object Detection
paper

[2] Multiresolution Knowledge Distillation for Anomaly Detection
paper

[1] Distilling Object Detectors via Decoupled Features (distillation that separates foreground and background)

Pruning

[1] Manifold Regularized Dynamic Network Pruning
paper

Quantization

[1] Learnable Companding Quantization for Accurate Low-bit Neural Networks
paper

Neural Network Structure Design

[7] Fast and Accurate Model Scaling
paper

[6] Involution: Inverting the Inherence of Convolution for Visual Recognition
paper | code

[5] Inception Convolution with Efficient Dilation Search
paper | code | Explainer: Inception convolution

[4] Coordinate Attention for Efficient Mobile Network Design
paper

[3] Rethinking Channel Dimensions for Efficient Model Design
paper | code

[2] Inverting the Inherence of Convolution for Visual Recognition

[1] RepVGG: Making VGG-style ConvNets Great Again
paper | code
Explainer: RepVGG: minimalist architecture, SOTA performance, making VGG-style models great again

Transformer

[3] Transformer Interpretability Beyond Attention Visualization
paper | code

[2] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
paper | code
Explainer: unsupervised pre-training for detectors

[1] Pre-Trained Image Processing Transformer (a pre-trained model for low-level vision)
paper | Explainer: Transformer conquers another field! Peking University, Huawei and others jointly propose IPT, a pre-trained model that tops multiple low-level vision benchmarks

Graph Neural Networks (GNN)

[2] Quantifying Explainers of Graph Neural Networks in Computational Pathology
paper

[1] Sequential Graph Convolutional Network for Active Learning
paper

Data Processing

Data Augmentation

[2] AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
paper

[1] KeepAugment: A Simple Information-Preserving Data Augmentation
paper

Representation Learning

[5] Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
paper

[4] VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
paper

[3] Spatially Consistent Representation Learning
paper

[2] Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
paper | code | project | Explainer

[1] VirTex: Learning Visual Representations from Textual Annotations
paper | code

Normalization/Regularization

[3] Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
paper | code

[2] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
paper

[1] Representative Batch Normalization with Feature Calibration

Image Clustering

[2] Improving Unsupervised Image Clustering With Robust Learning
paper | code

[1] Reconsidering Representation Alignment for Multi-view Clustering
paper | code

Model Evaluation

[1] Are Labels Necessary for Classifier Accuracy Evaluation?
paper | Explainer

Dataset

[4] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
paper | project&dataset

[3] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
paper | project

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
paper | code

[1] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
paper | code

Active Learning

[3] Vab-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
paper | code

[2] Multiple Instance Active Learning for Object Detection
paper | code

[1] Sequential Graph Convolutional Network for Active Learning
paper

Few-shot Learning/Zero-shot Learning

[6] Goal-Oriented Gaze Estimation for Zero-Shot Learning
paper | code

[5] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?
paper | code

[4] Counterfactual Zero-Shot and Open-Set Visual Recognition
paper | code

[3] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
paper

[2] Few-shot Open-set Recognition by Transformation Consistency

[1] Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
paper

Continual Learning/Life-long Learning

[2] Rainbow Memory: Continual Learning with a Memory of Diverse Samples

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner
paper

Visual Reasoning

[1] Transformation Driven Visual Reasoning
paper | code | project

Transfer Learning/Domain Adaptation

[7] Dynamic Transfer for Multi-Source Domain Adaptation
paper

[6] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
paper

[5] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
paper

[4] Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
paper

[3] Domain Generalization via Inference-time Label-Preserving Target Projections
paper

[2] MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing
paper | code

[1] FSDR: Frequency Space Domain Randomization for Domain Generalization
paper

Contrastive Learning

[2] AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
paper | code | Explainer: AdCo, adversarial contrastive learning

[1] Fine-grained Angular Contrastive Learning with Coarse Labels
paper

Reinforcement Learning

[1] Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach
paper

暂无分类

Dynamic Face Video Segmentation via Reinforcement Learning(通过强化学习进行动态人脸视频分割)
paper | code

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose(从像素到姿势学习可靠的相机定位)
paper | code

Student-Teacher Learning from Clean Inputs to Noisy Inputs(【模型训练】从纯净输入到噪音输入的师生学习)
paper

Uncertainty-guided Model Generalization to Unseen Domains(【模型泛化】不确定性指导的模型泛化)
paper

Monte Carlo Scene Search for 3D Scene Understanding(【3D场景理解&重建】蒙特卡洛场景搜索以了解3D场景)
paper

Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging(【优化】旋转坐标下降用于快速全局最优旋转平均)
paper

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality(使用情感因果关系对多媒体内容进行情感分析)
paper

Deep Graph Matching under Quadratic Constraint(【图匹配】二次约束下的深度图匹配)
paper

Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging(用于光谱压缩成像的深高斯比例混合气)
paper | code

Limitations of Post-Hoc Feature Alignment for Robustness(健壮性的赛后特征对齐的局限性)
paper

MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions(针对复杂时空运动的通用视频预测模型)
paper | 解读

Consensus Maximisation Using Influences of Monotone Boolean Functions(利用单调布尔函数的影响实现共识最大化)
paper

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food(实现对通用食品的自动营养理解)
paper

Structured Scene Memory for Vision-Language Navigation(用于视觉语言导航的结构化场景存储器)
paper | code

Learning Asynchronous and Sparse Human-Object Interaction in Videos(视频中异步稀疏人-物交互的学习)
paper

Self-supervised Geometric Perception(自我监督的几何知觉)
paper

Quantifying Explainers of Graph Neural Networks in Computational Pathology(计算病理学中图神经网络的量化解释器)
paper

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts(探索具有对比场景上下文的数据高效3D场景理解)
paper | project | video

Data-Free Model Extraction(无数据模型提取)
paper

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition(用于【位置识别】的局部全局描述符的【多尺度融合】)
paper | code

Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations(适用于正确概念的权利：通过可解释性来修正神经符号概念)
paper

Multi-Objective Interpolation Training for Robustness to Label Noise(多目标插值训练的鲁棒性)
paper | code

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph(基于目标关系图的分层部分可观测目标驱动策略学习)
paper

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis(视图合成的迭代软硬变形)
paper

PML: Progressive Margin Loss for Long-tailed Age Classification(【长尾分布】【图像分类】长尾年龄分类的累进边际损失)
paper

Domain Generalization via Inference-time Label-Preserving Target Projections（通过保留推理时间的目标投影进行域泛化）
paper

DeRF: Decomposed Radiance Fields（分解的辐射场）
project

Weakly-supervised Grounded Visual Question Answering using Capsules（使用胶囊进行弱监督的地面视觉问答）

CDFI: Compression-Driven Network Design for Frame Interpolation(用于帧插值的压缩驱动网络设计)
paper | code

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation（【视频插帧】FLAVR：用于快速帧插值的与流无关的视频表示）
paper | code | project

Probabilistic Embeddings for Cross-Modal Retrieval（跨模态检索的概率嵌入）
paper

Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

IIRC: Incremental Implicitly-Refined Classification
paper | project

Fair Attribute Classification through Latent Space De-biasing
paper | code | project

Information-Theoretic Segmentation by Inpainting Error Maximization
paper

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pretraining (Video-language learning)

Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
paper | code

D-NeRF: Neural Radiance Fields for Dynamic Scenes
paper | project

Weakly Supervised Learning of Rigid 3D Scene Flow
paper | code | project

2. CVPR2021 Oral

[36] Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging (Optimization)
paper

[35] MagFace: A Universal Representation for Face Recognition and Quality Assessment
paper | code

[34] CoMoGAN: continuous model-guided image-to-image translation
paper | code

[33] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
paper

[32] Knowledge Evolution in Neural Networks
paper | code

[31] NeX: Real-time View Synthesis with Neural Basis Expansion
paper | code

[30] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
paper | code

[29] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
paper | code

[28] Consensus Maximisation Using Influences of Monotone Boolean Functions
paper

[27] Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
paper | code

[26] Discovering Hidden Physics Behind Transport Dynamics
paper

[25] Learning Continuous Image Representation with Local Implicit Image Function
paper | code | video | project | explainer: Truly arbitrary-scale upsampling! Stunning 30x interpolation: NVIDIA and others open-source LIIF, a clever new approach to image super-resolution

[24] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
paper | code
Explainer: unsupervised pre-training for detectors

[23] Self-supervised Geometric Perception
paper

[22] DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
paper

[21] Modeling Multi-Label Action Dependencies for Temporal Action Localization
paper

[20] HPS: localizing and tracking people in large 3D scenes from wearable sensors

[19] Real-Time High Resolution Background Matting
paper | code | project | video

[18] Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
paper | project | video

[17] Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
paper | project

[16] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
paper | code

[15] Categorical Depth Distribution Network for Monocular 3D Object Detection
paper

[14] PatchmatchNet: Learned Multi-View Patchmatch Stereo
paper | code

[13] Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
paper

[12] Single-Stage Instance Shadow Detection with Bidirectional Relation Learning

[11] Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces
paper | code | project

[9] PREDATOR: Registration of 3D Point Clouds with Low Overlap
paper | code | project

[8] Domain Generalization via Inference-time Label-Preserving Target Projections
paper

[7] Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
paper | project | video

[6] Fine-grained Angular Contrastive Learning with Coarse Labels
paper

[5] Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
paper | code

[4] Cross-View Regularization for Domain Adaptive Panoptic Segmentation
paper

[3] Image-to-image Translation via Hierarchical Style Disentanglement
paper | code

[2] Towards Open World Object Detection
paper | code

[1] End-to-End Video Instance Segmentation with Transformers
paper

3. CVPR2021 Paper Explainers Roundup

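【13】Unsupervised Pre-training for Detectors (CVPR2021 Oral)
Unsupervised pre-trained models have achieved breakthroughs in both NLP (BERT, GPT, XLNet) and CV (MoCo, SimCLR, BYOL). For unsupervised (self-supervised) pre-training, the most important step is designing a sensible pretext task, such as BERT's masked language model or MoCo's instance discrimination. Each of these constructs a "label" from the samples without supervision and uses it to pre-train the model, improving performance on downstream tasks. So for DETR: since its CNN backbone can be pre-trained without supervision, can its transformer be pre-trained without supervision too?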
paper | code
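UP-DETR's answer, per its paper, is a pretext task called random query patch detection: crop random patches from an unlabeled image, and train the detector to find where each patch came from. The sketch below shows only how such pseudo-labels can be generated, assuming images are 2D lists of pixels; the helper name `make_random_query_patches`, the patch-size bounds, and the normalized `(cx, cy, w, h)` box format are illustrative assumptions, not the official implementation.

```python
import random

def make_random_query_patches(image, num_patches=3, min_size=4, max_size=8, seed=None):
    """Crop random patches from an unlabeled image and return (patch, box) pairs.

    image: 2D list of pixel values with equal-length rows.
    box:   normalized (cx, cy, w, h), an assumed DETR-style box format.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    samples = []
    for _ in range(num_patches):
        # Random patch size, capped so the patch always fits inside the image.
        h = rng.randint(min_size, min(max_size, height))
        w = rng.randint(min_size, min(max_size, width))
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        patch = [row[left:left + w] for row in image[top:top + h]]
        # The crop location itself is the training target: no human labels needed.
        box = ((left + w / 2) / width, (top + h / 2) / height, w / width, h / height)
        samples.append((patch, box))
    return samples

image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(12)]
for patch, box in make_random_query_patches(image, seed=0):
    print(len(patch), "x", len(patch[0]), [round(v, 3) for v in box])
```

The detector training step (feeding patch features to the decoder and regressing these boxes) is omitted here.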

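【12】GFLV2: A Genuinely Useful Object Detection Technique, Accuracy Gains at No Cost!
This paper is the first in object detection to use statistics of bounding-box uncertainty to efficiently guide localization quality estimation. It improves one-stage detectors by roughly 1 to 2 AP at essentially no extra cost in either training or testing.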
paper | code
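A rough sketch of the idea: GFLV2 represents each box side as a discrete probability distribution, where a flat distribution signals uncertain localization. Assuming the paper's statistics are the top-k bin probabilities of each side plus their mean, these 4 x (k + 1) features feed a tiny network that predicts a quality score. Here k = 4, 16 bins, and the random placeholder weights are assumptions; in a real detector the weights are learned.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dgqp_features(box_logits, k=4):
    """box_logits: four lists (left, top, right, bottom) of per-bin logits.

    For each side, keep the top-k bin probabilities plus their mean:
    a sharp (confident) distribution has a large top-1 value.
    """
    feats = []
    for side in box_logits:
        top = sorted(softmax(side), reverse=True)[:k]
        feats.extend(top + [sum(top) / k])
    return feats

def quality_score(feats, w1, w2):
    # Tiny two-layer MLP: ReLU hidden layer, sigmoid output in (0, 1).
    hidden = [max(0.0, sum(w * f for w, f in zip(row, feats))) for row in w1]
    z = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
# Placeholder weights (assumption); a detector would learn them with the head.
w1 = [[rng.uniform(-1, 1) for _ in range(20)] for _ in range(8)]
w2 = [rng.uniform(-1, 1) for _ in range(8)]
sharp = [[8.0 if i == 5 else 0.0 for i in range(16)] for _ in range(4)]
flat = [[0.0] * 16 for _ in range(4)]
print(quality_score(dgqp_features(sharp), w1, w2), quality_score(dgqp_features(flat), w1, w2))
```

Per the GFLV2 paper, the predicted quality is combined with the classification score, so boxes with uncertain localization are ranked lower.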

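【11】DCL: A New Method for Rotated Object Detection
Densely Coded Labels (DCL) is an improved version of Circular Smooth Label (CSL). DCL addresses two shortcomings of CSL: its overly heavy prediction layer and its poor handling of square-like objects.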
paper | code
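The "lighter prediction layer" point can be illustrated by encoding the angle class as a short dense binary code instead of a long one-hot-style vector. The sketch below assumes a 180° angle range split into 256 bins (8 bits) and uses Gray coding, so neighboring angle bins (including the wrap-around from the last bin to the first) differ by exactly one bit; the bin count and helper names are illustrative assumptions rather than the paper's exact configuration.

```python
ANGLE_RANGE = 180.0  # assumed angle period in degrees
NUM_BITS = 8
NUM_BINS = 2 ** NUM_BITS
BIN_WIDTH = ANGLE_RANGE / NUM_BINS

def angle_to_bin(angle_deg):
    # Map any angle into [0, 180) and quantize it into one of NUM_BINS bins.
    return int((angle_deg % ANGLE_RANGE) // BIN_WIDTH) % NUM_BINS

def gray_encode(n):
    return n ^ (n >> 1)

def gray_decode(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def encode_angle(angle_deg):
    """Return NUM_BITS 0/1 targets (most significant bit first)."""
    g = gray_encode(angle_to_bin(angle_deg))
    return [(g >> (NUM_BITS - 1 - i)) & 1 for i in range(NUM_BITS)]

def decode_angle(bits):
    g = 0
    for b in bits:
        g = (g << 1) | b
    # Return the center of the decoded bin, in degrees within [0, 180).
    return (gray_decode(g) + 0.5) * BIN_WIDTH

print(encode_angle(45.0), decode_angle(encode_angle(45.0)))  # → [0, 1, 1, 0, 0, 0, 0, 0] 45.3515625
```

Each box then needs 8 angle outputs rather than one output per angle class.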

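【10】Hierarchical Style Disentanglement: Multi-Attribute Face Manipulation Is Finally Controllable (CVPR2021 Oral)
Since CycleGAN was proposed, the two biggest problems in image translation have been scalability (handling multiple manipulations at once) and diversity (generating varied results). Yet no method has achieved both while also keeping each manipulation as intended. In face attribute editing, for example, adding bangs may also change the hair color or the background, and adding glasses may even change the apparent gender and age. HiSD is designed to solve these problems, and it also supports styles that are either generated from noise or extracted from reference images.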
paper | code

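【9】Transformers Conquer Another Domain! Peking University, Huawei, and Others Propose the Pre-trained Model IPT, Topping Several Low-Level Benchmarks
This work studies low-level computer vision tasks (such as denoising, super-resolution, and deraining) and proposes a new pre-trained model, IPT (image processing transformer). To fully exploit the transformer's capacity, the authors used the well-known ImageNet dataset to build a large number of degraded/clean image pairs and trained IPT on them; IPT has multiple heads and multiple tails to handle different degradation models. The authors also introduce contrastive learning to adapt better to different image processing tasks. After fine-tuning, the pre-trained model can be applied effectively to different tasks. With a single pre-trained model, IPT outperforms state-of-the-art methods on multiple low-level benchmarks.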
paper
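The data-synthesis step can be sketched as follows: starting from clean images, each task gets its own (degraded, clean) pair, and each task would be served by its own head and tail around a shared transformer body. This minimal sketch assumes grayscale images stored as 2D lists; the noise level (sigma = 30), the task names, and the box-filter downsampling (a simple stand-in for the bicubic kernel usually used for super-resolution) are illustrative assumptions, and deraining synthesis is omitted.

```python
import random

def add_gaussian_noise(img, sigma, rng):
    # Denoising input: clean image plus Gaussian noise, clipped to [0, 255].
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]

def box_downsample(img, factor):
    # Super-resolution input: average each factor x factor block.
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)] for r in range(h)]

def make_task_pairs(clean, rng):
    """Map each (hypothetical) task name to a (degraded, clean) training pair."""
    return {
        "denoise_sigma30": (add_gaussian_noise(clean, 30.0, rng), clean),
        "sr_x2": (box_downsample(clean, 2), clean),
        "sr_x4": (box_downsample(clean, 4), clean),
    }

rng = random.Random(0)
clean = [[float((r * 8 + c) % 256) for c in range(8)] for r in range(8)]
pairs = make_task_pairs(clean, rng)
for task, (degraded, target) in pairs.items():
    print(task, len(degraded), "x", len(degraded[0]), "->", len(target), "x", len(target[0]))
```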

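【8】Truly Arbitrary-Scale Upsampling! Stunning 30x Interpolation: NVIDIA and Others Open-Source LIIF, a Clever New Approach to Image Super-Resolution
A novel continuous image representation that builds a clever bridge between discrete 2D images and continuous 2D images. Thanks to this "continuous representation", the image can be resampled to any resolution, enabling truly arbitrary-scale ("stepless") upsampling, even at 30x.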
paper | code | video | project
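The core of a local implicit image function can be sketched in a few lines: the image is stored as a grid of latent codes, and the value at any continuous coordinate is predicted by an MLP from the nearest latent code plus the query's offset from that code's center. Because queries are continuous, the output can be sampled at any scale, such as 30x. This is a toy sketch: the random latent grid, the tiny MLP with random weights, and the helper names are assumptions. LIIF learns the encoder and the MLP, and adds refinements (such as blending neighboring codes) that are omitted here.

```python
import random

def make_mlp(in_dim, hidden, rng):
    # Placeholder random weights (assumption); LIIF trains this decoder.
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    def mlp(x):
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
        return sum(w * v for w, v in zip(w2, h))
    return mlp

def query(latents, mlp, x, y):
    """Predict a value at continuous coordinates (x, y) in [0, 1)."""
    rows, cols = len(latents), len(latents[0])
    # Nearest latent cell and its center in continuous coordinates.
    r = min(int(y * rows), rows - 1)
    c = min(int(x * cols), cols - 1)
    cy, cx = (r + 0.5) / rows, (c + 0.5) / cols
    # Offset from the cell center, measured in cell units.
    rel = [(x - cx) * cols, (y - cy) * rows]
    return mlp(latents[r][c] + rel)

def render(latents, mlp, out_h, out_w):
    # Sample pixel centers of an output grid of arbitrary resolution.
    return [[query(latents, mlp, (j + 0.5) / out_w, (i + 0.5) / out_h)
             for j in range(out_w)] for i in range(out_h)]

rng = random.Random(0)
latents = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)] for _ in range(4)]
mlp = make_mlp(in_dim=5, hidden=8, rng=rng)
big = render(latents, mlp, 4 * 30, 4 * 30)  # 30x upsampling of a 4 x 4 latent grid
print(len(big), len(big[0]))  # → 120 120
```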

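【7】AdCo: Adversarial Contrastive Learning
In self-supervised learning, contrastive learning approaches have shown clear advantages on downstream classification and detection tasks. How to make full use of negative samples to improve learning efficiency and effectiveness has long been an open direction. This paper is the first to propose learning the negative samples directly, end to end, with an adversarial approach, achieving SOTA on ImageNet and on downstream tasks. AdCo matches MoCo v2's accuracy with only 8192 negative samples (one eighth of MoCo v2's negative count). Moreover, with the same number of parameters as BYOL's prediction MLP, these directly trainable negatives still achieve similar results. This shows that in the self-supervised era, by making negatives learnable, contrastive learning retains a series of advantages: high learning efficiency, stable training, and high accuracy.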
paper | code
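The adversarial idea can be shown on a single query: the encoder is trained to minimize the InfoNCE contrastive loss, while the negatives themselves are free parameters updated by gradient ascent on the same loss, which pulls them toward the query and makes them harder. Below is a minimal sketch with an analytic gradient, assuming unit-normalized vectors, a temperature of 0.2, and a plain gradient-ascent step followed by renormalization; the encoder update is omitted and all names are illustrative.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(q, k, negatives, tau=0.2):
    """Return (loss, logits, log_partition) with the positive at index 0."""
    logits = [dot(q, k) / tau] + [dot(q, n) / tau for n in negatives]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0], logits, log_z

def ascend_negatives(q, k, negatives, lr=0.5, tau=0.2):
    """One gradient-ascent step on the loss with respect to the negatives."""
    _, logits, log_z = info_nce(q, k, negatives, tau)
    updated = []
    for n, logit in zip(negatives, logits[1:]):
        p = math.exp(logit - log_z)  # softmax weight of this negative
        # dLoss/dn = p * q / tau: stepping along it makes n more similar to q.
        updated.append(normalize([x + lr * p * qi / tau for x, qi in zip(n, q)]))
    return updated

rng = random.Random(0)
dim = 8
q = normalize([rng.gauss(0, 1) for _ in range(dim)])
k = normalize([qi + 0.1 * rng.gauss(0, 1) for qi in q])  # positive: a perturbed view
negatives = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(16)]
before = info_nce(q, k, negatives)[0]
after = info_nce(q, k, ascend_negatives(q, k, negatives))[0]
print(before < after)  # → True
```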

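【6】ClassSR: Accelerating Image Super-Resolution, Cutting Computation by 50% Without Losing Performance
This paper explores accelerating super-resolution (SR) networks for low-level vision. It combines classification with super-resolution: each sub-image is routed to an SR branch chosen adaptively by how hard it is to restore, which lowers overall computational complexity. Flat regions that are easy to restore go to a low-complexity SR branch, while textured regions that are hard to restore go to a high-complexity branch. Without degrading SR performance, the method saves up to 50% of the computation.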
paper
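The routing logic can be sketched as: split the input into sub-images, score each one's restoration difficulty, and send it to a cheap or expensive branch. In ClassSR the difficulty comes from a learned classification module; the sketch below substitutes a simple pixel-variance heuristic with hypothetical thresholds so it runs on its own, and the branch names and relative costs are placeholders.

```python
def split_patches(img, size):
    """Yield (top, left, patch) for non-overlapping size x size patches."""
    for top in range(0, len(img) - size + 1, size):
        for left in range(0, len(img[0]) - size + 1, size):
            yield top, left, [row[left:left + size] for row in img[top:top + size]]

def variance(patch):
    vals = [v for row in patch for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

# Hypothetical branches: (name, relative cost). Cheaper branches for easier patches.
BRANCHES = [("simple", 1.0), ("medium", 2.0), ("hard", 4.0)]
THRESHOLDS = [10.0, 400.0]  # assumed variance cut-offs (stand-in for a learned classifier)

def route(patch):
    score = variance(patch)
    idx = sum(score > t for t in THRESHOLDS)  # 0, 1 or 2
    return BRANCHES[idx]

def routed_cost(img, size=4):
    return sum(route(patch)[1] for _, _, patch in split_patches(img, size))

flat = [[128] * 8 for _ in range(8)]
textured = [[(255 if (r + c) % 2 else 0) for c in range(8)] for r in range(8)]
img = [f + t for f, t in zip(flat, textured)]  # 8 x 16: left half flat, right half checkerboard
print(routed_cost(img), 8 * BRANCHES[-1][1])  # → 20.0 32.0
```

Here the routed cost is 20 units versus 32 if every patch used the heaviest branch, which is the kind of saving the paragraph above describes.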

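【5】MotionRNN: A General Video Prediction Model for Complex Spatiotemporal Motion
Video prediction methods are widely used in tasks such as precipitation nowcasting, traffic flow prediction, and robot visual planning. However, real-world motion is extremely complex and constantly changing: human motion involves changes of direction and speed and limb movements, and radar echoes show clouds forming, dissipating, moving, and deforming. Such complex spatiotemporal variation makes accurately predicting future motion very challenging.
To handle complex spatiotemporal motion, we observe that real-world motion can be decomposed in space and time into an overall motion trend and transient variation. On this basis we propose MotionRNN, which models the motion trend and the transient variation in a unified way. As a general video prediction model, MotionRNN is also highly flexible: it can be combined with many RNN-based spatiotemporal prediction models and consistently improves their ability to handle complex spatiotemporal motion.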
paper
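The trend/transient decomposition can be illustrated on a 1D motion signal: keep a running momentum estimate of past motion as the trend, and treat the remainder at each step as the transient variation. This is only a conceptual sketch; the exponential-moving-average update and alpha = 0.5 are assumptions standing in for MotionRNN's learned recurrent units, which operate on feature maps rather than scalars.

```python
def decompose_motion(displacements, alpha=0.5):
    """Split per-step displacements into (trend, transient) pairs.

    trend_t is a momentum (exponential moving average) estimate built from
    past steps; transient_t = displacement_t - trend_t, so the two parts
    always add back up to the observed displacement.
    """
    trend = 0.0
    out = []
    for d in displacements:
        transient = d - trend
        out.append((trend, transient))
        # Momentum update: move the trend part of the way toward the new motion.
        trend = trend + alpha * (d - trend)
    return out

# Steady motion followed by a sudden reversal.
print([round(t, 4) for _, t in decompose_motion([1, 1, 1, 1, -1])])  # → [1.0, 0.5, 0.25, 0.125, -1.9375]
```

Under steady motion the transient shrinks as the trend catches up, while the sudden reversal shows up as a large transient.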

【4】Statistical Texture Learning
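This work improves visual learning by analyzing and enhancing low-level texture details, and validates the approach on segmentation, where it gives intuitive, well-motivated, and effective accuracy gains. Drawing inspiration from traditional image analysis, we built a Statistical Texture Learning framework that learns low-level texture (analysis plus enhancement) within a CNN architecture, yielding substantial performance gains.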
paper
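One simple way to describe texture statistically is to quantize feature intensities into a few levels and count how often each level occurs. The sketch below shows only that quantize-and-count step on a plain 2D map, assuming equal-width levels between the map's minimum and maximum and hard assignment; the paper's module lives inside the CNN and is trained end to end, which this standalone toy version is not.

```python
def quantize_and_count(feature_map, levels=4):
    """Return (level_centers, normalized_counts) for a 2D feature map."""
    vals = [v for row in feature_map for v in row]
    lo, hi = min(vals), max(vals)
    width = (hi - lo) / levels or 1.0  # guard against a constant map
    counts = [0] * levels
    for v in vals:
        # Hard-assign each value to an equal-width level; clamp the maximum.
        idx = min(int((v - lo) / width), levels - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(levels)]
    return centers, [c / len(vals) for c in counts]

centers, hist = quantize_and_count([[0, 0, 1, 1], [2, 2, 3, 3]])
print(centers, hist)  # → [0.375, 1.125, 1.875, 2.625] [0.25, 0.25, 0.25, 0.25]
```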

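【3】Editing Anime Girls' Facial Features and Art Style: Bolei Zhou's Team Controls GANs Without Supervision (CVPR2021 Oral)
GANs can now not only draw anime girls but also precisely adjust facial features, expression, pose, and art style, while keeping other factors as unchanged as possible when one factor is adjusted. SeFa works with common GAN models such as PGGAN, StyleGAN, BigGAN, and StyleGAN2. It works not only on anime characters but can even control which way a cat faces (up, down, left, or right).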
paper | code | Colab
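SeFa's closed-form factorization is short enough to sketch. Given the weight matrix A of the generator's first layer that projects the latent code, moving the latent by a unit direction n changes that layer's output by A n, so the directions that change it most are the top eigenvectors of AᵀA. The sketch below finds the top direction by power iteration on a small hand-made A; in practice A would be read from a pre-trained generator and a library eigensolver would be used.

```python
import math
import random

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def top_direction(A, iters=200, seed=0):
    """Power iteration on A^T A: returns the unit n maximizing ||A n||^2."""
    rng = random.Random(seed)
    At = transpose(A)
    n = [rng.gauss(0, 1) for _ in range(len(A[0]))]
    for _ in range(iters):
        n = matvec(At, matvec(A, n))
        norm = math.sqrt(sum(x * x for x in n))
        n = [x / norm for x in n]
    return n

def edit_latent(z, direction, strength):
    # Move the latent code along the discovered semantic direction.
    return [zi + strength * di for zi, di in zip(z, direction)]

# Toy projection weights (assumption): latent dim 3, output dim 4.
A = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
n = top_direction(A)
print([round(x, 6) for x in n])
print(edit_latent([0.2, -0.1, 0.4], n, strength=2.0))
```

Varying `strength` then moves one semantic attribute while the rest of the latent code stays fixed.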

【2】Inception convolution
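Our recent work, accepted to CVPR2021, mainly uses optimization techniques to find new convolution patterns. The goal is a simple, deployment-friendly convolution that helps raise baselines across downstream tasks.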
paper | code
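A key ingredient, as described in the Inception convolution paper, is letting each channel use its own dilation along each spatial axis instead of one dilation for the whole layer. The sketch below applies a 3x3 kernel per channel with its own (dilation_h, dilation_w) pair and zero padding that keeps the spatial size; the dilation values are placeholders for what the paper's search would choose, and the function names are hypothetical.

```python
def dilated_conv3x3(channel, kernel, dh, dw):
    """'Same'-size 3x3 cross-correlation with dilations (dh, dw) and zero padding."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + (i - 1) * dh, c + (j - 1) * dw
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding outside the map
                        acc += kernel[i][j] * channel[rr][cc]
            out[r][c] = acc
    return out

def inception_style_conv(channels, kernels, dilations):
    """Depthwise conv where channel k uses its own (dh, dw) = dilations[k]."""
    return [dilated_conv3x3(ch, k, dh, dw)
            for ch, k, (dh, dw) in zip(channels, kernels, dilations)]

ramp = [[float(c) for c in range(6)] for _ in range(6)]  # value grows along x
diff_x = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]               # horizontal difference
out = inception_style_conv([ramp, ramp], [diff_x, diff_x], dilations=[(1, 1), (3, 2)])
print(out[0][2][2], out[1][2][2])  # → 2.0 4.0
```

The same kernel sees a wider horizontal context in the second channel, which is the per-channel, per-axis freedom the search exploits.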

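【1】RepVGG: Minimalist Architecture, SOTA Performance, Making VGG-Style Models Great Again (CVPR-2021)
Our recent work RepVGG uses structural re-parameterization to build a VGG-style, single-path minimalist architecture made of 3x3 convolutions all the way through. It reaches SOTA speed and accuracy and exceeds 80% top-1 accuracy on ImageNet. It has been accepted to CVPR-2021. No NAS, no attention, no novel activation functions, not even branches: with only 3x3 convolutions and ReLU, it still achieves SOTA performance.
paper | open-source pre-trained models and code (PyTorch) | MegEngine version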
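Structural re-parameterization relies on linearity: during training a RepVGG block sums a 3x3 conv branch, a 1x1 conv branch, and an identity branch, and at inference the three can be merged into one 3x3 kernel by placing the 1x1 weight at the kernel center and writing the identity as a 3x3 kernel with a single 1 in the center. The single-channel sketch below checks that equivalence numerically; it omits the batch-norm folding and the multi-channel bookkeeping that the real RepVGG also performs, and the helper names are illustrative.

```python
def conv_same(img, kernel):
    """Zero-padded 'same' cross-correlation for an odd-sized square kernel."""
    k = len(kernel)
    p = k // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - p, c + j - p
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * img[rr][cc]
            out[r][c] = acc
    return out

def add_maps(*maps):
    # Element-wise sum of equally sized 2D maps.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]

def fuse_branches(k3, k1):
    """Merge 3x3 + 1x1 + identity branches into a single 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1[0][0]  # 1x1 kernel padded to the 3x3 center
    fused[1][1] += 1.0       # identity == 3x3 kernel with a centered 1
    return fused

img = [[float((3 * r + 5 * c) % 7) for c in range(5)] for r in range(4)]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = [[0.7]]
multi_branch = add_maps(conv_same(img, k3), conv_same(img, k1), img)  # training-time block
single_branch = conv_same(img, fuse_branches(k3, k1))                 # inference-time block
max_err = max(abs(a - b) for ra, rb in zip(multi_branch, single_branch) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # → True
```

Since ReLU is applied after the sum in both forms, the fused single-path network produces the same outputs as the multi-branch one.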

4. To do list

CVPR2021 paper explainers
CVPR2021 paper sharing
