Retina Net

The document discusses using BiT models as backbones for the RetinaNet object detector and evaluating their performance on the COCO dataset. BiT models pretrained on larger datasets like ImageNet-21k and JFT-300M improve the Average Precision of RetinaNet by up to 1.5 and 2.1 points respectively compared to models pretrained on ImageNet only. RetinaNet is a one-stage object detector that uses Focal Loss to address class imbalance during training, allowing it to match the speed of previous one-stage detectors while surpassing the accuracy of two-stage detectors.

Uploaded by

pra3


Big Transfer

For object detection, the BiT authors use the COCO-2017 dataset and train a top-performing object
detector, RetinaNet (a simple one-stage object detector), using pre-trained BiT models (BiT-S, BiT-M,
BiT-L) as backbones (the shared feature-extraction layers). Due to memory constraints, they use the
ResNet-101x3 architecture for all their BiT models. They fine-tune the detection models on the
COCO-2017 train split and do not use BiT-HyperRule, sticking instead to the standard RetinaNet training
protocol. The results demonstrate that BiT models outperform standard ImageNet pre-trained models,
with clear benefits of pre-training on large data beyond ILSVRC-2012: pre-training on ImageNet-21k
results in a 1.5 point improvement in Average Precision (AP), while pre-training on JFT-300M further
improves performance by 0.6 points. (Refer to 201912.11370.pdf)

RetinaNet is named for its dense sampling of object locations in an input image. Its design features an
efficient in-network feature pyramid and the use of anchor boxes. RetinaNet is a single, unified network
composed of a backbone network (in this case the BiT models: BiT-S, BiT-M, BiT-L) and two task-specific
subnetworks. The backbone is an off-the-shelf convolutional network responsible for computing a
convolutional feature map over an entire input image. The first subnet performs convolutional object
classification on the backbone's output; the second subnet performs convolutional bounding box
regression. While there are many possible choices for the details of these components, the experiments
show that most design parameters are not particularly sensitive to exact values. RetinaNet can be
trained with stochastic gradient descent (SGD). (Refer to 1708.02002.pdf)
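
To give a feel for what "dense sampling" means, the sketch below counts the anchor boxes that the two
subnets would score for one image. The pyramid strides (8 to 128), the 9 anchors per location, and the
80 COCO classes are assumptions taken from the default configuration in the RetinaNet paper, not from
these notes, and the BiT experiments may have used different settings. The function names are
hypothetical.

```python
import math


def count_anchors(image_size=1024, strides=(8, 16, 32, 64, 128), anchors_per_location=9):
    """Count anchors over all pyramid levels for a square input image.

    Assumed defaults: five pyramid levels with strides 8..128 and 9 anchors per
    spatial location, as in the RetinaNet paper's default setup.
    """
    total = 0
    for stride in strides:
        side = math.ceil(image_size / stride)  # feature-map side length at this level
        total += side * side * anchors_per_location
    return total


def head_output_channels(num_classes=80, anchors_per_location=9):
    """Channels predicted per spatial location by the two subnets.

    Classification subnet: one score per (anchor, class) pair.
    Box regression subnet: four offsets per anchor.
    """
    return num_classes * anchors_per_location, 4 * anchors_per_location


if __name__ == "__main__":
    print(count_anchors())           # 196416 anchors for a 1024 x 1024 input
    print(head_output_channels())    # (720, 36)
```

Under these assumptions a single 1024 x 1024 image yields close to two hundred thousand candidate
boxes, almost all of which are background.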

Upstream pre-training (BiT models as backbone): All of their BiT models use a vanilla ResNet-v2
architecture, except that they replace all Batch Normalization layers with Group Normalization and use
Weight Standardization in all convolutional layers.

The first component is scale. It is well-known in deep learning that larger networks perform better on
their respective tasks. Further, it is recognized that larger datasets require larger architectures to
realize benefits, and vice versa. They study the effectiveness of scale (during pre-training) in the
context of transfer learning, including transfer to tasks with very few datapoints. They investigate the
interplay between computational budget (training time), architecture size, and dataset size. For this,
they train three BiT models on three large datasets: ILSVRC-2012, which contains 1.3M images (BiT-S),
ImageNet-21k, which contains 14M images (BiT-M), and JFT, which contains 300M images (BiT-L).

The second component is Group Normalization (GN) and Weight Standardization (WS). Batch Normalization
(BN) is used in most state-of-the-art vision models to stabilize training. However, they find that BN is
detrimental to Big Transfer for two reasons. First, when training large models with small per-device
batches, BN performs poorly or incurs inter-device synchronization cost. Second, due to the requirement
to update running statistics, BN is detrimental for transfer. GN, when combined with WS, has been shown
to improve performance on small-batch training for ImageNet and COCO. Here, they show that the
combination of GN and WS is also useful for training with large batch sizes and has a significant impact
on transfer learning.
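
As a rough illustration of why GN and WS avoid the batch-size problems of BN, here is a minimal
pure-Python sketch of both operations. It is not the BiT implementation: the function names, the flat
list layout, and the placement of `eps` inside the square root are assumptions made for readability,
and the learnable scale and shift that normally follow GN are omitted.

```python
import math
from statistics import fmean


def weight_standardize(filters, eps=1e-5):
    """Standardize each output filter to zero mean and unit variance.

    `filters` is a list of filters; each filter is a flat list holding all of
    its weights (input channels x kernel height x kernel width).
    """
    out = []
    for w in filters:
        mu = fmean(w)
        var = fmean([(x - mu) ** 2 for x in w])
        out.append([(x - mu) / math.sqrt(var + eps) for x in w])
    return out


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize the activations of ONE sample within groups of channels.

    `channels` is a list of per-channel flat activation lists. No batch
    statistics are used, so the result does not depend on the batch size.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [x for ch in group for x in ch]
        mu = fmean(values)
        var = fmean([(x - mu) ** 2 for x in values])
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(x - mu) * scale for x in ch] for ch in group])
    return out
```

Both functions compute their statistics from a single filter or a single sample, so there are no
running averages to update and nothing to synchronize across devices.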

Transfer to downstream fine-tuning tasks (RetinaNet protocol): They train all of their models for 30
epochs using a batch size of 256 with stochastic gradient descent, a 0.08 initial learning rate, 0.9
momentum and 10^-4 weight decay. They decrease the learning rate by a factor of 10 at epochs 16 and 22.
They also tried training for longer (60 epochs) and did not observe performance improvements. The input
image resolution is 1024 × 1024. During training they use a data augmentation scheme as in
[34, refer to 1405.0312.pdf]: random horizontal image flips and scale jittering. They set the
classification loss parameters α to 0.25 and γ to 2.0; see [33, refer to 1708.02002.pdf] for the
explanation of these parameters.
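
The protocol above can be written down as a small configuration plus a step schedule for the learning
rate. This is only a sketch: the dictionary keys and the function name are hypothetical, and it assumes
0-indexed epochs with each drop taking effect at the start of the milestone epoch, which the source
does not specify.

```python
# Hyperparameters as stated in the text; the key names are illustrative only.
RETINANET_FINETUNE = {
    "epochs": 30,
    "batch_size": 256,
    "base_lr": 0.08,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "lr_milestones": (16, 22),
    "lr_decay_factor": 0.1,
    "image_size": (1024, 1024),
    "focal_alpha": 0.25,
    "focal_gamma": 2.0,
}


def learning_rate(epoch, base_lr=0.08, milestones=(16, 22), factor=0.1):
    """Step schedule: multiply the rate by `factor` at each milestone reached.

    Assumption: `epoch` is 0-indexed and a drop applies from the milestone
    epoch onwards.
    """
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= factor
    return lr


if __name__ == "__main__":
    for epoch in (0, 15, 16, 21, 22, 29):
        print(epoch, learning_rate(epoch))
```

With these values the rate stays at 0.08 for epochs 0 to 15, drops to about 0.008 for epochs 16 to 21,
and to about 0.0008 for the remaining epochs.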

RetinaNet (one-stage detector) was designed to be more efficient than two-stage detectors. One-stage
detectors that are applied over a regular, dense sampling of possible object locations have the potential
to be faster and simpler, but previously have trailed the accuracy of two-stage detectors. The central
cause was found to be the extreme foreground-background class imbalance encountered during training of
dense detectors. Focal Loss addresses this imbalance by reshaping the standard cross entropy loss so
that it down-weights the loss assigned to well-classified examples. This focuses training on a sparse
set of hard examples and prevents the vast number of easy negatives from overwhelming the detector
during training. RetinaNet was used to evaluate the Focal Loss method. Results show that when trained
with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while
surpassing the accuracy of all existing state-of-the-art two-stage detectors.
(Refer to 1708.02002.pdf)
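
The down-weighting effect is easiest to see numerically. The sketch below implements the
alpha-balanced focal loss for a single binary prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma *
log(p_t), with the alpha = 0.25 and gamma = 2.0 values quoted in these notes. The function names and the
clamping constant are assumptions for illustration; a real detector would apply this to every anchor
and class and then normalize the sum.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy for one prediction p = P(foreground), label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))  # clamp to avoid log(0)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for one prediction.

    p_t is the probability assigned to the true class; the factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


if __name__ == "__main__":
    # Easy negative: background anchor confidently predicted as background.
    print(cross_entropy(0.01, 0), focal_loss(0.01, 0))
    # Hard positive: object anchor predicted with low foreground probability.
    print(cross_entropy(0.1, 1), focal_loss(0.1, 1))
```

For the easy negative the focal loss is several orders of magnitude smaller than the cross entropy,
while for the hard positive it is reduced only by a small constant factor, so hard examples dominate
the total loss.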

Applying BiT models as backbones for RetinaNet showed both the benefits of one-stage detectors and the
gains that BiT pre-training can bring to object detectors under standard training protocols.
