Benchmarking deep learning workloads with tensorflow on the NVIDIA GeForce RTX 3090

SECTIONS

NVIDIA recently released the much-anticipated GeForce RTX 30 Series of Graphics cards, with the largest and most powerful, the RTX 3090, boasting 24GB of memory and 10,500 CUDA cores. This is the natural upgrade to 2018’s 24GB RTX Titan and we were eager to benchmark the training performance performance of the latest GPU against the Titan with modern deep learning workloads.

Based on the specs alone, the 3090 RTX offers a great improvement in the number of CUDA cores, which should give us a nice speed up on FP32 tasks. However, NVIDIA decided to cut the number of tensor cores in GA102 (compared to GA100 found in A100 cards) which might impact FP16 performance.

‍

	Titan RTX	3090 RTX
Architecture	Turing TU102	Ampere GA102
Cuda cores	4,609	10,496
Tensor cores	576	328
Memory	24GB	24GB
Memory bandwidth	672 GB/sec	936 GB/sec
TDP (watts)	285	350

System:

Ubuntu 18.04.3

Driver Version: 455.23.05

CUDA Version: 11.1

Tensorflow: tf-nightly 2.4.0.dev20200928

It is very important to use the latest version of CUDA (11.1) and latest tensorflow, some features like TensorFloat are not yet available in a stable release at the time of writing.

We use our own fork of the Lambda Tensorflow Benchmark which measures the training performance for several deep learning models trained on ImageNet.

‍

Training performance in images processed per second
	FP16		FP32
	Titan RTX	RTX 3090	Titan RTX	RTX 3090
AlexNet	6634.31	8255.43	4448.46	6493.16
Inception3	656.13	616.25	222.95	337.31
Inception4	298.11	132.73	99.74	143.65
ResNet152	423.92	484.02	134.47	203.58
ResNet150	966.77	1259.95	335.96	525.88
VGG16	339.73	442.49	212.06	325.60

‍

Speedup of RTX 3090 over Titan RTX

‍

We're able to achieve a 1.4-1.6x training speed-up for all the models training with FP32! As expected, the FP16 is not quite as significant, with a 1.0-1.2x speed-up for most models and a drop for Inception.

‍

Please get in touch at hello@evolution.ai with any questions or comments!

FEATURED

Three Underrated Financial Ratios & How to Use AI to Calculate Them Faster

What to do if ChatGPT Experiences an Error Reading Documents

Navigating Intelligent Document Processing (IDP) Tools: A Cheatsheet

Five Tips for Introducing AI-Led Automation into the Workplace

What Is Intelligent Data Extraction?

How to Automate Manual Data Entry with Generative AI

Book a demo

Benchmarking deep learning workloads with tensorflow on the NVIDIA GeForce RTX 3090

SECTIONS

FEATURED