Benchmarking deep learning workloads with TensorFlow on the NVIDIA GeForce RTX 3090

Rafal Kwasny, Daniel Friar, Giuseppe Papallo
September 29, 2020

NVIDIA recently released the much-anticipated GeForce RTX 30 Series of graphics cards, the largest and most powerful of which, the RTX 3090, boasts 24GB of memory and 10,496 CUDA cores. It is the natural upgrade to 2018's 24GB Titan RTX, and we were eager to benchmark the training performance of the latest GPU against the Titan on modern deep learning workloads.


Based on the specs alone, the RTX 3090 offers a big jump in the number of CUDA cores, which should give us a nice speed-up on FP32 tasks. However, NVIDIA cut the number of tensor cores in GA102 (compared to the GA100 found in A100 cards), which might impact FP16 performance.


|                  | Titan RTX    | RTX 3090     |
|------------------|--------------|--------------|
| Architecture     | Turing TU102 | Ampere GA102 |
| CUDA cores       | 4,608        | 10,496       |
| Tensor cores     | 576          | 328          |
| Memory           | 24GB         | 24GB         |
| Memory bandwidth | 672 GB/sec   | 936 GB/sec   |
| TDP (watts)      | 285          | 350          |
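
For context on where the FP16 numbers later in the post come from: mixed-precision training keeps the variables in FP32 but runs the heavy matrix math in FP16 on the tensor cores. A minimal sketch with TensorFlow's mixed-precision API (illustrative only, not the benchmark code; the toy model is an assumption):

```python
import tensorflow as tf

# Compute in float16 on tensor cores; variables stay in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Dense(1000, activation="softmax", dtype="float32"),
])

# Under this policy, compile() wraps the optimizer with dynamic loss scaling.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```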


System:

Ubuntu 18.04.3
Driver version: 455.23.05
CUDA version: 11.1
TensorFlow: tf-nightly 2.4.0.dev20200928
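
A quick sanity check that the stack is picked up correctly might look like this (assuming tf.sysconfig.get_build_info() is available in this nightly):

```python
import tensorflow as tf

print("TensorFlow:", tf.__version__)                                 # 2.4.0-dev20200928
print("CUDA build:", tf.sysconfig.get_build_info()["cuda_version"])  # 11.1
print("GPUs:", tf.config.list_physical_devices("GPU"))
```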


It is important to use the latest version of CUDA (11.1) and the latest TensorFlow nightly: some features, such as TensorFloat-32 (TF32), are not yet available in a stable release at the time of writing.
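
On the nightly build, TF32 can be inspected and toggled via tf.config.experimental; a small sketch, useful for checking whether FP32 results are already benefiting from tensor cores:

```python
import tensorflow as tf

# TF32 runs float32 matmuls/convolutions on tensor cores with a reduced
# (10-bit) mantissa, so "FP32" throughput can partly reflect tensor-core speed.
print(tf.config.experimental.tensor_float_32_execution_enabled())

# Disable it to force full-precision FP32, e.g. to isolate CUDA-core throughput.
tf.config.experimental.enable_tensor_float_32_execution(False)
```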


We use our own fork of the Lambda TensorFlow Benchmark, which measures training performance for several deep learning models trained on ImageNet.
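
The metric throughout is training throughput in images processed per second. As a rough illustration of how such a number is measured (a minimal sketch with synthetic data and a stock ResNet50, not the actual benchmark code; batch size and step count are arbitrary):

```python
import time
import tensorflow as tf

BATCH, STEPS = 64, 100

# Synthetic ImageNet-shaped batch: we only care about timing, not accuracy.
images = tf.random.uniform((BATCH, 224, 224, 3))
labels = tf.random.uniform((BATCH,), maxval=1000, dtype=tf.int32)

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
optimizer = tf.keras.optimizers.SGD(momentum=0.9)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

train_step()  # warm-up: traces the graph and compiles kernels
start = time.time()
for _ in range(STEPS - 1):
    train_step()
train_step().numpy()  # .numpy() blocks until the GPU finishes
print(f"images/sec: {BATCH * STEPS / (time.time() - start):.1f}")
```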

Training performance in images processed per second:

| Model      | FP16, Titan RTX | FP16, RTX 3090 | FP32, Titan RTX | FP32, RTX 3090 |
|------------|-----------------|----------------|-----------------|----------------|
| AlexNet    | 6634.31         | 8255.43        | 4448.46         | 6493.16        |
| Inception3 | 656.13          | 616.25         | 222.95          | 337.31         |
| Inception4 | 298.11          | 132.73         | 99.74           | 143.65         |
| ResNet152  | 423.92          | 484.02         | 134.47          | 203.58         |
| ResNet50   | 966.77          | 1259.95        | 335.96          | 525.88         |
| VGG16      | 339.73          | 442.49         | 212.06          | 325.60         |

Speedup of RTX 3090 over Titan RTX

We're able to achieve a 1.4-1.6x training speed-up for all models trained in FP32! As expected, the FP16 gains are less pronounced: a 1.1-1.3x speed-up for most models, and an outright slowdown for both Inception models.
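
These speed-ups are simply the ratio of the two columns in the table above; for example, for FP32:

```python
fp32 = {  # images/sec: (Titan RTX, RTX 3090), from the table above
    "AlexNet": (4448.46, 6493.16),
    "Inception3": (222.95, 337.31),
    "Inception4": (99.74, 143.65),
    "ResNet152": (134.47, 203.58),
    "ResNet50": (335.96, 525.88),
    "VGG16": (212.06, 325.60),
}
for name, (titan, rtx) in fp32.items():
    print(f"{name}: {rtx / titan:.2f}x")  # 1.44x-1.57x across models
```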

Please get in touch at hello@evolution.ai with any questions or comments!
