GPU Glossary
Technical terminology and definitions for the modern GPU landscape.
C
D
F
FP16 Performance
Throughput of half-precision (16-bit) floating-point operations, usually quoted in TFLOPS. FP16 computes faster and uses half the memory of FP32, at the cost of reduced precision and range.
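The precision loss can be seen without a GPU. The sketch below is illustrative only: it uses Python's standard `struct` module to round a value to IEEE 754 half and single precision, and the helper names `to_fp16` and `to_fp32` are made up for this example.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 0.1
err16 = abs(to_fp16(value) - value)  # rounding error in half precision
err32 = abs(to_fp32(value) - value)  # rounding error in single precision
print(f"FP16 error: {err16:.3e}, FP32 error: {err32:.3e}")
```

FP16 keeps 11 significant bits against 24 for FP32, so the half-precision error is several thousand times larger for the same input.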
FP32 Performance
Throughput of single-precision floating-point operations: how many 32-bit floating-point operations the GPU can perform per second, usually quoted in TFLOPS.
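A theoretical peak figure is commonly estimated as cores times clock speed times operations per cycle. The sketch below assumes each core retires one fused multiply-add (counted as 2 floating-point operations) per cycle. The function name `peak_tflops` and the input numbers are hypothetical and do not describe any specific GPU.

```python
def peak_tflops(shader_cores: int, clock_ghz: float,
                flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per core per cycle by default.
    cores * GHz gives giga-operations per second; divide by 1000 for tera.
    """
    return shader_cores * clock_ghz * flops_per_core_per_cycle / 1000.0

# Hypothetical GPU: 10,000 cores at 1.5 GHz
print(peak_tflops(10_000, 1.5))  # 30.0
```

Real workloads rarely reach this number, since it ignores memory stalls and instruction mix.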
FP8 Precision
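An 8-bit floating-point format used to raise throughput and cut memory use in AI and machine learning tasks, at the cost of precision. Two encodings are common: E4M3 (4 exponent bits, 3 mantissa bits) and E5M2 (5 exponent bits, 2 mantissa bits).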
I
M
Memory Bandwidth
The rate at which data can be transferred between GPU memory and processing cores, measured in GB/s.
Memory Bus Width
The width of the data path between the GPU and its memory: the number of bits that can be transferred in parallel, measured in bits.
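Memory bandwidth and bus width are linked by the per-pin data rate: bandwidth in GB/s is the bus width in bytes multiplied by the effective data rate in Gbit/s per pin. A minimal sketch follows. The function name `memory_bandwidth_gbs` and the example figures are hypothetical.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical memory bandwidth in GB/s.

    bus_width_bits: width of the memory bus in bits
    data_rate_gbps: effective data rate per pin in Gbit/s
    """
    bytes_per_transfer = bus_width_bits / 8  # bits -> bytes
    return bytes_per_transfer * data_rate_gbps

# Hypothetical configuration: 256-bit bus, 16 Gbit/s per pin
print(memory_bandwidth_gbs(256, 16.0))  # 512.0
```

Doubling either the bus width or the data rate doubles the theoretical bandwidth.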
Multi-Instance GPU
An NVIDIA technology that partitions a single GPU into multiple isolated instances, each with its own compute and memory resources, so that different workloads can run side by side.
N
R
S
T
TDP
Thermal Design Power - The amount of heat a GPU's cooling system is designed to dissipate under sustained workloads, measured in watts.
Tensor Cores
Specialized processing units optimized for matrix multiplication operations used in AI and deep learning.
TF32 Precision
TensorFloat-32, an NVIDIA Tensor Core format designed to accelerate AI workloads. It keeps the 8-bit exponent of FP32 (same numeric range) and the 10-bit mantissa of FP16 (similar precision).
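TF32 stores a sign bit, an 8-bit exponent, and a 10-bit mantissa, so its effect can be approximated on a CPU by clearing the low 13 mantissa bits of an FP32 value. The sketch below is an approximation under that assumption: it truncates rather than rounds, real hardware may round differently, and the helper name `to_tf32` is made up for illustration.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 by truncating an FP32 value to a 10-bit mantissa.

    Assumption: simple truncation (round toward zero), not hardware rounding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(1.0 + 2**-10))  # representable: smallest TF32 step above 1.0
print(to_tf32(1.0 + 2**-11))  # below the step size: collapses to 1.0
print(to_tf32(1e30))          # large values survive; FP16 would overflow
```

The last line shows the trade-off: the range matches FP32, while the step size near 1.0 matches FP16.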
Transformer Engine
A feature of recent NVIDIA GPUs designed to accelerate Transformer model training and inference, in part by mixing FP8 and 16-bit precision.
V
Virtual GPU
A technology that allows a single physical GPU to be shared among multiple virtual machines, improving resource utilization.
VRAM
Video RAM - High-speed memory dedicated to storing textures, frame buffers, and computational data for the GPU.
Vulkan API
A cross-platform, low-overhead graphics and compute API, maintained by the Khronos Group, that gives applications explicit control over modern GPUs.