Tensor Cores

Architecture

DEFINITION

Specialized processing units optimized for matrix multiplication operations used in AI and deep learning.

OVERVIEW

Tensor Cores are specialized hardware units designed specifically for accelerating matrix multiplication and convolution operations, which are fundamental to deep learning and AI workloads.

TECHNICAL DETAILS

Introduced with NVIDIA's Volta architecture, Tensor Cores perform mixed-precision matrix multiply-accumulate operations (D = A × B + C) significantly faster than general-purpose CUDA cores. Each first-generation Tensor Core executes a 4×4×4 matrix multiply-accumulate per clock cycle, equivalent to 64 FP16 fused multiply-add operations, with accumulation typically carried out in FP32. Later generations add support for additional precision formats, including BF16, TF32, FP8, and INT8, enabling different speed-accuracy tradeoffs.
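The operation above can be sketched numerically. This is a pure-Python illustration, not real device code: the helper names `to_fp16` and `mma_4x4` are hypothetical, FP16 rounding is emulated with the standard library's half-precision `struct` format, and ordinary Python floats stand in for the higher-precision (FP32) accumulator.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def mma_4x4(A, B, C):
    """Sketch of one Tensor Core operation: D = A x B + C on 4x4 tiles.

    Inputs A and B are rounded to FP16 before multiplying; the running
    sum is kept at higher precision, mimicking FP32 accumulation.
    """
    D = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = C[i][j]  # higher-precision accumulator
            for k in range(4):
                acc += to_fp16(A[i][k]) * to_fp16(B[k][j])
            D[i][j] = acc
    return D

# Example: multiplying by the identity shows the FP16 rounding of inputs.
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
Z4 = [[0.0] * 4 for _ in range(4)]
B = [[0.1] * 4 for _ in range(4)]
D = mma_4x4(I4, B, Z4)  # D[i][j] is 0.1 rounded to FP16, close to but not exactly 0.1
```

The key tradeoff is visible here: FP16 inputs halve memory traffic and enable the fast datapath, while the wider accumulator limits rounding-error growth across the inner sum.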

COMMON USE CASES

  • Training large language models (LLMs)
  • Deep learning inference at scale
  • Computer vision and image recognition
  • Natural language processing
  • Recommendation systems