NVIDIA Tesla P100 Accelerator For PCI Express Based Platforms Announced - Comes in 16 GB and 12 GB HBM2 Variants, 250W TDP

NVIDIA has just announced that they will be launching a PCI Express based version of their Tesla P100 GPU accelerator, which is designed for hyperscale computing. The Tesla P100, which utilizes the GP100 GPU, was initially announced back at GTC 2016 as NVIDIA's first graphics card to utilize the HBM2 standard and the NVLINK interconnect. Today, NVIDIA is introducing two new products to their Tesla P100 family.

The NVIDIA Tesla P100 is the most advanced hyperscale graphics accelerator built to date.

NVIDIA Tesla P100 To Be Available in PCI-Express Form Factor - 12 GB and 16 GB HBM2 Variants Announced

Based on the GP100 GPU, the Tesla P100 is NVIDIA's most advanced and most powerful GPU ever designed for HPC and datacenter platforms. These GPUs are designed to supercharge HPC applications by more than 30X compared to current generation solutions. The new PCI-Express solutions are aimed at the datacenter and HPC market to make them compatible with current GPU-accelerated servers, as the previous Tesla P100 used a mezzanine connector which required the use of new servers. Both cards are optimized to power the most computationally intensive AI and HPC data center applications.

The NVIDIA Tesla P100 GPU is now available in PCI-Express form factor with multiple TFLOPs of double precision.

NVIDIA Tesla P100 (GP100 GPU) Benchmarks

"Accelerated computing is the simply path forrard to go on upward with researchers' clamorous demand for HPC and AI supercomputing," said Ian Buck, vice president of accelerated calculating at NVIDIA. "Deploying CPU-only systems to run across this demand would crave large numbers of commodity compute nodes, leading to substantially increased costs without proportional operation gains. Dramatically scaling performance with fewer, more powerful Tesla P100-powered nodes puts more dollars into computing instead of vast infrastructure overhead." via NVIDIA

NVIDIA Tesla P100 Specifications in Detail - PCI-Express and NVLINK Variants in Comparison

NVIDIA's Tesla P100 is the fastest supercomputing chip in the world. It is based on an entirely new, fifth generation CUDA architecture codenamed Pascal. The GP100 GPU, which utilizes the Pascal architecture, is at the heart of the Tesla P100 accelerator. NVIDIA has spent the last several years on the development of the new GPU and it will finally be shipping to supercomputers in June 2016.

The Tesla P100 comes with beefy specs. Starting off, we have a 16nm Pascal chip that measures in at 610mm², features 15.3 billion transistors and comes with 3584 CUDA cores. The full Pascal GP100 chip features up to 3840 CUDA cores. NVIDIA has redesigned their SM (Streaming Multiprocessor) units and rearranged them to support 64 CUDA cores per SM block. The Tesla P100 has 56 of these blocks enabled while the full GP100 has 60 blocks in total. The chip also comes with a dedicated set of FP64 CUDA cores: there are 32 FP64 cores per block, and the whole GPU has 1792 dedicated FP64 cores.
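
Those figures are internally consistent; a quick bit of arithmetic, using only the per-SM counts quoted above, reproduces them:

```latex
\begin{align*}
\text{FP32 cores (Tesla P100)} &= 56~\text{SMs} \times 64~\text{cores/SM} = 3584\\
\text{FP32 cores (full GP100)} &= 60~\text{SMs} \times 64~\text{cores/SM} = 3840\\
\text{FP64 cores (Tesla P100)} &= 56~\text{SMs} \times 32~\text{cores/SM} = 1792
\end{align*}
```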

The 16nm FinFET process allows maximum throughput in both performance and clock rate. In the case of the Tesla P100 solution that has been optimized for NVLINK-capable servers, we are looking at 5.3 TFLOPs of double precision, 10.6 TFLOPs of single precision and 21.2 TFLOPs of half precision compute performance. The NVLINK variant comes with 16 GB of HBM2 VRAM that delivers up to 720 GB/s of bandwidth, while the NVLINK interconnect adds up to 160 GB/s of bidirectional bandwidth on top of the 32 GB/s from the PCI-Express interconnect.
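
The rated TFLOPs follow from the usual peak-throughput formula, where each fused multiply-add counts as two floating point operations; plugging in the 1480 MHz boost clock listed for the SXM2 card in the comparison table below reproduces NVIDIA's figures:

```latex
\begin{align*}
\text{peak FLOPs} &= \text{cores} \times \text{clock} \times 2 \quad \text{(one FMA = 2 ops)}\\
\text{FP64:}\quad 1792 \times 1.48~\text{GHz} \times 2 &\approx 5.3~\text{TFLOPs}\\
\text{FP32:}\quad 3584 \times 1.48~\text{GHz} \times 2 &\approx 10.6~\text{TFLOPs}\\
\text{FP16:}\quad 2 \times 10.6~\text{TFLOPs} &\approx 21.2~\text{TFLOPs} \quad \text{(two FP16 ops per FP32 core)}
\end{align*}
```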

The PCI-Express optimized variants are tuned for a 250W TDP, so we are looking at slightly lower clock speeds than the NVLINK optimized variant. Both cards deliver 4.7 TFLOPs double, 9.3 TFLOPs single and 18.7 TFLOPs half precision compute performance. The 16 GB variant comes with the full bandwidth of 720 GB/s, while the 12 GB HBM2 variant comes with 540 GB/s of bandwidth since it carries three of the four HBM2 stacks. The cards will use the PCI-Express interconnect (32 GB/s) for simultaneous connection between multiple GPUs.
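
Running the same peak-throughput formula backwards shows roughly what the 250W limit costs in clock speed, and how the 12 GB card's bandwidth follows from its stack count:

```latex
\begin{align*}
\text{implied PCIe boost clock} &= \frac{4.7~\text{TFLOPs}}{1792 \times 2} \approx 1.31~\text{GHz} \quad \text{(vs. 1.48 GHz on NVLINK)}\\
\text{12 GB variant bandwidth} &= 720~\text{GB/s} \times \tfrac{3}{4} = 540~\text{GB/s}
\end{align*}
```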

The Tesla P100 has three variants: two PCI-Express optimized and a single NVLINK optimized.

"Tesla P100 accelerators deliver new levels of functioning and efficiency to accost some of the virtually of import computational challenges of our time," said Thomas Schulthess, professor of computational physics at ETH Zurich and director of the Swiss National Supercomputing Middle. "The upgrade of four,500 GPU-accelerated nodes on Piz Daint to Tesla P100 GPUs will more than double the system's performance, enabling researchers to attain breakthroughs in a range of fields, including cosmology, materials science, seismology and climatology." via NVIDIA

NVIDIA Tesla Accelerator Specifications (Tesla K40 through Tesla V100S):

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 12nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² | 815 mm² |
| SMs | 15 | 24 | 56 | 56 | 80 | 80 | 80 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 40 |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Full) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 320 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
| TDP | 235W | 250W | 250W | 300W | 250W | 300W | 250W |

NVIDIA Tesla P100 PCI-Express Features:

  • Unmatched application performance for mixed-HPC workloads -- Delivering 4.7 teraflops and 9.3 teraflops of double-precision and single-precision peak performance, respectively, a single Pascal-based Tesla P100 node provides the equivalent performance of more than 32 commodity CPU-only servers.
  • CoWoS with HBM2 for unprecedented efficiency -- The Tesla P100 unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design -- chip on wafer on substrate (CoWoS) with HBM2 -- provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the NVIDIA Maxwell™ architecture.
  • Page Migration Engine for simplified parallel programming -- Frees developers to focus on tuning for higher performance and less on managing data movement, and allows applications to scale beyond the GPU's physical memory size with support for virtual memory paging. Unified memory technology dramatically improves productivity by enabling developers to see a single memory space for the entire node (a minimal CUDA sketch follows this list).
  • Unmatched application support -- With 410 GPU-accelerated applications, including nine of the top ten HPC applications, the Tesla platform is the world's leading HPC computing platform.
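
As a rough illustration of what the Page Migration Engine buys developers, here is a minimal CUDA unified-memory sketch; the kernel and the allocation size are invented for illustration, while cudaMallocManaged is the standard CUDA runtime call. On Pascal, pages migrate between host and device on demand, which is what lets a single allocation grow beyond the GPU's physical memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scale a vector in place.
__global__ void scale(float *x, size_t n, float a) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    // One allocation visible to both CPU and GPU. On Pascal, the Page
    // Migration Engine faults pages across on demand, so n could even
    // exceed the card's physical HBM2 (memory oversubscription).
    size_t n = (size_t)1 << 28;          // ~1 GiB of floats, illustrative size
    float *x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;      // first touched on the CPU

    int threads = 256;
    size_t blocks = (n + threads - 1) / threads;
    scale<<<(unsigned)blocks, threads>>>(x, n, 2.0f); // pages migrate to the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %.1f\n", x[0]);       // touching it again migrates pages back
    cudaFree(x);
    return 0;
}
```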

Tesla P100 for PCIe Specifications:

  • 4.7 teraflops double-precision performance, 9.3 teraflops single-precision performance and 18.7 teraflops half-precision performance with NVIDIA GPU BOOST™ technology
  • Support for PCIe Gen 3 interconnect (32GB/sec bi-directional bandwidth); see the device-query sketch after this list
  • Enhanced programmability with Page Migration Engine and unified memory
  • ECC protection for increased reliability
  • Server-optimized for highest data center throughput and reliability
  • Available in two configurations:
    • 16GB of CoWoS HBM2 stacked memory, delivering 720GB/sec of memory bandwidth
    • 12GB of CoWoS HBM2 stacked memory, delivering 540GB/sec of memory bandwidth
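
For readers who want to verify these numbers on real hardware, a short device-query sketch using standard CUDA runtime calls (none of this code comes from NVIDIA's announcement) reads each card's bus width and memory clock and derives peak bandwidth the same way the specifications above do:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // memoryClockRate is reported in kHz; HBM2/GDDR5 transfer twice per
        // clock, and memoryBusWidth is in bits, hence /8 to get bytes.
        double peakGBs = 2.0 * (p.memoryClockRate * 1e3)
                         * (p.memoryBusWidth / 8.0) / 1e9;
        printf("%s: %d SMs, %.0f MiB, %d-bit bus, ~%.0f GB/s peak\n",
               p.name, p.multiProcessorCount,
               p.totalGlobalMem / (1024.0 * 1024.0),
               p.memoryBusWidth, peakGBs);
    }
    return 0;
}
```

On the 16 GB P100 this should report a 4096-bit bus and roughly 732 GB/s, matching the comparison table above.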

NVIDIA's GP100-based Tesla P100 board is already shipping to the latest supercomputers that use NVLINK technology. The graphics board is also available with NVIDIA's DGX-1 supercomputer rack later in June. The PCI-Express based products are expected to be available in Q4 2016 from NVIDIA partners and server makers including Cray, Dell, Hewlett Packard Enterprise, IBM and SGI. The NVLINK board will be available in Q1 2017 through NVIDIA partners.

Source: https://wccftech.com/nvidia-tesla-p100-pci-express/
