SYSTEM Cited by 1 source

NVIDIA RTX PRO 6000 Blackwell¶

Definition¶

The NVIDIA RTX PRO 6000 Blackwell is an NVIDIA Blackwell-generation GPU with 96 GB of GPU memory, positioned for GPU-memory-intensive generative AI workloads. In this wiki it appears as the GPU behind the Amazon EC2 G7e instance family, which AWS positions (and with it the RTX PRO 6000 Blackwell) as a cost-efficient option for serving GPU-memory-intensive generative AI video models such as latent-diffusion video pipelines.

Stub page — extend as further sources cite additional architecture details (FLOPs profile, HBM vs GDDR, NVLink presence, Tensor Core generation specifics, FP4/FP8 support, memory bandwidth).

Why this GPU fits the wiki's niche¶

96 GB VRAM is the load-bearing property: large enough to hold a 14B-parameter latent-diffusion video model plus its activation and chunk-buffer footprint without sharding the model.
Blackwell architecture lineage: the same generation as the datacenter-class GB200 Grace Blackwell Superchip, but in a workstation-class / server-rack package suited to inference rather than frontier training. The RTX PRO 6000 Blackwell is the inference-tier Blackwell part on AWS.
Cost-efficient for VRAM-bound inference compared with the training-grade H100 / B200 tier.
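
As a back-of-the-envelope check on the 96 GB point, the sketch below estimates the weight footprint of a 14B-parameter model. The figure of 2 bytes per parameter (16-bit weights) is an assumption, not something the source states, and the function name is hypothetical. Activation and chunk-buffer sizes depend on the pipeline, so they are only shown as the remaining headroom.

```python
GPU_MEMORY_GB = 96  # RTX PRO 6000 Blackwell, per the source


def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Return model weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: 16-bit (2-byte) weights; the source does not state the precision.
weights_gb = weight_footprint_gb(14e9, 2)
headroom_gb = GPU_MEMORY_GB - weights_gb

print(f"weights: {weights_gb:.0f} GB, "
      f"left for activations and buffers: {headroom_gb:.0f} GB")
```

Under that assumption the weights take about 28 GB, leaving roughly 68 GB on a single device for activations and chunk buffers.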

Wiki-attested workload¶

companies/synthesia latent-diffusion video generation — in-house models hosted on G7e for the 96 GB VRAM headroom.
Wan 2.2 14B Hugging Face Diffusers public-benchmark VAE decoder — 41-latent-frame test video, 10 consecutive decode runs on g7e.2xlarge.
GPU kernel utilisation 82% (synchronous baseline) → 99.9% (asynchronous frame-generation pipeline).
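
To see why moving from a synchronous to an asynchronous pipeline raises kernel utilisation, a toy timeline model can help. This is an illustrative sketch, not the measurement method used in the source: the function names and the per-chunk costs are hypothetical, chosen only so that the synchronous case lands on 82%.

```python
def sync_utilisation(compute: float, copy: float) -> float:
    """Kernels and copies alternate, so compute idles during every copy."""
    return compute / (compute + copy)


def overlapped_utilisation(compute: float, copy: float, n_chunks: int) -> float:
    """Copy of chunk i runs while chunk i+1 computes; the last copy is exposed."""
    step = max(compute, copy)  # the slower stage sets the pace
    total = compute + (n_chunks - 1) * step + copy
    return n_chunks * compute / total


# Hypothetical per-chunk costs (arbitrary time units), not measured values.
print(sync_utilisation(82, 18))
print(overlapped_utilisation(82, 18, n_chunks=20))
```

With these made-up numbers the overlapped figure comes out near 0.99. The point is the direction of the effect (hidden copies stop counting as idle time), not a reproduction of the reported 99.9%.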

Software primitives that materially affect GPU utilisation¶

The Synthesia / AWS post is explicit that the RTX PRO 6000 Blackwell's separate compute and copy engines are a load-bearing hardware property: they are what allows patterns/dual-cuda-stream-compute-and-copy-overlap to physically overlap compute kernels with device-to-host (D2H) transfers. Without this separation, dual CUDA streams would not yield real overlap.

Additional primitives required to actually realise that overlap:

concepts/cuda-stream — Compute Stream + Copy Stream.
concepts/pinned-memory — page-locked host RAM as the D2H destination.
CUDA events as cross-stream barriers.
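
The three primitives above can be combined roughly as follows. This is a minimal PyTorch sketch under stated assumptions, not the code from the Synthesia / AWS post: `decode_chunks_overlapped`, `decode_fn`, and `latent_chunks` are hypothetical names, and the chunks are assumed to be CUDA tensors already on the device.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is absent
    torch = None


def decode_chunks_overlapped(decode_fn, latent_chunks):
    """Decode chunks on one CUDA stream while copying results D2H on another.

    decode_fn (hypothetical) maps a CUDA tensor to a CUDA tensor of frames.
    Returns a list of pinned host tensors, one per chunk.
    """
    if torch is None or not torch.cuda.is_available():
        raise RuntimeError("requires PyTorch with a CUDA device")

    compute_stream = torch.cuda.Stream()
    copy_stream = torch.cuda.Stream()
    # Inputs were produced on the current stream; order the compute stream after it.
    compute_stream.wait_stream(torch.cuda.current_stream())

    host_buffers = []
    for chunk in latent_chunks:
        with torch.cuda.stream(compute_stream):
            frames = decode_fn(chunk)        # kernels are enqueued on the compute stream
            done = torch.cuda.Event()
            done.record(compute_stream)      # marks "this chunk's kernels finished"

        # Page-locked host memory lets the D2H copy run asynchronously.
        # A real pipeline would likely preallocate and reuse these buffers.
        host = torch.empty(frames.shape, dtype=frames.dtype, pin_memory=True)

        with torch.cuda.stream(copy_stream):
            copy_stream.wait_event(done)     # cross-stream barrier for this chunk only
            host.copy_(frames, non_blocking=True)

        # Tell the caching allocator that frames is still in use on copy_stream.
        frames.record_stream(copy_stream)
        host_buffers.append(host)

    copy_stream.synchronize()                # all copies have landed in host memory
    return host_buffers
```

Because the copy stream waits only on the event for its own chunk, the compute stream is free to start the next chunk's kernels while the previous transfer is still in flight. Whether the two actually run concurrently depends on the hardware having separate compute and copy engines, as described above.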

Seen in¶

sources/2026-05-19-aws-how-synthesia-optimizes-generative-ai-video-inference-on-amazon-ec2-g7e-instances: first wiki appearance. "NVIDIA RTX PRO 6000 Blackwell GPUs, with 96 GB of GPU memory" under the G7e instance family. The ratio of compute engines to copy engines is not stated explicitly; it is implicit in the dual-CUDA-stream optimisation working at all (82% → 99.9% kernel utilisation).

systems/aws-ec2-g7e — EC2 instance family that hosts this GPU.
systems/nvidia-gb200-grace-blackwell — datacenter-Blackwell sibling (frontier training tier).
systems/nvidia-h100 — prior-gen Hopper datacenter GPU (80 GB) — adjacent in the AWS GPU ladder.
systems/nvidia-l40s — prior-gen Ada Lovelace inference GPU (48 GB).
systems/nvidia-tensor-core — present-gen tensor-core hardware family.
concepts/gpu-kernel-utilization — saturation metric on this GPU.
patterns/dual-cuda-stream-compute-and-copy-overlap — pattern whose efficacy depends on this GPU's separate compute and copy engines.