AWS EC2 G7 with NVIDIA Blackwell GPUs: up to 4.6x AI inference performance

TL;DR: AWS introduces EC2 G7 instances with NVIDIA Blackwell GPUs, offering up to 4.6x the AI inference performance and 2.1x the graphics performance of G6. They feature up to 8 GPUs, 256 GB of GPU memory, and 700 Gbps of networking.

What happened?

AWS has launched Amazon EC2 G7 instances, accelerated by the new NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. According to the AWS News Blog, AWS is the first major cloud provider to offer this generation of GPUs. The instances are designed for AI inference, graphics, data analytics, and virtualization workloads. According to AWS, they deliver up to 4.6x the AI inference performance and 2.1x the graphics performance of G6. The gap reflects an architectural change: the previous generation (G6) used NVIDIA L4 GPUs, while the new RTX PRO 4500 incorporates fifth-generation Tensor Cores and fourth-generation RT Cores. AWS also states that G7 offers 1.5x the performance on concurrent video streams, thanks to ninth-generation NVENC encoders.

Key specifications

G7 instances come in seven sizes, with up to 8 GPUs (32 GB of memory each, 256 GB in total), 192 vCPUs from sixth-generation Intel Xeon Scalable processors, 768 GiB of system memory, 7.6 TB of local NVMe storage, and 700 Gbps of network bandwidth with EFA. Compared with G6, GPU memory is 1.33 times larger, GPU memory bandwidth is 2.45 times higher, and EFA network bandwidth is 7 times higher, which enables the low-latency connectivity that AI inference and intensive graphics applications depend on. Local NVMe storage keeps large models and datasets close to compute, reducing data transfer overhead. For comparison, G6 instances offered up to 8 L4 GPUs with 24 GB of memory each, with lower memory and network bandwidth. G7 also supports 4:2:2 encoding with ninth-generation NVENC, which benefits professional video workflows.
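As a quick sanity check on the figures above, the short Python sketch below recomputes the total GPU memory and backs out the G6 baselines implied by the quoted ratios. It assumes the 1.33x memory ratio applies per GPU; the implied G6 values are derived only from the numbers in this article, not from a G6 specification sheet.

```python
# Figures quoted in the article for the largest G7 size.
G7_GPUS = 8
G7_GPU_MEMORY_GB = 32   # per GPU
G7_EFA_GBPS = 700

# Ratios quoted relative to G6.
# Assumption: the memory ratio is per GPU.
MEMORY_RATIO = 1.33
EFA_RATIO = 7

total_gpu_memory_gb = G7_GPUS * G7_GPU_MEMORY_GB

# Implied G6 baselines, derived purely from the ratios above.
implied_g6_gpu_memory_gb = G7_GPU_MEMORY_GB / MEMORY_RATIO
implied_g6_efa_gbps = G7_EFA_GBPS / EFA_RATIO

print(f"Total G7 GPU memory: {total_gpu_memory_gb} GB")
print(f"Implied G6 memory per GPU: {implied_g6_gpu_memory_gb:.1f} GB")
print(f"Implied G6 EFA bandwidth: {implied_g6_efa_gbps:.0f} Gbps")
```

The implied G6 values come out at roughly 24 GB per GPU and 100 Gbps of EFA bandwidth.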

Why is it important?

AI inference becomes more critical as more models are deployed in production. G7 can process more requests at lower latency, which reduces operational costs. For example, a large language model (LLM) that previously required multiple G6 instances may now run on a single G7 instance, lowering the cost per inference. Fifth-generation Tensor Cores and fourth-generation RT Cores also improve rendering and virtual reality workloads, and 4:2:2 encoding with ninth-generation NVENC benefits professional video workflows such as live streaming and post-production. For enterprises using Amazon EMR on EKS, G7 accelerates GPU data analytics and enables faster query processing. AWS has updated its GPU instances steadily, from G2 (2013) to G4 (2019) and G5 (2021), each with significant improvements. G7 represents a generational leap, especially in AI inference, where competition with Azure (which offers ND A100 v4 instances) and GCP (with A2) is intensifying.
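To make the single-instance claim concrete, here is a rough Python sketch that estimates whether a model's weights fit in the 256 GB of GPU memory on the largest G7 size. The model sizes, the bytes-per-parameter values, and the 20% headroom reserved for KV cache and activations are illustrative assumptions, and the estimate ignores any overhead from sharding a model across GPUs.

```python
import math

G7_TOTAL_GPU_MEMORY_GB = 256.0   # from the article: 8 GPUs x 32 GB
G7_GPU_MEMORY_GB = 32.0          # per GPU


def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float,
         gpu_memory_gb: float = G7_TOTAL_GPU_MEMORY_GB,
         headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` of memory free.

    The headroom fraction is an assumed allowance for KV cache and
    activations, not a measured value.
    """
    return weights_gb(params_billions, bytes_per_param) <= gpu_memory_gb * (1 - headroom)


def min_gpus(params_billions: float, bytes_per_param: float,
             per_gpu_gb: float = G7_GPU_MEMORY_GB,
             headroom: float = 0.2) -> int:
    """Smallest number of GPUs whose usable memory covers the weights."""
    usable_per_gpu = per_gpu_gb * (1 - headroom)
    return math.ceil(weights_gb(params_billions, bytes_per_param) / usable_per_gpu)


# Hypothetical model sizes, for illustration only.
for name, params, nbytes in [("70B @ FP16", 70, 2), ("70B @ FP8", 70, 1), ("180B @ FP16", 180, 2)]:
    print(name, weights_gb(params, nbytes), "GB, fits:", fits(params, nbytes),
          "min GPUs:", min_gpus(params, nbytes))
```

Under these assumptions a 70-billion-parameter model at 16-bit precision needs about 140 GB for weights and fits on one 8-GPU instance, while a 180-billion-parameter model at the same precision does not.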

Market implications

AWS reinforces its leadership in GPU-as-a-service, competing with Azure and GCP. G7 can accelerate AI adoption among enterprises that need performance without giving up cloud flexibility, and for AI startups it lowers the barrier to entry for cutting-edge hardware. Pricing has not been detailed, but it is expected to be higher than G6. AWS GPU instances have historically carried premium pricing, and the additional performance may justify the cost for latency-sensitive workloads; until prices are published, however, the lack of transparency may be a barrier for small businesses. G7 availability in select regions (us-east-1, us-west-2, eu-west-1, ap-southeast-1) suggests AWS is prioritizing markets with high AI demand. Existing AWS users can migrate workloads to G7 for better performance without switching providers. Whereas the launch of P3 instances (2017) popularized AI training in the cloud, G7 focuses on inference, a growing market. Azure and GCP are expected to follow with similar announcements, intensifying competition.
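Because pricing is unknown, one way to reason about the trade-off is a break-even calculation: G7 lowers cost per request as long as its hourly price is less than the G6 price multiplied by the achieved speedup. The Python sketch below illustrates this. The hourly prices and the G6 throughput are hypothetical placeholders, and 4.6x is the "up to" figure quoted by AWS, so real workloads may see less.

```python
import math


def cost_per_1k_requests(hourly_price_usd: float, requests_per_second: float) -> float:
    """Cost in USD to serve 1,000 requests at a sustained throughput."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd / requests_per_hour * 1000


def break_even_price(g6_hourly_price_usd: float, speedup: float) -> float:
    """G7 hourly price at which cost per request equals G6."""
    return g6_hourly_price_usd * speedup


# Hypothetical inputs, NOT published AWS prices or benchmarks.
G6_PRICE = 10.0          # USD per hour (placeholder)
G6_RPS = 50.0            # requests per second (placeholder)
SPEEDUP = 4.6            # "up to" figure quoted by AWS
G7_PRICE_GUESS = 20.0    # USD per hour (placeholder)

g6_cost = cost_per_1k_requests(G6_PRICE, G6_RPS)
g7_cost = cost_per_1k_requests(G7_PRICE_GUESS, G6_RPS * SPEEDUP)

print(f"G6 cost per 1k requests: ${g6_cost:.4f}")
print(f"G7 cost per 1k requests: ${g7_cost:.4f}")
print(f"Break-even G7 price: ${break_even_price(G6_PRICE, SPEEDUP):.2f}/hour")
```

With these placeholder numbers, a G7 instance priced at twice the G6 rate would still cost less per request, provided the workload actually reaches the full speedup.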

What readers should know

G7 instances are available now in select regions (us-east-1, us-west-2, eu-west-1, ap-southeast-1). They are recommended for large model inference, 3D rendering, video transcoding, and analytics with Amazon EMR on EKS, and are best suited to workloads that require low latency and high graphics processing power. For training workloads, AWS continues to offer P5 instances with H100. Pricing has not been announced, but it is expected to be higher than G6. G7 uses sixth-generation Intel Xeon Scalable processors, which offer improvements in energy efficiency and security. For those already using G6, migration to G7 may require driver and network configuration adjustments, and AWS provides compatibility documentation. AWS plans to expand regional availability gradually. Finally, for workloads that do not require maximum performance, G6 remains a cost-effective option.
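Before planning a migration, it may help to confirm which sizes are actually offered in a given region. The Python sketch below uses the EC2 `DescribeInstanceTypeOfferings` API through boto3. It assumes the family is exposed under instance type names beginning with `g7.`, which is a guess based on AWS naming conventions, and the helper names are illustrative.

```python
from typing import Any, Dict, List

# Regions named in the article.
ARTICLE_REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1"]

# Assumption: G7 sizes are named "g7.<size>"; adjust if AWS uses another prefix.
FAMILY_FILTER = "g7.*"


def offered_types(ec2_client: Any, family_filter: str = FAMILY_FILTER) -> List[str]:
    """Return the sorted instance types matching `family_filter` in the client's region."""
    types: List[str] = []
    kwargs: Dict[str, Any] = {
        "LocationType": "region",
        "Filters": [{"Name": "instance-type", "Values": [family_filter]}],
    }
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        types.extend(o["InstanceType"] for o in page.get("InstanceTypeOfferings", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # follow pagination until exhausted
    return sorted(types)


def check_regions(regions: List[str] = ARTICLE_REGIONS) -> Dict[str, List[str]]:
    """Query each region; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    return {r: offered_types(boto3.client("ec2", region_name=r)) for r in regions}
```

Calling `check_regions()` with configured credentials returns a mapping from region to the matching instance types; an empty list for a region means no matching sizes are offered there under the assumed name prefix.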
