Amazon Unveils Graviton4 SoC with 96 Arm Neoverse V2 Cores and Trainium2 AI Chips

Published 2023-11-29 08:37 by Hilbert Hagedoorn

Amazon Web Services (AWS) is introducing two new data center chips, the Graviton4 processor and the Trainium2 AI chip. Graviton4 has 96 Arm Neoverse V2 cores, 50% more than its predecessor, Graviton3.

According to Amazon, Graviton4 offers up to 30% better performance than the previous Graviton3 chips. The core count has increased by 50% to a total of 96. The chips remain Arm-based, this time using Neoverse V2 cores. Memory bandwidth has also grown substantially, by 75%: Graviton4 features twelve DDR5 channels with support for speeds of up to 5600 megatransfers per second. Amazon has not disclosed the manufacturing process for these chips, though the prior Graviton3 chips were produced on TSMC's 5nm node.
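As a rough cross-check of the memory bandwidth figure, the sketch below computes theoretical peak bandwidth from channel count and transfer rate. The Graviton4 numbers come from the article; the Graviton3 configuration (eight DDR5-4800 channels) and the 8 bytes per transfer on a 64-bit channel are assumptions not stated here, and the function name is purely illustrative.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    Assumes each channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes for a 64-bit data path, an assumption of this sketch).
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Graviton4: twelve DDR5 channels at up to 5600 MT/s (from the article).
graviton4_gbs = peak_bandwidth_gbs(12, 5600)

# Graviton3: eight DDR5-4800 channels (assumption, not stated in the article).
graviton3_gbs = peak_bandwidth_gbs(8, 4800)

# Relative uplift in theoretical peak bandwidth.
bandwidth_uplift = graviton4_gbs / graviton3_gbs - 1

# Core count implied for Graviton3 by "50% more cores, reaching 96".
graviton3_cores = round(96 / 1.5)

print(f"Graviton4 peak: {graviton4_gbs:.1f} GB/s")
print(f"Graviton3 peak: {graviton3_gbs:.1f} GB/s")
print(f"Uplift: {bandwidth_uplift:.0%}")
print(f"Implied Graviton3 cores: {graviton3_cores}")
```

Under those assumptions the theoretical uplift works out to roughly 75%, which matches the figure in Amazon's announcement. Real-world bandwidth will be lower than these peak values.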

Graviton4 provides up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than current generation Graviton3 processors, delivering the best price performance and energy efficiency for a broad range of workloads running on Amazon EC2.
Trainium2 is designed to deliver up to 4x faster training than first generation Trainium chips and will be able to be deployed in EC2 UltraClusters of up to 100,000 chips, making it possible to train foundation models (FMs) and large language models (LLMs) in a fraction of the time, while improving energy efficiency up to 2x.

“Silicon underpins every customer workload, making it a critical area of innovation for AWS,” said David Brown, vice president of Compute and Networking at AWS. “By focusing our chip designs on real workloads that matter to customers, we’re able to deliver the most advanced cloud infrastructure to them. Graviton4 marks the fourth generation we’ve delivered in just five years, and is the most powerful and energy efficient chip we have ever built for a broad range of workloads. And with the surge of interest in generative AI, Trainium2 will help customers train their ML models faster, at a lower cost, and with better energy efficiency.”

Graviton4 raises the bar on price performance and energy efficiency for a broad range of workloads

Today, AWS offers more than 150 different Graviton-powered Amazon EC2 instance types globally at scale, has built more than 2 million Graviton processors, and has more than 50,000 customers—including the top 100 EC2 customers—using Graviton-based instances to achieve the best price performance for their applications. Customers including Datadog, DirecTV, Discovery, Formula 1 (F1), NextRoll, Nielsen, Pinterest, SAP, Snowflake, Sprinklr, Stripe, and Zendesk use Graviton-based instances to run a broad range of workloads, such as databases, analytics, web servers, batch processing, ad serving, application servers, and microservices. As customers bring larger in-memory databases and analytics workloads to the cloud, their compute, memory, storage, and networking requirements increase. As a result, they need even higher performance and larger instance sizes to run these demanding workloads, while managing costs. Furthermore, customers want more energy-efficient compute options for their workloads to reduce their impact on the environment. Graviton is supported by many AWS managed services, including Amazon Aurora, Amazon ElastiCache, Amazon EMR, Amazon MemoryDB, Amazon OpenSearch, Amazon Relational Database Service (Amazon RDS), AWS Fargate, and AWS Lambda, bringing Graviton’s price performance benefits to users of those services.

Graviton4 processors deliver up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3. Graviton4 also raises the bar on security by fully encrypting all high-speed physical hardware interfaces. Graviton4 will be available in memory-optimized Amazon EC2 R8g instances, enabling customers to improve the execution of their high-performance databases, in-memory caches, and big data analytics workloads. R8g instances offer larger instance sizes with up to 3x more vCPUs and 3x more memory than current generation R7g instances. This allows customers to process larger amounts of data, scale their workloads, improve time-to-results, and lower their total cost of ownership. Graviton4-powered R8g instances are available today in preview, with general availability planned in the coming months. To learn more about Graviton4-based R8g instances, visit aws.amazon.com/ec2/instance-types/r8g.

EC2 UltraClusters of Trainium2 are designed to deliver the highest performance, most energy efficient AI model training infrastructure in the cloud

The FMs and LLMs behind today’s emerging generative AI applications are trained on massive datasets. These models make it possible for customers to completely reimagine user experiences through the creation of a variety of new content, including text, audio, images, video, and even software code. The most advanced FMs and LLMs today range from hundreds of billions to trillions of parameters, requiring reliable high-performance compute capacity capable of scaling across tens of thousands of ML chips. AWS already provides the broadest and deepest choice of Amazon EC2 instances featuring ML chips, including the latest NVIDIA GPUs, Trainium, and Inferentia2. Today, customers including Databricks, Helixon, Money Forward, and the Amazon Search team use Trainium to train large-scale deep learning models, taking advantage of Trainium’s high performance, scale, reliability, and low cost. But even with the fastest accelerated instances available today, customers want more performance and scale to train these increasingly sophisticated models faster, at a lower cost, while simultaneously reducing the amount of energy they use.

Trainium2 chips are purpose-built for high performance training of FMs and LLMs with up to trillions of parameters. Trainium2 is designed to deliver up to 4x faster training performance and 3x more memory capacity compared to first generation Trainium chips, while improving energy efficiency (performance/watt) up to 2x. Trainium2 will be available in Amazon EC2 Trn2 instances, containing 16 Trainium2 chips in a single instance. Trn2 instances are intended to enable customers to scale up to 100,000 Trainium2 chips in next generation EC2 UltraClusters, interconnected with AWS Elastic Fabric Adapter (EFA) petabit-scale networking, delivering up to 65 exaflops of compute and giving customers on-demand access to supercomputer-class performance. With this level of scale, customers can train a 300-billion-parameter LLM in weeks versus months. By delivering the highest scale-out ML training performance at significantly lower costs, Trn2 instances can help customers unlock and accelerate the next wave of advances in generative AI.
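To put the UltraCluster figures in per-chip terms, here is a back-of-the-envelope calculation using only the numbers quoted above. It assumes the "up to 65 exaflops" is spread evenly over all 100,000 chips; the announcement does not state the numeric precision behind that figure, so the result is an upper bound rather than a specification. The variable names are illustrative.

```python
# Figures quoted in the announcement.
MAX_CHIPS = 100_000          # Trainium2 chips per UltraCluster (upper limit)
CLUSTER_EXAFLOPS = 65        # "up to" aggregate compute
CHIPS_PER_INSTANCE = 16      # chips per Trn2 instance

# Aggregate compute divided evenly across chips (assumption of this sketch).
# 1 exaflop = 1,000 petaflops = 1,000,000 teraflops.
per_chip_teraflops = CLUSTER_EXAFLOPS * 1_000_000 / MAX_CHIPS

# Number of Trn2 instances needed to reach the maximum chip count.
instances_needed = MAX_CHIPS // CHIPS_PER_INSTANCE

# Aggregate compute of a single 16-chip instance, in petaflops.
per_instance_petaflops = per_chip_teraflops * CHIPS_PER_INSTANCE / 1000

print(f"Per chip: {per_chip_teraflops:.0f} teraflops")
print(f"Instances for a full cluster: {instances_needed}")
print(f"Per instance: {per_instance_petaflops:.1f} petaflops")
```

On these assumptions, a full-size cluster corresponds to 6,250 Trn2 instances at about 650 teraflops per chip.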
