Intel and AMD ACE: New AI Extensions for x86

TL;DR: Intel and AMD have presented ACE, a set of x86 instructions that accelerates matrix operations for AI. It promises greater efficiency and less reliance on GPUs, and is due in future processors.

What happened?

Intel and AMD, in a rare collaboration, have introduced the Advanced Compute Extensions (ACE), a new set of x86 instructions aimed at artificial intelligence. According to Tom's Hardware, ACE adds a family of instructions that perform matrix multiplication more efficiently in terms of both power and die area. This means x86 CPUs will be able to handle AI workloads, such as small-model inference or preprocessing, without relying solely on GPUs or dedicated accelerators. The specification focuses on mixed-precision matrix operations (INT8, BF16, FP16) and, according to initial estimates, delivers up to 2x to 3x the performance of AVX-512 in lightweight inference tasks. That two historical rivals are cooperating underscores how urgent standardized AI acceleration has become in the x86 ecosystem.
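To make "mixed-precision matrix operations" concrete, here is a minimal pure-Python sketch of an INT8 matrix multiply with a wider accumulator, the general pattern that matrix extensions of this kind speed up in hardware. It is an illustration only: the function names and the per-matrix scale factors are assumptions for this example and say nothing about how ACE instructions are actually encoded or behave.

```python
def quantize(matrix, scale):
    """Map a float matrix (list of rows) to signed 8-bit integers, clamped to [-128, 127]."""
    return [[max(-128, min(127, round(v / scale))) for v in row] for row in matrix]


def int8_matmul(a, b):
    """Multiply two int8 matrices, accumulating products in wide integers.

    Hardware typically accumulates int8 products into 32-bit lanes so that
    long dot products do not overflow; Python integers stand in for that here.
    """
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def dequantize(acc, scale):
    """Convert integer accumulators back to floats using the combined scale."""
    return [[v * scale for v in row] for row in acc]


# Illustrative values, chosen so that quantization is exact.
a_float = [[0.5, -1.0], [0.25, 0.75]]
b_float = [[1.0, 2.0], [3.0, -1.0]]
scale_a, scale_b = 0.25, 0.5

acc = int8_matmul(quantize(a_float, scale_a), quantize(b_float, scale_b))
result = dequantize(acc, scale_a * scale_b)
print(acc)     # [[-20, 16], [20, -2]]
print(result)  # [[-2.5, 2.0], [2.5, -0.25]]
```

The inputs are stored in 8 bits while the sums are kept in a wider type, which is why such operations are described as mixed precision.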

Why is it important?

Historically, AI acceleration on x86 CPUs has come from extensions such as AVX-512 and AMX, and support for them has been uneven. AVX-512 originated at Intel and reached AMD only with Zen 4, while AMX is available only on Intel Xeon processors from Sapphire Rapids onward and has no AMD implementation. ACE standardizes a common set of instructions for Intel and AMD, simplifying software development and optimizing performance across a wide range of devices, from laptops to data centers. Moreover, by focusing on matrix multiplication (matmul), ACE directly addresses the core of deep learning workloads and offers performance improvements without requiring a separate accelerator. For developers, this means they can write optimized code that runs predictably on both platforms, reducing software fragmentation. According to Tom's Hardware, ACE also introduces instructions for data format conversion and reduction operations, which should ease integration with frameworks like TensorFlow and PyTorch.
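As an illustration of what "data format conversion" and "reduction" mean in this context, the sketch below converts float32 values to BF16 (the upper 16 bits of the float32 encoding, with round-to-nearest-even) and then reduces a BF16 dot product in a wider type. This is the standard software definition of BF16, not a description of any ACE instruction; the function names are hypothetical, NaN inputs are not handled, and values outside the float32 range will raise an error in `struct.pack`.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x, rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds instead of truncating.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16_dot(xs, ys) -> float:
    """Dot product with BF16 inputs and a wider accumulator (the reduction step)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += bf16_bits_to_float(float_to_bf16_bits(x)) * \
                 bf16_bits_to_float(float_to_bf16_bits(y))
    return total


print(hex(float_to_bf16_bits(1.0)))                     # 0x3f80
print(bf16_bits_to_float(float_to_bf16_bits(3.14159)))  # 3.140625
```

BF16 keeps the full float32 exponent range and gives up mantissa bits, which is why 3.14159 comes back as 3.140625.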

Market implications

Cost reduction: Companies will be able to run AI inference on ACE-capable CPUs instead of adding GPUs, whose cost and availability have been recurring issues. For example, in edge computing servers, where GPUs are expensive and power-hungry, ACE will allow lightweight AI models to be processed directly on the CPU.
Increased competition: ARM and RISC-V are also developing comparable extensions (SVE/SVE2 and the matrix-oriented SME on ARM, vector extensions on RISC-V), but the installed base of x86 gives ACE a large potential audience from the start. However, if ACE does not achieve rapid adoption, ARM could capture market share in the lightweight AI server segment.
Energy efficiency: ACE promises to reduce power consumption in AI workloads, crucial for data centers and edge devices. According to Intel estimates, ACE instructions can achieve up to 1.5 times better energy efficiency than AMX in INT8 inference tasks.
Impact on startups: Companies developing specialized AI hardware (like Groq or Cerebras) may face pressure, as x86 CPUs with ACE offer a more accessible alternative that comes built into general-purpose systems.

What should readers know?

ACE extensions will arrive in future Intel processors (likely Arrow Lake and successors) and AMD processors (Zen 5 and later). Older CPUs will not be able to execute ACE instructions, so developers will need to compile ACE-specific code paths and keep a fallback for hardware that lacks them. The software ecosystem (compilers, frameworks like TensorFlow and PyTorch) is expected to adopt ACE quickly, as both Intel and AMD will contribute to LLVM and GCC. For developers, ACE simplifies AI code optimization on x86, because a common instruction set removes the need for vendor-specific branching. End users should notice improvements in everyday applications that use AI, such as voice assistants, image recognition, or photo editing, especially on laptops without dedicated GPUs. For training large models, however, GPUs and accelerators will still be necessary.
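Because older CPUs cannot run ACE code, software would typically choose a code path at runtime based on CPU feature flags rather than on the vendor name. The sketch below shows that pattern under stated assumptions: the flag name "ace" is hypothetical (no flag name is given in the source), the backend labels are placeholders, and the other flag names are the ones Linux reports in /proc/cpuinfo.

```python
# Preference order: most specialized matrix support first, generic fallback last.
# "ace" is a HYPOTHETICAL flag name used only for illustration.
BACKEND_PREFERENCE = [
    ("ace", "ace"),
    ("amx_int8", "amx"),
    ("avx512_vnni", "avx512-vnni"),
    ("avx2", "avx2"),
]


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags on Linux, or an empty set elsewhere."""
    try:
        with open(path, encoding="utf-8") as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags") and ":" in line:
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # Not Linux, or the file is unreadable: report no known features.
    return set()


def pick_matmul_backend(cpu_flags):
    """Pick the best available backend by feature flag, never by vendor string."""
    for flag, backend in BACKEND_PREFERENCE:
        if flag in cpu_flags:
            return backend
    return "scalar"


print(pick_matmul_backend(read_linux_cpu_flags()))
```

Keying the decision on a feature flag means the same check works on Intel and AMD parts alike, which is the practical meaning of avoiding vendor-specific branching.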

"ACE marks the beginning of an era where x86 CPUs can compete with GPUs in efficiency for certain AI tasks," notes the Tom's Hardware analysis. However, actual adoption will depend on how quickly AI frameworks integrate support and on the availability of hardware in 2025-2026.

Historical context

The Intel-AMD collaboration is rare but necessary to maintain x86's relevance against ARM and RISC-V. A notable earlier partnership came in 2017, when Intel announced mobile processors that combined its CPU cores with AMD Radeon graphics; that was a product deal rather than a shared instruction set specification. This time the goal is a common standard for AI. The move reflects competitive pressure: ARM already has matrix extensions (SME) in its latest designs alongside SVE/SVE2, and RISC-V is advancing with vector extensions that enable AI acceleration. Additionally, the server market has seen growth in custom accelerators (Google TPU, AWS Inferentia), threatening x86's dominance in data centers. With ACE, Intel and AMD aim to ensure the x86 ecosystem remains relevant for AI workloads, protecting their installed base of servers and clients. According to analysts, standardization could also facilitate x86 adoption in new markets, such as automotive (autonomous driving) and IoT devices, where energy efficiency is key.
