397B parameter model runs on PC with AMD Ryzen

TL;DR: A PC with an AMD Ryzen AI Max+ 395 and 128 GB of RAM has run a 397B parameter AI model, something that previously required a GPU cluster. This broadens access to high-level AI and accelerates edge computing.

What happened?

Longsys, a Chinese memory and storage manufacturer, has demonstrated a 397 billion parameter artificial intelligence model running on a desktop PC based on the AMD Ryzen AI Max+ 395 processor. The system featured 128 GB of unified RAM and relies on AMD's shared CPU-GPU memory architecture, the same approach used in the earlier Ryzen 7040/8040 series (RDNA 3 graphics and XDNA AI engine). The feat was reported by TechRadar, which calls it a huge leap in edge computing and a technical milestone that brings large-scale AI closer to local environments.

The model used was not publicly specified. Given its size (397B parameters), it is speculated to be a heavily quantized model in the class of Llama 3.1 405B. Standard 4-bit or 8-bit quantization would not be enough on its own: 397B parameters take about 397 GB at 8 bits and about 198 GB at 4 bits, both above 128 GB, so the weights would have to average under roughly 2.6 bits each or be partly offloaded to storage. The demonstration was performed on a custom PC built by Longsys, indicating that it is not an immediate commercial product but a proof-of-concept to validate technical feasibility.

Why is it important?

Until now, models of this size required multiple enterprise GPUs such as NVIDIA H100 (80 GB VRAM each) or A100, in configurations of 8 or more units, costing well over $300,000 and consuming several kilowatts of power. Running them on a consumer PC drastically lowers the barrier to entry, allowing startups, universities, and individual developers to experiment with frontier models without relying on the cloud or expensive infrastructure. This democratizes access to high-level AI, accelerating innovation in fields like medical research, natural language processing, and computer vision.

Edge computing also benefits greatly: running massive models locally improves data privacy, reduces network latency, and removes the dependence on an internet connection. Sectors such as healthcare (imaging diagnostics), finance (fraud detection), and defense (real-time analysis) could adopt these capabilities without sending sensitive data to the cloud.

What consequences will it have?

In the short term, we will see an increase in the development of models optimized for shared memory and aggressive quantization techniques. AMD could gain traction in the AI workstation market against NVIDIA, which dominates with dedicated GPUs but at a higher cost. However, inference latency will be higher and throughput lower than on a dedicated cluster, so it will not replace data centers for massive deployments or training. Longsys's demonstration is an indicator that AMD's unified memory architecture (similar to Apple's with its M chips) can compete in the local inference segment.

In the long term, this will drive edge AI and applications where data privacy is critical. It could also pressure NVIDIA to offer more affordable solutions or accelerate the development of its own APUs with unified memory. Apple is already moving in this direction with the unified memory of its M2 Ultra, but AMD has the advantage of building for the open PC ecosystem rather than a closed platform.

What readers should know

The model used was not specified, but it must be heavily quantized to fit in 128 GB. A 397B parameter model in FP16 would occupy ~794 GB, impossible on a PC. With 4-bit quantization, the size drops to ~198 GB, still above 128 GB, so further measures are required, such as quantization below roughly 2.6 bits per weight or offloading part of the weights to storage. Pruning or distillation would also shrink the model, but the result would no longer have 397B parameters.
Inference performance has not been disclosed; it is likely slow compared to dedicated GPUs. Inference of large models on APUs can be 1-5 tokens per second, versus 50+ tokens/s on H100. However, for batch or non-interactive tasks, it may be sufficient.
This demonstration is a proof-of-concept, not a commercial product. Longsys is known for memory and SSDs, not AI PCs, so commercialization would depend on integrators or AMD launching similar platforms.
The AMD Ryzen AI Max+ 395 integrates an NPU (Neural Processing Unit) for AI acceleration, but running such large models primarily relies on RAM and the integrated RDNA 3.5 GPU. The NPU is designed for smaller, more efficient models.
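
The memory arithmetic in the points above can be checked with a short calculation. The sketch below is a minimal illustration, assuming decimal gigabytes and counting only the weights (KV cache, activations, and the operating system need additional room). The function names are hypothetical, and the 256 GB/s memory bandwidth is an assumed value for illustration, not a figure disclosed for the demonstration.

```python
GB = 1e9  # decimal gigabytes, matching the ~794 GB FP16 figure


def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / GB


def max_bits_per_weight(n_params: float, budget_gb: float) -> float:
    """Largest average bit width whose weights still fit in budget_gb."""
    return budget_gb * GB * 8 / n_params


def decode_tokens_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes every weight is read from memory once per generated token,
    so throughput is limited by memory bandwidth alone.
    """
    return bandwidth_gb_s / weights_gb


N_PARAMS = 397e9
RAM_GB = 128
ASSUMED_BANDWIDTH_GB_S = 256.0  # assumption for illustration only

for bits in (16, 8, 4, 2):
    size = weight_footprint_gb(N_PARAMS, bits)
    verdict = "fits" if size <= RAM_GB else "does not fit"
    print(f"{bits:>2}-bit: {size:6.1f} GB ({verdict} in {RAM_GB} GB)")

print(f"break-even: {max_bits_per_weight(N_PARAMS, RAM_GB):.2f} bits per weight")

two_bit_gb = weight_footprint_gb(N_PARAMS, 2)
print(f"2-bit decode bound: {decode_tokens_per_s(two_bit_gb, ASSUMED_BANDWIDTH_GB_S):.1f} tokens/s")
```

Under these assumptions, only the 2-bit case fits in 128 GB, and the bandwidth-bound estimate falls in the low single digits of tokens per second, in line with the 1-5 tokens/s range mentioned above. A mixture-of-experts model that reads only part of its weights per token would not follow this dense-model bound.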

"The ability to run 400B parameter models on a desktop PC is a milestone that redefines what we consider 'edge computing'." — Analyst at TheVortiq.

Historical context

Just a year ago, running a 400B parameter model required at least 8 H100 GPUs (80 GB each) or specialized servers with high-speed interconnects like NVLink. Hardware costs exceeded $300,000, plus electricity and cooling. The evolution of unified memory in AMD APUs, combined with quantization methods such as GPTQ and AWQ and quantized model formats such as GGUF, has enabled this advance. It is comparable to the leap from mainframes to PCs in the 1980s: what once required an entire room now fits on a desk. Apple had already demonstrated running 70B models on Mac Studio with M2 Ultra (192 GB unified), but 397B is more than five times larger.

Additionally, the open-source community has developed tools like llama.cpp and Ollama that optimize inference on CPU/integrated GPU, facilitating these experiments. Longsys has leveraged this ecosystem for its demonstration.

Market implications

AMD could position itself as a leader in affordable AI workstations, directly competing with Apple's Mac Studio (built on M2 Ultra unified memory) and NVIDIA GPU workstations. However, NVIDIA still dominates in raw performance and has a software advantage (CUDA, TensorRT). Competition will benefit consumers with more options and lower prices.

For Longsys, this demonstration is a marketing strategy to position itself as an innovator in AI memory. It could boost demand for its DDR5 modules and high-capacity SSDs. For AMD, it is a proof-of-concept that could translate into future APUs with higher bandwidth and unified memory capacity, perhaps reaching 256 GB in upcoming generations.

In summary, Longsys's feat marks a turning point in edge computing, bringing frontier AI closer to end users. Although performance limitations remain, the trend is clear: access to local AI is broadening rapidly.
