News

AWS Neuron SDK 2.29.0: NKI Stable, CPU Simulator & Expanded Library

News | 22.04.2026

AWS Neuron SDK 2.29.0 graduates NKI to Stable, introduces a local CPU Simulator, and expands the kernel library — lowering the barrier for AI accelerator development on Trainium and Inferentia.

Organizations building large-scale AI inference and training pipelines on AWS face a persistent challenge: developing and debugging custom kernels for specialized hardware like Trainium and Inferentia traditionally requires direct access to that hardware. With the release of AWS Neuron SDK 2.29.0, that constraint is significantly reduced. This update marks a maturity milestone for the AWS AI chip software ecosystem, delivering stable APIs, local development tools, and broader model support that directly benefit engineering teams across the CEE region and globally.

What was announced

AWS released Neuron SDK 2.29.0 on April 17, 2026, promoting the Neuron Kernel Interface (NKI) from Beta to Stable at version 0.3.0. NKI provides developers with direct, low-level programming access to AWS Trainium and AWS Inferentia NeuronCores using a Python-based syntax.

Key announcements in this release include the introduction of the NKI Standard Library, which exposes developer-visible source code for all NKI APIs and native language objects. A new CPU Simulator allows developers to write, test, and debug NKI kernels locally on standard CPU hardware, using familiar Python debugging tools, without requiring Trainium instances. NKI 0.3.0 also adds ISA-level features including a dedicated exponential instruction, matmul accumulation control, DMA priority settings for Trn3, and variable-length all-to-all collectives.
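To make the local-development workflow concrete, the sketch below mimics the develop-on-CPU loop the simulator enables: a toy "kernel" that processes data tile by tile, checked against a plain NumPy reference before any hardware is involved. This is an illustration only, not the NKI API; real NKI kernels are written against `neuronxcc.nki` (with constructs such as `nki.jit` and `nki.language` load/store per AWS documentation), and every name below (`TILE`, `tiled_add_kernel`, `validate_on_cpu`) is a hypothetical stand-in.

```python
import numpy as np

TILE = 128  # illustrative tile size along the leading dimension

def tiled_add_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Toy 'kernel': computes a + b one TILE-row block at a time,
    mimicking the load -> compute -> store structure of a tiled kernel."""
    out = np.empty_like(a)
    for i in range(0, a.shape[0], TILE):
        a_tile = a[i:i + TILE]   # stand-in for loading a tile
        b_tile = b[i:i + TILE]
        out[i:i + TILE] = a_tile + b_tile  # stand-in for compute + store
    return out

def validate_on_cpu(kernel, a: np.ndarray, b: np.ndarray) -> bool:
    """Simulator-style check: run the kernel on the host and compare
    against a straightforward NumPy reference before deploying."""
    return np.allclose(kernel(a, b), a + b)

a = np.random.rand(512, 64).astype(np.float32)
b = np.random.rand(512, 64).astype(np.float32)
assert validate_on_cpu(tiled_add_kernel, a, b)
```

The point of the pattern is the iteration loop: logic bugs in tiling and indexing surface on an ordinary developer machine with a standard Python debugger, which is the cost- and time-saving step the CPU Simulator brings to actual NKI kernels.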

The NKI Library has expanded with seven new experimental kernels covering Conv1D, a multi-layer Transformer token generation megakernel, fused communication-compute primitives for Trainium2, and dynamic tiling operations. Existing kernels received improvements: Attention CTE now scales to larger batch sizes and sequence lengths, MLP adds mixed-precision quantization paths, and MoE TKG introduces a dynamic all-expert algorithm.

For inference workloads, NxD Inference improves vision language model support with optimizations for Qwen3 VL and Qwen2 VL, including text-model sequence parallelism and vision data parallelism. The vLLM Neuron Plugin has been updated to version 0.5.0. Neuron Explorer, the profiling and debugging suite, also graduates from Beta to Stable, with the System Trace Viewer now supporting the full set of Device widgets for multi-device profile analysis and availability on the VS Code Extension Marketplace.

Why this matters for CEE

For IT directors, cloud architects, and AI engineering leads across Albania, Armenia, Austria, Bulgaria, Czech Republic, Germany, Poland, Ukraine, and the broader CEE region, this release addresses real operational concerns. Building AI accelerator workloads on proprietary hardware has historically required expensive and time-consuming hardware-in-the-loop development cycles. The new CPU Simulator fundamentally changes that workflow: teams can now develop, iterate, and validate NKI kernels on standard developer machines before deploying to Trainium or Inferentia instances, reducing cloud compute costs during the development phase and accelerating time to production.

The graduation of NKI to Stable status provides the API stability guarantees that enterprise teams require before committing to production deployments. Organizations in regulated industries or those operating on longer procurement and integration cycles can now plan integrations around a stable interface. The expanded NKI Library with seven new kernels, combined with improvements to existing Attention, MLP, and MoE kernels, means that teams working on large language models, vision-language models, and mixture-of-experts architectures have more building blocks available without writing kernels from scratch.

The availability of Neuron Explorer on the VS Code Extension Marketplace also matters operationally: it removes friction from toolchain setup, enabling faster onboarding of new engineers onto Neuron-based development workflows.

Technical details

  • NKI version: Promoted from Beta to Stable at version 0.3.0
  • CPU Simulator: Local kernel development and debugging on standard CPU without Trainium hardware, using standard Python debugging tools
  • NKI Standard Library: Developer-visible source code for all NKI APIs and native language objects
  • New ISA-level features: Dedicated exponential instruction, matmul accumulation control, DMA priority settings for Trn3, variable-length all-to-all collectives
  • New experimental kernels (7 total): Conv1D, multi-layer Transformer token generation megakernel, fused communication-compute primitives for Trainium2, dynamic tiling operations
  • Attention CTE: Scales to larger batch sizes and sequence lengths
  • MLP kernel: Adds mixed-precision quantization paths
  • MoE TKG: Introduces dynamic all-expert algorithm
  • NxD Inference: Vision language model optimizations for Qwen3 VL and Qwen2 VL; text-model sequence parallelism; vision data parallelism
  • vLLM Neuron Plugin: Updated to version 0.5.0
  • Neuron Explorer: Graduated from Beta to Stable; System Trace Viewer supports full Device widgets for multi-device profiling
  • VS Code Extension Marketplace: Neuron Explorer now available for streamlined installation
  • Availability: All AWS Regions supporting Inferentia and Trainium instances

Softprom and Amazon Web Services

Softprom is the official partner of Amazon Web Services in the CEE region, providing procurement, licensing, technical pre-sales support, and advisory services for AWS cloud solutions across more than 30 countries. As part of this partnership, Softprom helps organizations evaluate, adopt, and scale AWS infrastructure, including Trainium- and Inferentia-based AI compute workloads.

For engineering and procurement teams evaluating the Neuron SDK 2.29.0 release and its implications for AI training or inference workloads on AWS, Softprom's technical specialists can assist with architecture guidance, instance selection, and integration planning.

This content was prepared as part of the Softprom DistriFlow project — an automated system for monitoring and adapting vendor news. Original source: original article.