Cloudflare AI Engineering Stack: How It Works in 2026
Cloudflare built and deployed a full agentic AI engineering stack on its own products — reaching 93% adoption across its R&D organization in less than a year and processing over 241 billion tokens per month through AI Gateway.
Enterprise IT and security leaders increasingly face the question of how to integrate AI into engineering workflows without sacrificing security, observability, or code quality. Cloudflare has answered that question by eating its own cooking: the same platform it sells to customers now powers its internal developer tooling at massive scale, providing a detailed, reproducible blueprint for any organization looking to do the same.
What Was Announced
Cloudflare published a detailed technical account of the internal AI engineering stack it built over eleven months through a cross-functional tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The results, measured over the last 30 days, are concrete:
- Active AI users: 3,683 employees (60% company-wide, 93% across R&D)
- AI requests processed: 47.95 million
- AI Gateway requests per month: 20.18 million
- Tokens routed through AI Gateway: 241.37 billion
- Tokens processed on Workers AI: 51.83 billion
- Teams using agentic AI tools: 295
- Merge request growth: 4-week rolling average climbed from approximately 5,600 per week to over 8,700, with the week of March 23 hitting 10,952 — nearly double the Q4 baseline
The stack is structured across three layers: a platform layer handling authentication and inference routing, a knowledge layer giving agents structured context about services and codebases, and an enforcement layer maintaining code quality through automated AI review.
Why This Matters for CEE
For CIOs, CISOs, and IT directors in Central and Eastern Europe, this announcement is significant for three reasons. First, it demonstrates that enterprise-grade agentic AI adoption is operationally achievable at scale — Cloudflare moved from project launch to 93% R&D adoption in under twelve months. Second, every product used internally is a shipping Cloudflare product available to customers today: AI Gateway, Workers AI, Cloudflare Access, Agents SDK, Sandbox SDK, and Workflows.

Third, the architecture directly addresses the two concerns most common in CEE enterprise environments — security and cost control. Zero Trust authentication via Cloudflare Access governs every AI request, no API keys are stored on user machines, and anonymous user tracking prevents identity exposure to model providers. On the cost side, routing inference through Workers AI on the open-source Kimi K2.5 model reduced one security agent's costs by 77% compared to a mid-tier proprietary model, translating to an estimated saving of roughly 1.85 million USD per year on that single workload.
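The published figures imply a useful sanity check: a 77% reduction that saves roughly 1.85 million USD per year puts the workload's baseline cost on the proprietary model at about 2.4 million USD per year. A minimal sketch of that arithmetic (only the 77% and the ~1.85M figures come from the article; the function names are illustrative):

```typescript
// Derive the implied baseline from the published numbers: a 77% reduction
// corresponding to ~$1.85M/yr saved means baseline = savings / 0.77.
function impliedBaseline(annualSavingsUsd: number, reductionFraction: number): number {
  return annualSavingsUsd / reductionFraction;
}

// What the same workload would still cost after the reduction.
function remainingCost(annualSavingsUsd: number, reductionFraction: number): number {
  return impliedBaseline(annualSavingsUsd, reductionFraction) - annualSavingsUsd;
}

const baseline = impliedBaseline(1_850_000, 0.77);  // ≈ $2.40M/yr on the proprietary model
const remaining = remainingCost(1_850_000, 0.77);   // ≈ $0.55M/yr after moving to Workers AI
```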
Technical Details
The architecture is organized around three functional layers, each mapping directly to Cloudflare products:
Platform Layer
- Zero Trust authentication: Cloudflare Access enforces identity and policy for every AI request before it reaches any model
- Centralized LLM routing: AI Gateway provides a single control plane for provider key management, cost tracking, Zero Data Retention controls, and per-user anonymous attribution using D1 and Workers KV
- On-platform inference: Workers AI runs open-weight models including Kimi K2.5 (256k context, tool calling, structured outputs) directly on Cloudflare's global GPU network — no cross-cloud hops, lower latency, and significantly lower cost
- MCP Server Portal: A single OAuth endpoint aggregating 13 production MCP servers exposing 182-plus tools across GitLab, Jira, Sentry, Elasticsearch, Prometheus, Google Workspace, Backstage, and more
- Code Mode proxying: Collapses per-tool schema overhead into two portal-level tools, reducing context window consumption from approximately 15,000 tokens per GitLab server request to a fixed minimal footprint regardless of how many servers are connected
- One-command setup: A single proxy Worker with a discovery endpoint configures providers, models, MCP servers, agents, and permissions automatically — no API keys on laptops, no manual configuration
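To make the platform layer concrete, here is a minimal sketch of what centralized routing through AI Gateway looks like from a client's perspective. It assumes the publicly documented gateway endpoint shape; the account ID, gateway name, and model slug are placeholders, and in Cloudflare's internal setup credentials are brokered by Access rather than stored on a laptop:

```typescript
// Build an AI Gateway endpoint: one URL prefix per provider, so key
// management, cost tracking, and retention controls live in one control plane.
function gatewayUrl(accountId: string, gatewayName: string, provider: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}`;
}

// Route an inference call to an open-weight model on Workers AI through the
// gateway. ACCOUNT_ID, "internal-llm", and the model slug are illustrative.
async function runInference(apiToken: string, prompt: string): Promise<unknown> {
  const url = `${gatewayUrl("ACCOUNT_ID", "internal-llm", "workers-ai")}/@cf/meta/llama-3.1-8b-instruct`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ prompt }),
  });
  return res.json();
}
```

Because every request shares this one prefix, per-user attribution and Zero Data Retention policies can be enforced centrally instead of per tool.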
Knowledge Layer
- Backstage service catalog: 2,055 services, 228 APIs, 544 systems, 1,302 databases, and 375 teams tracked with full dependency graphs — surfaced to agents via a 13-tool MCP server
- AGENTS.md generation at scale: An automated pipeline processed approximately 3,900 repositories, generating structured context files that tell coding agents the correct test commands, conventions, boundaries, and dependencies for each codebase
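The AGENTS.md idea is straightforward to illustrate. The sketch below renders the kind of per-repository context file such a pipeline might emit; the `RepoMeta` fields and section layout are assumptions for illustration, not Cloudflare's actual schema:

```typescript
// Illustrative input metadata a generation pipeline might pull from CI
// config and the service catalog for one repository.
interface RepoMeta {
  name: string;
  testCommand: string;
  conventions: string[];
  dependencies: string[];
}

// Render an AGENTS.md so coding agents learn the correct test commands,
// conventions, and dependencies without guessing.
function renderAgentsMd(meta: RepoMeta): string {
  return [
    `# ${meta.name}`,
    `## Testing`,
    `Run tests with: \`${meta.testCommand}\``,
    `## Conventions`,
    ...meta.conventions.map((c) => `- ${c}`),
    `## Dependencies`,
    ...meta.dependencies.map((d) => `- ${d}`),
  ].join("\n");
}
```

Generating these files mechanically is what lets the approach scale to roughly 3,900 repositories instead of relying on each team to hand-write one.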
Enforcement Layer
- AI Code Reviewer: Every merge request across all repositories receives automated AI review via a GitLab CI component; a multi-agent coordinator classifies MRs by risk tier and delegates to specialized agents covering code quality, security, Codex compliance, documentation, performance, and release impact
- Coverage in the last 30 days: 100% of repos on the standard CI pipeline, 5.47 million AI Gateway requests, 24.77 billion tokens processed
- Engineering Codex: Internal standards are distilled into machine-readable rules that agents can query locally and that the AI Code Reviewer cites by rule ID in every finding
- Model routing strategy: Workers AI handles approximately 15% of reviewer traffic (documentation tasks, cost-sensitive workloads); frontier models handle security-sensitive and architecturally complex reviews
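The tier-based routing above can be sketched as a small classifier. The heuristics, thresholds, and model labels here are assumptions for illustration — Cloudflare has not published its actual classification logic:

```typescript
type Tier = "low" | "medium" | "high";

interface MergeRequest {
  changedFiles: string[];
  linesChanged: number;
}

// Classify an MR by risk: security-sensitive paths escalate to "high",
// large diffs to "medium", everything else stays "low". Illustrative only.
function classify(mr: MergeRequest): Tier {
  const securitySensitive = mr.changedFiles.some((f) =>
    /auth|crypto|secret|access/i.test(f)
  );
  if (securitySensitive) return "high";
  if (mr.linesChanged > 500) return "medium";
  return "low";
}

// Cost-sensitive tiers go to an open-weight model on Workers AI; higher-risk
// reviews go to a frontier model. Labels are placeholders, not real slugs.
function routeModel(tier: Tier): string {
  return tier === "low" ? "workers-ai/open-weight" : "frontier";
}
```

A split like this is how roughly 15% of reviewer traffic can stay on cheap on-platform inference while security-sensitive reviews keep frontier-model quality.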
Softprom and Cloudflare
Softprom is the official distributor of Cloudflare in the CEE region. As a distributor, Softprom provides access to the full Cloudflare product portfolio — including AI Gateway, Workers AI, Cloudflare Access, and the Agents SDK — along with local pre-sales expertise, technical support, and partner enablement for organizations across Central and Eastern Europe looking to build secure, observable AI infrastructure on a proven platform.
Ready to explore how Cloudflare's AI platform can accelerate your engineering organization? Contact Softprom specialists via softprom.com/vendor/cloudflare to arrange a technical consultation.
This content was prepared as part of the Softprom DistriFlow project — an automated system for monitoring and adapting vendor news. Original source: original article.