Arrcus announced record 3x bookings growth in 2025 across datacenter, telco, and enterprise customers for mission-critical switching and routing applications deployed in production across thousands of network nodes globally. Customers value the flexibility, innovation, and feature velocity of the ArcOS network operating system and the ACE platform across a broad range of open networking hardware, along with significant reductions in capital and operating costs compared to incumbent networking solutions. Building on this success, the company also announced the Arrcus Inference Network Fabric (AINF), designed to improve the delivery of AI inferencing applications across highly distributed networks by steering traffic between inference nodes, caches, and datacenters, with the goal of increasing throughput in tokens per second (TPS), reducing time to first token (TTFT), and improving end-to-end latency (E2EL) for inferencing.
With the rise of agentic and physical AI, inferencing is expected to be the fastest-growing AI segment. However, widespread adoption of agentic AI is bottlenecked by the speed at which inference results are delivered, the diversity of inference models, and the challenge of bringing intelligent inference decision-making closer to edge nodes. Inferencing infrastructure is deployed in highly distributed clusters and must address requirements for low latency, availability, power-grid capacity constraints, data sovereignty, and cost. While enterprises look to deploy real-time inferencing so users can have rich, localized experiences, network operators look to deliver inferencing-as-a-service in alignment with service level objectives (SLOs) around these key requirements. To meet these challenges, inferencing infrastructure will require a distributed routing fabric with granular policy control to intelligently steer traffic and match rapidly evolving requirements. Traditional hardware-defined networking solutions from incumbent vendors fall short of addressing these challenges.
Announced today, the Arrcus Inference Network Fabric (AINF) is a purpose-built solution that enables delivery of inferencing applications through an intelligent, AI-policy-aware network fabric that dynamically routes AI traffic between inference nodes, caches, and datacenters to the most appropriate site. Operators can define business policies such as latency targets, data sovereignty boundaries, model preferences, or power constraints. AINF evaluates these conditions in real time to steer inference traffic to the optimal node or cache, ensuring the right model is delivered from the right location at the right time. Research[1] shows that such innovation in AI infrastructure can deliver over a 60% reduction in TTFT, a 15% TPS improvement, a 40% improvement in E2EL, and up to a 30% cost reduction.
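Arrcus has not published AINF's internals, but the policy evaluation described above can be illustrated with a minimal sketch. The site names, fields, and thresholds below are hypothetical, purely to show how latency, sovereignty, and power policies could jointly constrain where an inference query is steered:

```python
from dataclasses import dataclass

@dataclass
class InferenceSite:
    name: str
    region: str              # data-sovereignty boundary the site sits in
    latency_ms: float        # measured network latency from the client
    power_ok: bool           # whether the site is within its power budget
    cost_per_1k_tokens: float

def select_site(sites, max_latency_ms, allowed_regions):
    """Pick the cheapest site that satisfies every policy constraint."""
    eligible = [
        s for s in sites
        if s.latency_ms <= max_latency_ms
        and s.region in allowed_regions
        and s.power_ok
    ]
    if not eligible:
        return None  # no site meets the SLO; caller must queue or degrade
    return min(eligible, key=lambda s: s.cost_per_1k_tokens)

# Hypothetical sites: an EU edge node, a cheaper but distant US datacenter,
# and an EU edge node currently over its power budget.
sites = [
    InferenceSite("edge-fra", "eu", 12.0, True, 0.40),
    InferenceSite("dc-ash",   "us", 85.0, True, 0.25),
    InferenceSite("edge-ams", "eu", 18.0, False, 0.30),
]
best = select_site(sites, max_latency_ms=50, allowed_regions={"eu"})
print(best.name)  # edge-fra: the only site passing all three policies
```

A production fabric would evaluate such policies continuously against live telemetry rather than static fields, but the principle is the same: business intent (latency target, sovereignty boundary, power constraint) filters the candidate sites before any cost or load optimization runs.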
“To enhance agentic AI adoption by improving response times, networks need to become AI-aware,” said Shekar Ayyar, Chairman and CEO of Arrcus. “AINF extends Arrcus’ leadership in distributed networking by delivering the first fabric designed to meet the latency, sovereignty, and power constraints of large-scale AI inferencing.”
At its core, AINF introduces a policy abstraction layer that translates inferencing application intent into underlying infrastructure behavior while shielding operators from infrastructure complexity. AINF components include query-based inference routing with policy management, interconnect routers, and edge networking. AINF is designed to integrate with popular inference frameworks including vLLM, Nvidia Dynamo, SGLang, Triton, and others, coupling optimal model selection with a high-performance steering fabric. Using Kubernetes-based orchestration, AINF can be composed and deployed in an automated manner. Techniques such as prefix awareness, which optimizes KV cache usage, enable inferencing applications to meet SLOs for throughput, token retrieval time, latency, data sovereignty, power, and cost.
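The prefix-awareness idea mentioned above is worth unpacking: requests that share a prompt prefix (for example, a common system prompt) can reuse the KV cache a node has already built for that prefix, so a router that keeps same-prefix requests on the same node avoids recomputation. The class, node names, and token-count cutoff below are hypothetical, a sketch of the affinity mechanism rather than Arrcus' implementation:

```python
import hashlib

class PrefixRouter:
    """Route requests sharing a prompt prefix to the same node so that
    node's KV cache for the prefix can be reused (illustrative only)."""

    def __init__(self, nodes, prefix_tokens=8):
        self.nodes = nodes
        self.prefix_tokens = prefix_tokens  # length of the affinity key

    def route(self, prompt: str) -> str:
        # Use the first N whitespace-separated tokens as the cache key;
        # a real router would use the tokenizer's token IDs instead.
        prefix = " ".join(prompt.split()[: self.prefix_tokens])
        digest = hashlib.sha256(prefix.encode()).digest()
        # A stable hash gives the same node for the same prefix.
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

router = PrefixRouter(["node-a", "node-b", "node-c"])
shared = "You are a helpful assistant. Summarize the following document:"
n1 = router.route(shared + " Doc one text ...")
n2 = router.route(shared + " Doc two text ...")
print(n1 == n2)  # True: same prefix lands on the same node
```

Production systems typically combine this affinity with load and health signals, falling back to a less-loaded node when the prefix-preferred one is saturated; the SLO terms listed above (throughput, retrieval time, latency) would act as the tie-breakers.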
AINF builds on Arrcus' proven leadership in AI and datacenter networking: its ACE-AI solution already delivers a unified network fabric for distributed AI spanning datacenter, edge, and hybrid cloud environments with scale-out and scale-across solutions. As with all Arrcus solutions, AINF has the unique capability of working with best-of-breed inferencing xPUs and network silicon across hardware providers. It is also designed to allow partner companies to bring their own load balancers, firewalls, and power management policies to create optimal caching and secure CDNs for superior inference results.
“AI Fabrics, scale-up, scale-out, and scale-across, are poised to approach $200B in revenue by 2030 with Ethernet being the major contributor,” said Alan Weckel, Founder and Technology Analyst at 650 Group. “Network fabrics can significantly improve AI fabric performance and help customers scale the network with the rapid growth in accelerators as the market moves from foundational model training to inference being the dominant use case.”
“Traditional network fabrics weren’t designed with AI inference workloads in mind. Arrcus’ Inference Network Fabric changes that with a policy-aware, intent-driven approach that understands inference-specific demands, latency sensitivity, model selection, cache optimization, and dynamically routes traffic accordingly,” said Roy Chua, Founder and Principal, AvidThink. “As inferencing scales across distributed environments, this kind of workload-aware networking will be essential to maximizing AI-enabled application performance.”
“With its efficient distributed cloud networking platform and newly announced Arrcus Inferencing Network Fabric (AINF), Arrcus is well-positioned to serve diverse networking needs across industries, providing scalable and high-performance connectivity for any application ranging from communications services to AI inference,” said Scott Raynovich, Founder and Principal Analyst, Futuriom.