DeepInfra Closes $107M Series B to Power Production-Scale AI Inference

Funding fuels global expansion of DeepInfra’s purpose-built inference cloud as AI demand shifts from model training to production scale

PALO ALTO, Calif., May 04, 2026 (GLOBE NEWSWIRE) -- DeepInfra, a purpose-built cloud platform for high-throughput AI inference, today announced $107 million in Series B funding to scale its inference cloud and global capacity. Processing nearly five trillion tokens per week, DeepInfra enables enterprises and scaleups to run open-source and agent-driven AI workloads with improved cost, performance and security.

DeepInfra was founded by the team behind the popular messenger app imo, which scaled to more than 200 million users globally. The latest round is co-led by 500 Global and Georges Harik, one of Google’s earliest engineers, with participation from A.Capital Ventures, Crescent Cove, Felicis, NVIDIA, Peak6, Samsung Next, Supermicro and Upper90.

“When we launched nearly four years ago, we believed inference would become the dominant driver of enterprise AI workloads – and we are now at this inflection point,” said Nikola Borisov, co-founder and CEO, DeepInfra. “What’s happening now is incredibly exciting – open-source models are rapidly reaching parity with proprietary systems, unlocking a new wave of innovation at a fraction of the cost and enabling widespread adoption. At the same time, agent-based systems are driving continuous, high-volume demand. Inference is no longer a thin layer – it’s the system constraint that will define the majority of workloads. Most cloud platforms weren’t built for this always-on, distributed model, so we built DeepInfra from the ground up to deliver better economics, performance, and security.”

The investment reflects 500 Global’s portfolio thesis across the AI stack. The firm’s conviction is that infrastructure will be as defining a category as the models themselves.

“Demand for AI is causing every layer of the AI stack to innovate, and inference is no exception. In the agentic age, new workflows are arising on a rapid basis, as evidenced recently by OpenClaw and AutoResearch. Enterprises and developers building with open source and agent-driven AI need infrastructure that was designed to be flexible, fast and reliable. We backed DeepInfra because, in our assessment, this team has already proven they can build and operate distributed systems at global scale, and because we believe purpose-built inference infrastructure will be fundamental to the next phase of AI as compute was to the last,” said Tony Wang, Managing Partner, 500 Global.

Open Source AI Inference Will Drive Market Adoption

As AI systems move into always-on, agentic applications, inference is becoming a core constraint on performance and cost, requiring significantly more computation per request – often involving dozens of model calls. At the same time, enterprises are adopting open-source models as they continue to improve in reasoning capabilities and approach proprietary performance, enabling organizations to control costs, avoid vendor lock-in, maintain data sovereignty, and support emerging sovereign AI strategies.

Deloitte estimates inference could account for roughly two-thirds of all compute this year. DeepInfra believes this shift remains underestimated, with agent-based systems driving continuous, high-volume token generation as they operate independently and at scale. DeepInfra was built from the ground up specifically for this shift, with infrastructure optimized for inference rather than general-purpose cloud workloads. The company’s revenue has tripled since the beginning of 2026, reflecting this demand.

DeepInfra is an early infrastructure collaborator in NVIDIA’s open AI ecosystem, supporting Nemotron models, the NemoClaw agent framework, and the NVIDIA Dynamo inference software. As NVIDIA advances the Nemotron family of open-source models and agent-based systems like OpenClaw drive increased inference demand, DeepInfra is one of the vendors providing the infrastructure layer for these systems in production. The platform supports more than 190 open-source models through OpenAI-compatible APIs and offers a fully managed, enterprise-ready environment with built-in security, including zero data retention and SOC 2 and ISO 27001 certification.
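An OpenAI-compatible API means existing client code can target the platform simply by switching the base URL. The sketch below, using only the Python standard library, shows the shape of such a chat-completions request; the base URL, model name, and environment-variable name are illustrative assumptions, not details taken from this release, so consult the provider’s documentation before use.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the provider's docs.
BASE_URL = "https://api.deepinfra.com/v1/openai"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # API key read from the environment; never hard-code credentials.
            "Authorization": f"Bearer {os.environ.get('DEEPINFRA_API_KEY', '')}",
        },
        method="POST",
    )

# Hypothetical model identifier, for illustration only.
req = build_chat_request("example-org/example-model", "Hello!")
body = json.loads(req.data)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would require a valid API key; the point of the sketch is that the request format is the standard OpenAI chat-completions shape.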

“DeepInfra gives us access to best-in-class models with the reliability and speed we need to ship. The performance speaks for itself. They help us keep up with the pace of innovation in this space,” said Jesse Proudman, president and CTO, Venice AI.

DeepInfra’s Approach to Scaling High-Throughput AI Inference

DeepInfra’s approach is grounded in the team’s experience building and scaling global distributed systems. By owning and operating its hardware from the outset, the company is building an AI token factory for enterprise-scale agentic and inference workloads.

  • Purpose-built and vertically integrated for inference: DeepInfra is architected specifically for high-throughput inference, optimizing the full stack from chips to APIs. The company owns its GPU infrastructure across eight U.S. data centers and is expanding globally with additional locations planned. This enables structurally lower costs per token and more predictable latency than hyperscalers and general-purpose cloud providers that rely on spot or rented capacity.
  • Optimized for the agentic era: Agent systems can require 50–100+ model calls per task and operate continuously, driving sustained, high-volume token demand. DeepInfra is designed to keep latency and cost predictable under these workloads — the company estimates that nearly 30% of weekly token volume comes from agents like OpenClaw, a scale that validates the infrastructure in production.
  • Optimized performance: DeepInfra works with NVIDIA to optimize performance on its GPU architectures, including early deployment of Blackwell GPUs and Vera Rubin with NVIDIA Dynamo distributed-inference software – unlocking 20x improvements in inference cost efficiency.

The funding will accelerate DeepInfra’s global compute capacity, enhance developer tooling, and support next-generation models at a moment when production-grade inference infrastructure is becoming the decisive variable in enterprise AI deployment.

To learn more, visit deepinfra.com.

About DeepInfra
Founded in 2022, DeepInfra is a purpose-built cloud inference platform for high-throughput AI, enabling companies to run open-source, proprietary, and agentic AI models at scale. The company owns and operates its GPU infrastructure, supports 190+ open-source models, and processes nearly five trillion tokens per week, delivering the cost efficiency, low latency, and security that production AI demands. Its fully managed platform includes OpenAI-compatible APIs and enterprise-grade security compliance, including zero data retention and SOC 2 and ISO 27001 certification. For more information, visit deepinfra.com.

About 500 Global
500 Global is a multi-stage venture capital firm with $2.2 billion in assets under management¹ that invests in founders building fast-growing technology companies. 500 Global focuses on markets where technology, innovation, and capital can unlock long-term value and drive economic growth and development. 500 Global has backed 5,000+ founders representing 3,000+ companies operating in 80+ countries, with 35+ companies valued at over $1 billion and 165+ companies valued at over $100 million². For more information, please visit www.500.co.

Media Contacts:
Julianna Sheridan
Scratch M+M on behalf of DeepInfra
+1-774-452-0007
deepinfra@scratchmm.com

Media Relations at 500 Global
press@500.co

¹ Assets under management (AUM) calculations are based on internal estimates as of June 30, 2025.

² Including private, public, and exited companies as of June 30, 2025.


