
llm-d Press Release


May 20, 2025

Red Hat Launches the llm-d Community, Powering Distributed Gen AI Inference at Scale

Forged in collaboration with founding contributors CoreWeave, Google Cloud, IBM Research and NVIDIA, and joined by industry leaders AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI, as well as university supporters at the University of California, Berkeley, and the University of Chicago, the project aims to make production generative AI as omnipresent as Linux

BOSTON – RED HAT SUMMIT – MAY 20, 2025 — Red Hat, the world's leading provider of open source solutions, today announced the launch of llm-d, a new open source project that answers the most crucial need of generative AI’s (gen AI) future: Inference at scale. Tapping breakthrough inference technologies for gen AI at scale, llm-d is powered by a native Kubernetes architecture, vLLM-based distributed inference and intelligent AI-aware network routing, empowering robust, large language model (LLM) inference clouds to meet the most demanding production service-level objectives (SLOs).

While training remains vital, the true impact of gen AI hinges on more efficient and scalable inference - the engine that transforms AI models into actionable insights and user experiences. According to Gartner¹, “By 2028, as the market matures, more than 80% of data center workload accelerators will be specifically deployed for inference as opposed to training use.” This underscores that the future of gen AI lies in the ability to execute. The escalating resource demands of increasingly sophisticated and larger reasoning models limit the viability of centralized inference and threaten to bottleneck AI innovation with prohibitive costs and crippling latency.

Answering the need for scalable gen AI inference with llm-d

Red Hat and its industry partners are directly confronting this challenge with llm-d, a visionary project that amplifies the power of vLLM to transcend single-server limitations and unlock production at scale for AI inference. Using the proven orchestration prowess of Kubernetes, llm-d integrates advanced inference capabilities into existing enterprise IT infrastructures. This unified platform empowers IT teams to meet the diverse serving demands of business-critical workloads, all while deploying innovative techniques to maximize efficiency and dramatically minimize the total cost of ownership (TCO) associated with high-performance AI accelerators.

llm-d delivers a powerful suite of innovations, highlighted by:

  • vLLM, which has quickly become the de facto standard open source inference server, providing day 0 model support for emerging frontier models and support for a broad list of accelerators, now including Google Cloud Tensor Processing Units (TPUs).
  • Prefill and Decode Disaggregation to separate the input-context (prefill) and token-generation (decode) phases of inference into discrete operations that can then be distributed across multiple servers.
  • KV (key-value) Cache Offloading, based on LMCache, to shift the memory burden of the KV cache from GPU memory to more cost-efficient and abundant standard storage, such as CPU memory or network storage.
  • Kubernetes-powered clusters and controllers for more efficient scheduling of compute and storage resources as workload demands fluctuate, maintaining performance while keeping latency low.
  • AI-Aware Network Routing for scheduling incoming requests to the servers and accelerators that are most likely to have hot caches of past inference calculations (a simplified sketch of this idea appears after this list).
  • High-performance communication APIs for faster and more efficient data transfer between servers, with support for NVIDIA Inference Xfer Library (NIXL).
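
To make the cache-aware routing idea above more concrete, the sketch below shows one way a scheduler could prefer replicas that likely already hold the KV cache for a request’s prompt prefix. This is an illustrative Python sketch under stated assumptions, not llm-d’s actual implementation or API: the names (Replica, prefix_block_hashes, route), the character-block hashing, and the scoring weights are hypothetical stand-ins for the token-block bookkeeping and load signals a production router would use.

```python
from dataclasses import dataclass, field
from hashlib import sha256


def prefix_block_hashes(prompt: str, block_chars: int = 256) -> list[str]:
    """Rolling hashes of fixed-size prompt prefix blocks (a stand-in for
    the token-block hashing a real prefix cache would use)."""
    hashes, rolling = [], sha256()
    for start in range(0, len(prompt), block_chars):
        rolling.update(prompt[start:start + block_chars].encode())
        hashes.append(rolling.hexdigest())
    return hashes


@dataclass
class Replica:
    """A vLLM serving instance as seen by the router."""
    name: str
    queue_depth: int = 0
    cached_blocks: set[str] = field(default_factory=set)


def score(replica: Replica, block_hashes: list[str]) -> float:
    """Favor replicas with a long matching cached prefix and a short queue."""
    hits = 0
    for h in block_hashes:            # prefix reuse must be contiguous
        if h in replica.cached_blocks:
            hits += 1
        else:
            break
    cache_score = hits / max(len(block_hashes), 1)
    load_penalty = replica.queue_depth / 10.0   # arbitrary illustrative weight
    return cache_score - load_penalty


def route(replicas: list[Replica], prompt: str) -> Replica:
    """Send the request to the best-scoring replica and record the effect."""
    blocks = prefix_block_hashes(prompt)
    best = max(replicas, key=lambda r: score(r, blocks))
    best.queue_depth += 1
    best.cached_blocks.update(blocks)   # this replica now holds the prefix
    return best


if __name__ == "__main__":
    system_prompt = "You are a helpful assistant. " * 40
    fleet = [Replica("decode-0"), Replica("decode-1"), Replica("decode-2")]
    route(fleet, system_prompt + "What is llm-d?")              # warms one cache
    chosen = route(fleet, system_prompt + "Explain KV caching.")
    print(f"Follow-up routed to {chosen.name} (shared system-prompt prefix)")
```

In a real deployment, routing decisions would also account for SLO targets, prefill/decode roles, and accelerator load; the point of the sketch is simply that contiguous prefix reuse is what makes one replica “hotter” than another for a given request.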

llm-d: Backed by industry leaders

This new open source project has already garnered the support of a formidable coalition of leading gen AI model providers, AI accelerator pioneers, and premier AI cloud platforms. CoreWeave, Google Cloud, IBM Research and NVIDIA are founding contributors, with AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI as partners, underscoring the industry’s deep collaboration to architect the future of large-scale LLM serving. The llm-d community is further joined by founding supporters at the Sky Computing Lab at the University of California, Berkeley, originators of vLLM, and the LMCache Lab at the University of Chicago, originators of LMCache.

Rooted in its unwavering commitment to open collaboration, Red Hat recognizes the critical importance of vibrant and accessible communities in the rapidly evolving landscape of gen AI inference. Red Hat will actively champion the growth of the llm-d community, fostering an inclusive environment for new members and fueling its continued evolution.

Red Hat’s vision: Any model, any accelerator, any cloud.

The future of AI must be defined by limitless opportunity, not constrained by infrastructure silos. Red Hat sees a horizon where organizations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs. To unlock the true potential of gen AI investments, enterprises require a universal inference platform - a standard for more seamless, high-performance AI innovation, both today and in the years to come.

Just as Red Hat pioneered the open enterprise by transforming Linux into the bedrock of modern IT, the company is now poised to architect the future of AI inference. vLLM has the potential to become the linchpin of standardized gen AI inference, and Red Hat is committed to building a thriving ecosystem around not just the vLLM community but also llm-d for distributed inference at scale. The vision is clear: regardless of the AI model, the underlying accelerator or the deployment environment, Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.

Red Hat Summit
Join the Red Hat Summit keynotes to hear the latest from Red Hat executives, customers and partners.

Supporting Quotes
Brian Stevens, senior vice president and AI CTO, Red Hat
“The launch of the llm-d community, backed by a vanguard of AI leaders, marks a pivotal moment in addressing the need for scalable gen AI inference, a crucial obstacle that must be overcome to enable broader enterprise AI adoption. By tapping the innovation of vLLM and the proven capabilities of Kubernetes, llm-d paves the way for distributed, scalable and high-performing AI inference across the expanded hybrid cloud, supporting any model, any accelerator, on any cloud environment and helping realize a vision of limitless AI potential.”

Ramine Roane, corporate vice president, AI Product Management, AMD
"AMD is proud to be a founding member of the llm-d community, contributing our expertise in high-performance GPUs to advance AI inference for evolving enterprise AI needs. As organizations navigate the increasing complexity of generative AI to achieve greater scale and efficiency, AMD looks forward to meeting this industry demand through the llm-d project."

Shannon McFarland, vice president, Cisco Open Source Program Office & Head of Cisco DevNet
“The llm-d project is an exciting step forward for practical generative AI. llm-d empowers developers to programmatically integrate and scale generative AI inference, unlocking new levels of innovation and efficiency in the modern AI landscape. Cisco is proud to be part of the llm-d community, where we’re working together to explore real-world use cases that help organizations apply AI more effectively and efficiently.”

Chen Goldberg, senior vice president, Engineering, CoreWeave
“CoreWeave is proud to be a founding contributor to the llm-d project and to deepen our long-standing commitment to open source AI. From our early partnership with EleutherAI to our ongoing work advancing inference at scale, we’ve consistently invested in making powerful AI infrastructure more accessible. We’re excited to collaborate with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI.”

Mark Lohmeyer, vice president and general manager, AI & Computing Infrastructure, Google Cloud
"Efficient AI inference is paramount as organizations move to deploying AI at scale and deliver value for their users. As we enter this new age of inference, Google Cloud is proud to build upon our legacy of open source contributions as a founding contributor to the llm-d project. This new community will serve as a critical catalyst for distributed AI inference at scale, helping users realize enhanced workload efficiency with increased optionality for their infrastructure resources."

Jeff Boudier, Head of Product, Hugging Face
“We believe every company should be able to build and run their own models. With vLLM leveraging the Hugging Face transformers library as the source of truth for model definitions, a wide diversity of models large and small is available to power text, audio, image and video AI applications. Eight million AI Builders use Hugging Face to collaborate on over two million AI models and datasets openly shared with the global community. We are excited to support the llm-d project to enable developers to take these applications to scale.”

Priya Nagpurkar, vice president, Hybrid Cloud and AI Platform, IBM Research
“At IBM, we believe the next phase of AI is about efficiency and scale. We’re focused on unlocking value for enterprises through AI solutions they can deploy effectively. As a founding contributor to llm-d, IBM is proud to be a key part of building a differentiated hardware agnostic distributed AI inference platform. We’re looking forward to continued contributions towards the growth and success of this community to transform the future of AI inference.”

Bill Pearson, vice president, Data Center & AI Software Solutions and Ecosystem, Intel
“The launch of llm-d will serve as a key inflection point for the industry in driving AI transformation at scale, and Intel is excited to participate as a founding supporter. Intel’s involvement with llm-d is the latest milestone in our decades-long collaboration with Red Hat to empower enterprises with open source solutions that they can deploy anywhere, on their platform of choice. We look forward to further extending and building AI innovation through the llm-d community.”

Eve Callicoat, senior staff engineer, ML Platform, Lambda
"Inference is where the real-world value of AI is delivered, and llm-d represents a major leap forward. Lambda is proud to support a project that makes state-of-the-art inference accessible, efficient, and open."

Ujval Kapasi, vice president, Engineering AI Frameworks, NVIDIA
“The llm-d project is an important addition to the open source AI ecosystem and reflects NVIDIA’s support for collaboration to drive innovation in generative AI. Scalable, highly performant inference is key to the next wave of generative and agentic AI. We’re working with Red Hat and other supporting partners to foster llm-d community engagement and industry adoption, helping accelerate llm-d with innovations from NVIDIA Dynamo such as NIXL.”

Ion Stoica, Professor and Director of Sky Computing Lab, University of California, Berkeley
“We are pleased to see Red Hat build upon the established success of vLLM, which originated in our lab to help address the speed and memory challenges that come with running large AI models. Open source projects like vLLM, and now llm-d anchored in vLLM, are at the frontier of AI innovation tackling the most demanding AI inference requirements and moving the needle for the industry at large.”

Junchen Jiang, CS Professor, LMCache Lab, University of Chicago
“Distributed KV cache optimizations, such as offloading, compression, and blending, have been a key focus of our lab, and we are excited to see llm-d leveraging LMCache as a core component to reduce time to first token as well as improve throughput, particularly in long-context inference.”


About Red Hat
Red Hat is the open hybrid cloud technology leader, delivering a trusted, consistent and comprehensive foundation for transformative IT innovation and AI applications. Its portfolio of cloud, developer, AI, Linux, automation and application platform technologies enables any application, anywhere—from the datacenter to the edge. As the world's leading provider of enterprise open source software solutions, Red Hat invests in open ecosystems and communities to solve tomorrow's IT challenges. Collaborating with partners and customers, Red Hat helps them build, connect, automate, secure and manage their IT environments, supported by consulting services and award-winning training and certification offerings.

Forward-Looking Statements
Except for the historical information and discussions contained herein, statements contained in this press release may constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are based on the company’s current assumptions regarding future business and financial performance. These statements involve a number of risks, uncertainties and other factors that could cause actual results to differ materially. Any forward-looking statement in this press release speaks only as of the date on which it is made. Except as required by law, the company assumes no obligation to update or revise any forward-looking statements.

Media Contact:
John Terrill
Red Hat
+1-571-421-8132
jterrill@redhat.com

###

Red Hat and the Red Hat logo are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries.

Footnotes

  1. Forecast Analysis: AI Semiconductors, Worldwide, Alan Priestley, Gartner, 2 August 2024, ID G00818912. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.