
AI Inference · AWS · Cloud Computing · Generative AI
Amazon Web Services (AWS) and Cerebras Systems have announced a strategic collaboration to deliver advanced AI inference for generative AI applications, with the service deployed on Amazon Bedrock inside AWS data centers.
This partnership combines AWS Trainium-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking, with a service launch expected in the coming months. The approach relies on inference disaggregation: AWS Trainium handles prompt processing while the Cerebras CS-3 manages output generation, with the two stages connected over EFA.
This split aims to overcome critical speed bottlenecks in demanding AI workloads such as real-time coding assistance. AWS is the exclusive cloud provider for Cerebras's disaggregated inference solution, further solidifying AWS's generative AI offerings.
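To make the disaggregation pattern concrete, here is a minimal Python sketch of the prefill/decode split described above. The class and function names are illustrative assumptions rather than any actual AWS, Trainium, or Cerebras API, and the "model" is a toy stand-in: prompt processing builds a key/value cache on one worker, the cache crosses the network (EFA in the real deployment), and a separate worker generates output tokens from it.

```python
"""Illustrative sketch of disaggregated LLM inference (hypothetical names,
not an AWS or Cerebras API): prefill builds a KV cache, the cache is shipped
to a separate decode stage, and that stage emits tokens one at a time."""

from dataclasses import dataclass, field
import random


@dataclass
class KVCache:
    """Per-token key/value state produced during prefill (toy stand-in)."""
    entries: list[tuple[float, float]] = field(default_factory=list)


class PrefillWorker:
    """Stands in for the compute-heavy prompt-processing stage."""

    def process_prompt(self, prompt_tokens: list[int]) -> KVCache:
        cache = KVCache()
        for tok in prompt_tokens:
            # A real system would run full transformer layers here;
            # we only fabricate a tiny per-token state.
            cache.entries.append((tok * 0.01, tok * 0.02))
        return cache


class DecodeWorker:
    """Stands in for the bandwidth-bound output-generation stage."""

    def generate(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        out: list[int] = []
        for _ in range(max_new_tokens):
            # Each step reads the whole cache (why memory bandwidth matters)
            # and appends the new token's state before the next step.
            nxt = int(sum(k for k, _ in cache.entries)) % 100
            out.append(nxt)
            cache.entries.append((nxt * 0.01, nxt * 0.02))
        return out


def transfer(cache: KVCache) -> KVCache:
    """Placeholder for moving the KV cache between machines over the network."""
    return KVCache(entries=list(cache.entries))


if __name__ == "__main__":
    prompt = [random.randint(0, 99) for _ in range(8)]
    kv = PrefillWorker().process_prompt(prompt)   # stage 1: prompt processing
    kv_remote = transfer(kv)                      # network hop between stages
    print(DecodeWorker().generate(kv_remote, 5))  # stage 2: output generation
```

The design point the sketch illustrates is that the two stages stress hardware differently, so in principle each can run on the machine best suited to it, provided the cache transfer between them is fast enough not to become the new bottleneck.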
The collaboration also points to plans for AWS to offer open-source large language models and Amazon Nova on Cerebras hardware later this year. The move strengthens AWS's competitive position in cloud AI infrastructure, building on its Nitro System and Trainium chips, which are already used by major players such as Anthropic and OpenAI.
The Cerebras CS-3, known for its high memory bandwidth, is likewise used by prominent AI firms. The initiative fits into Amazon's broader AI expansion, including a significant bond sale to fund AI investments, underscoring the company's commitment to leading in AI.