Nvidia and Amazon Web Services (AWS) are extending their strategic partnership with a series of big announcements today at the AWS re:Invent conference.
At the event, Nvidia is announcing a new DGX Cloud offering that brings the Grace Hopper GH200 superchip to AWS for the first time. Going a step further, the new Project Ceiba effort aims to build what could be the world's largest public cloud supercomputing platform, powered by Nvidia and running on AWS, providing 64 exaflops of AI performance. AWS will also add four new types of GPU-powered cloud instances to its EC2 service.
In an effort to help organizations build better large language models (LLMs), Nvidia is also using AWS re:Invent as the venue to announce its NeMo Retriever technology, a retrieval-augmented generation (RAG) approach to connecting enterprise data to generative AI.
Nvidia and AWS have been partnering for over 13 years, with Nvidia GPUs first showing up in AWS cloud computing instances back in 2010. In a briefing with press and analysts, Ian Buck, VP of hyperscale and HPC at Nvidia, commented that the two companies have been working together to improve innovation and operations at AWS, as well as for mutual customers including Anthropic, Cohere and Stability AI.
“It has also not just been the hardware, it’s also been the software,” Buck said. “We’ve been doing multiple software integrations and often are behind the scenes working together.”
DGX Cloud brings new supercomputing power to AWS
DGX Cloud is not a new idea from Nvidia; it was first announced back in March at Nvidia's GPU Technology Conference (GTC). Nvidia has also previously announced DGX Cloud for Microsoft Azure as well as Oracle Cloud Infrastructure (OCI).
The basic idea behind DGX Cloud is that it is an optimized deployment of Nvidia hardware and software that functionally enables supercomputing-class capabilities for AI. Buck emphasized that the DGX Cloud offering coming to AWS is not the same DGX Cloud that has been available to date.
“What makes this DGX Cloud announcement special is that this will be the first DGX Cloud powered by NVIDIA Grace Hopper,” Buck said.
Grace Hopper is Nvidia's so-called superchip, which combines Arm-based CPU compute with GPUs; to date, it has largely been confined to the realm of supercomputers. The AWS version of DGX Cloud will run the new GH200 chips in a rack architecture called the GH200 NVL-32. The system integrates 32 GH200 superchips connected with Nvidia's high-speed NVLink networking technology, and is capable of providing up to 128 petaflops of AI performance, with a total of 20 terabytes of fast memory across the entire rack.
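The per-chip figures implied by those rack-level numbers can be sanity-checked with simple arithmetic (a rough sketch, assuming the performance and memory quoted in the article are evenly distributed across the 32 superchips):

```python
# Back-of-envelope check on the GH200 NVL-32 figures quoted above.
# Assumption: "AI performance" and "fast memory" are spread evenly
# across the rack's 32 superchips.
superchips = 32
rack_pflops = 128        # petaflops of AI performance per rack
rack_memory_tb = 20      # terabytes of fast memory per rack

pflops_per_chip = rack_pflops / superchips            # petaflops per superchip
memory_gb_per_chip = rack_memory_tb * 1000 / superchips  # GB per superchip

print(pflops_per_chip, memory_gb_per_chip)  # 4.0 625.0
```

That works out to roughly 4 petaflops and 625 GB of fast memory per superchip.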
“It is a new rack scale GPU architecture for the era of generative AI,” Buck said.
Project Ceiba to build world's largest cloud AI supercomputer
Nvidia and AWS also announced Project Ceiba, which aims to build the world’s largest cloud AI supercomputer.
Project Ceiba will be built with 16,000 Grace Hopper superchips and will use AWS' Elastic Fabric Adapter (EFA), the AWS Nitro System and Amazon EC2 UltraCluster scalability technologies. The whole system will provide a staggering 64 exaflops of AI performance and have up to 9.5 petabytes of total memory.
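The Ceiba figure is consistent with the per-rack numbers quoted for the GH200 NVL-32 earlier in the article; a quick cross-check (assuming both claims refer to the same precision of "AI performance"):

```python
# Cross-check the article's two performance claims:
# 128 petaflops per 32-chip NVL-32 rack vs. 64 exaflops
# across Project Ceiba's 16,000 superchips.
nvl32_pflops_per_chip = 128 / 32           # petaflops per superchip
ceiba_pflops_per_chip = 64_000 / 16_000    # 64 exaflops = 64,000 petaflops

print(nvl32_pflops_per_chip, ceiba_pflops_per_chip)  # 4.0 4.0
```

Both work out to about 4 petaflops per superchip, so the two headline numbers line up.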
“This new supercomputer will be set up inside of AWS infrastructure hosted by AWS and used by Nvidia’s own research and engineering teams to develop new AI for graphics, large language model research, image, video, 3D, generative AI, digital biology, robotics research, self-driving cars and more,” Buck said.
Retrieval is the ‘holy grail’ of LLMs
With the NeMo Retriever technology being announced at AWS re:Invent, Nvidia is looking to help enterprises build enterprise-grade chatbots.
Buck noted that commonly used LLMs are trained on public data and, as such, are limited by their datasets. To get the latest, most accurate answers, the LLM needs to be connected to enterprise data, enabling organizations to ask questions more effectively and get the right information.
“This is the holy grail for chatbots across enterprises because the vast majority of valuable data is the proprietary data,” Buck said. “Combining AI with your database, the enterprise customer’s database, makes it more productive, more accurate, more useful, and more timely, and lets you optimize even further the performance and capabilities.”
The NeMo Retriever technology comes with a collection of enterprise-grade models and retrieval microservices, prebuilt to be deployed and integrated into enterprise workflows. NeMo Retriever also includes accelerated vector search to optimize the performance of the vector databases that supply the data.
Nvidia already has some early customers for NeMo Retriever including Dropbox, SAP and ServiceNow.
“This offers state of the art accuracy and the lowest possible latency for retrieval augmented generation,” Buck said.