Cerebras Systems is unveiling Andromeda, a 13.5 million-core artificial intelligence (AI) supercomputer that can operate at more than an exaflop for AI applications.
The system is made up of servers built around wafer-sized “chips,” each with hundreds of thousands of cores, yet it takes up far less space and delivers far more compute than ordinary servers with standard central processing units (CPUs).
Sunnyvale, California-based Cerebras has a radically different way of building chips. Most chips are made on a 12-inch silicon wafer that is processed with chemicals to embed circuit designs in small rectangular sections, and the wafer is then sliced into individual chips. Cerebras instead uses a huge rectangular section of a wafer to create a single massive chip with 850,000 processing cores on it, said Andrew Feldman, CEO of Cerebras, in an interview with VentureBeat.
“It’s one of the largest AI supercomputers ever built. It has an exaflop of AI compute, 120 petaflops of dense compute. It’s 16 CS-2s with 13.5 million cores. Just to give you an idea, the largest computer on earth, Frontier, has 8.7 million cores.”
By contrast, Advanced Micro Devices’ high-end 4th Gen Epyc server processor tops out at 96 cores. All told, Andromeda assembles its 13.5 million cores by clustering 16 of Cerebras’ CS-2 wafer-based systems.
“Customers are already training these large language models [LLMs] — the largest of the language models — from scratch, so we have customers doing training on unique and interesting datasets, which would have been prohibitively time-consuming and expensive on GPU clusters,” Feldman said.
The system uses Cerebras MemoryX and SwarmX technologies to achieve more than one exaflop of AI compute, which is a 1 followed by 18 zeroes (a billion billion) operations per second. It can also sustain 120 petaflops (a 1 followed by 15 zeroes) of dense compute at 16-bit half precision.
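Taken at face value, those figures imply the per-system and per-core numbers below. A quick back-of-the-envelope sketch (illustrative arithmetic only, using just the figures quoted in this article):

```python
# Back-of-the-envelope arithmetic from the quoted figures (illustrative only).
DENSE_FLOPS = 120e15   # 120 petaflops of dense FP16 compute
CORES = 13.5e6         # 13.5 million cores
CS2_COUNT = 16         # 16 CS-2 systems

print(f"Cores per CS-2:      {CORES / CS2_COUNT:,.0f}")  # 843,750, close to the 850,000 per wafer
print(f"Dense FP16 per CS-2: {DENSE_FLOPS / CS2_COUNT / 1e15:.1f} petaflops")  # 7.5
print(f"Dense FP16 per core: {DENSE_FLOPS / CORES / 1e9:.1f} gigaflops")       # ~8.9
```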
The company unveiled the technology at the SC22 supercomputing show. While Andromeda is very powerful, it doesn’t qualify for the Top500 supercomputer list because it doesn’t run 64-bit double-precision workloads, Feldman said. Still, he said, it is the only AI supercomputer ever to demonstrate near-perfect linear scaling on LLM workloads using simple data parallelism alone.
“What we’ve been telling people all year is that we want to build clusters to demonstrate linear scaling across clusters,” Feldman said. “And we want quick and easy distribution of work across the clusters. And we’ve talked about doing that with our MemoryX, which allows us to separate memory from compute and support multi-trillion parameter models.”
Andromeda has more cores than 1,953 Nvidia A100 GPUs combined, and 1.6 times as many cores as Frontier, the largest supercomputer in the world, which has 8.7 million (though each Frontier core is far more powerful).
“We’re better than Frontier at AI. And this is designed to give you an idea of the scope of the achievement,” he said. “When you program on Frontier, it takes years for you to design your code for it. And we were up and running without any code changes in 10 minutes. And that is pretty darn cool.”
In the pictures, the individual computers within Andromeda still look huge because the top section of each cabinet handles input/output, housing 1,200-gigabit Ethernet links, power supplies and cooling pumps.
AMD is one of Cerebras’ partners on the project. Just to feed the 13.5 million cores with data, the system uses 18,176 3rd Gen AMD Epyc processor cores.
Linear scaling
Cerebras says its system scales linearly: as you add more machines, software performance rises in proportion.
Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class LLMs, including GPT-3, GPT-J and GPT-NeoX, Cerebras said. In practice, that means application performance doesn’t drop off as the number of cores increases, Feldman said.
Near-perfect scaling means that as additional CS-2s are added, training time is reduced in near-perfect proportion. That holds even for LLMs with very large sequence lengths, a task Feldman said is impossible to achieve on GPUs.
In fact, one of Andromeda’s first users demonstrated that GPU-impossible work, achieving near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequences: a maximum sequence length (MSL) of 10,240 tokens, Feldman said. The same user attempted the work on Polaris, a cluster of 2,000 Nvidia A100 GPUs, and the GPUs were unable to do it because of GPU memory and memory-bandwidth limitations, he said.
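One reason long sequences strain GPUs is that a transformer’s attention score matrices grow with the square of the sequence length. A rough, illustrative estimate under assumed GPT-J-like dimensions (28 layers, 16 heads, fp16, batch size 1, and no memory-saving tricks such as activation recomputation or fused attention kernels):

```python
# Rough memory estimate for attention score matrices at MSL 10,240
# (assumed GPT-J-like dimensions; real frameworks can use far less memory
# via recomputation or fused attention, so this is only a ballpark).
seq_len, layers, heads, fp16_bytes = 10_240, 28, 16, 2
score_bytes = seq_len**2 * heads * layers * fp16_bytes
print(f"{score_bytes / 2**30:.0f} GiB of attention scores")  # ~87 GiB, more than an 80GB A100 holds
```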
Andromeda delivers near-perfect linear scaling from one to 16 Cerebras CS-2s. As additional CS-2s are used, throughput increases linearly, and training time decreases in almost perfect proportion.
“That’s unheard of in the computer industry. And what that means is if you add systems, the time to train is reduced proportionally,” Feldman said.
Access to Andromeda is available now, and customers and academic researchers are already running real workloads on the system.
“In collaboration with Cerebras researchers, our team at Argonne has completed pioneering work on gene transformers, work that is a finalist for the ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research. Using GPT3-XL, we put the entire COVID-19 genome into the sequence window, and Andromeda ran our unique genetic workload with long sequence lengths (MSL of 10K) across 1, 2, 4, 8 and 16 nodes, with near-perfect linear scaling,” said Rick Stevens, associate lab director at Argonne National Laboratory, in a statement.
“Linear scaling is amongst the most sought-after characteristics of a big cluster, and Cerebras’ Andromeda delivered 15.87 times throughput across 16 CS-2 systems, compared to a single CS-2, and a reduction in training time to match. Andromeda sets a new bar for AI accelerator performance.”
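For context on that 15.87-times figure: scaling efficiency is simply the measured speedup divided by the ideal linear speedup, so 16 CS-2s delivering 15.87 times the throughput of one works out to roughly 99% efficiency. A quick check (the single-system training time below is a hypothetical placeholder):

```python
def scaling_efficiency(speedup: float, n_systems: int) -> float:
    """Measured speedup as a fraction of ideal linear speedup."""
    return speedup / n_systems

# Argonne's reported result: 15.87x throughput on 16 CS-2s vs. a single CS-2.
print(f"{scaling_efficiency(15.87, 16):.1%}")  # 99.2%

# Under linear scaling, training time shrinks in matching proportion:
t1_hours = 16.0  # hypothetical time to train on one CS-2
print(f"{t1_hours / 15.87:.2f} hours on 16 systems")  # ~1.01 hours
```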
Jasper AI also used it
“Jasper uses large language models to write copy for marketing, ads, books, and more,” said Dave Rogenmoser, CEO of Jasper AI, in a statement. “We have over 85,000 customers who use our models to generate moving content and ideas. Given our large and growing customer base, we’re exploring testing and scaling models fit to each customer and their use cases. Creating complex new AI systems and bringing them to customers at increasing levels of granularity demands a lot from our infrastructure. We are thrilled to partner with Cerebras and leverage Andromeda’s performance and near-perfect scaling without traditional distributed computing and parallel programming pains to design and optimize our next set of models.”
AMD also offered a comment.
“AMD is investing in technology that will pave the way for pervasive AI, unlocking new efficiency and agility abilities for businesses,” said Kumaran Siva, corporate vice president of software and systems business development at AMD, in a statement. “The combination of the Cerebras Andromeda AI supercomputer and a data pre-processing pipeline powered by AMD EPYC-powered servers together will put more capacity in the hands of researchers and support faster and deeper AI capabilities.”
And Mateo Espinosa, doctoral candidate at the University of Cambridge in the United Kingdom, said in a statement, “It is extraordinary that Cerebras provided graduate students with free access to a cluster this big. Andromeda delivers 13.5 million AI cores and near-perfect linear scaling across the largest language models, without the pain of distributed compute and parallel programming. This is every ML graduate student’s dream.”
The 16 CS-2s powering Andromeda run in a strictly data-parallel mode, enabling straightforward model distribution and single-keystroke scaling from one to 16 CS-2s. In fact, AI jobs can be sent to Andromeda quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes.
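Cerebras hasn’t published its notebook workflow here, but the idea behind strict data parallelism is simple: every worker holds a full copy of the model, trains on its own shard of each batch, and the per-worker gradients are averaged before a single shared weight update. A minimal NumPy sketch of that pattern (illustrative only; none of these names are Cerebras’ API):

```python
import numpy as np

rng = np.random.default_rng(0)
N_WORKERS = 16                            # stand-ins for the 16 CS-2 systems
w = np.zeros(4)                           # one shared copy of the model weights
X = rng.normal(size=(256, 4))             # a global batch of training data
y = X @ np.array([1.0, -2.0, 0.5, 3.0])   # synthetic linear-regression targets

for step in range(200):
    # Each "worker" computes a gradient on its own shard of the batch.
    grads = [
        2 * xs.T @ (xs @ w - ys) / len(xs)  # local mean-squared-error gradient
        for xs, ys in zip(np.array_split(X, N_WORKERS), np.array_split(y, N_WORKERS))
    ]
    # Average gradients across workers (the "all-reduce"), then update once.
    w -= 0.05 * np.mean(grads, axis=0)

print(np.round(w, 3))  # approaches the true weights [1., -2., 0.5, 3.]
```

Because every worker applies the same averaged gradient, adding workers shrinks each shard without changing the math, which is why this style of parallelism scales so cleanly.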
Andromeda’s 16 CS-2s were assembled in just three days, and workloads immediately scaled linearly across all 16 systems without any code changes, Feldman said. And because the Cerebras WSE-2 processor at the heart of each CS-2 has 1,000 times more memory bandwidth than a GPU, Andromeda can harvest structured and unstructured sparsity, as well as static and dynamic sparsity, something other hardware accelerators, including GPUs, simply can’t do.
“The Andromeda AI supercomputer is huge, but it is also extremely power-efficient. Cerebras stood this up themselves in a matter of hours, and now we will learn a great deal about the capabilities of this architecture at scale,” said Karl Freund, founder and principal analyst at Cambrian AI.
The result is that Cerebras can train models that are more than 90% sparse to extreme accuracy, Feldman said. Andromeda can also be used by multiple people at once: users specify within seconds how many of Andromeda’s CS-2s they want. That means Andromeda can serve as a 16-system cluster for a single user working on a single job, as 16 individual CS-2 systems for 16 users with 16 distinct jobs, or any combination in between.
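That flexible sharing boils down to partitioning 16 systems across concurrent jobs. A toy illustration of the constraint (hypothetical code, not Cerebras’ actual scheduler):

```python
TOTAL_CS2 = 16  # Andromeda's pool of CS-2 systems

def allocate(requests: dict) -> dict:
    """Grant each job its requested CS-2 count if the pool can hold them all."""
    if sum(requests.values()) > TOTAL_CS2:
        raise ValueError("requests exceed the 16 available CS-2 systems")
    return requests

# One user taking the whole cluster, or several users sharing it:
print(allocate({"single-big-job": 16}))
print(allocate({"llm-training": 8, "genome-model": 4, "dev-experiments": 4}))
```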
Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a high-performance data center. Current Cerebras customers include Argonne National Laboratory, the National Energy Technology Laboratory, GlaxoSmithKline, Sandia National Laboratories and more. The company has 400 employees.
Author: Dean Takahashi
Source: VentureBeat