Tesla has unveiled its new supercomputer, which is already roughly the fifth most powerful in the world and will serve as the predecessor to Tesla’s upcoming Dojo supercomputer.
It is being used to train the neural nets powering Tesla’s Autopilot and upcoming self-driving AI.
Over the last few years, Tesla has had a clear focus on computing power both inside and outside its vehicles.
Inside, it needs computers powerful enough to run its self-driving software, and outside, it needs supercomputers to train its self-driving software powered by neural nets that are fed an insane amount of data coming from the fleet.
CEO Elon Musk has been teasing Tesla’s Dojo project, which apparently consists of a supercomputer capable of an exaFLOP, one quintillion (10^18) floating-point operations per second, or 1,000 petaFLOPS – making it one of the most powerful computers in the world.
Tesla has been working on Dojo for the last few years, and Musk has been hinting that it should be ready by the end of this year.
But the company has developed other supercomputers on its way to Dojo, and now Andrej Karpathy, Tesla’s head of AI, has unveiled the latest one during a presentation at the 2021 Conference on Computer Vision and Pattern Recognition.
During the presentation, Karpathy gave a shoutout to Tesla’s supercomputing team and showcased their latest work, Tesla’s third supercomputer cluster.
Tesla is claiming some fairly insane specs on this new cluster, which should make it roughly the fifth most-powerful computer in the world:
- 720 nodes of 8x A100 80GB (5,760 GPUs total)
- 1.8 EFLOPS (720 nodes * 312 TFLOPS-FP16-A100 * 8 GPUs/node)
- 10 PB of “hot tier” NVMe storage @ 1.6 TBps
- 640 Tbps of total switching capacity
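The headline 1.8 EFLOPS figure is simple multiplication, and it can be checked with a short sketch like the one below. The function and constant names are ours, not Tesla’s, and the 312 TFLOPS per GPU is the FP16 number quoted in the list above. Keep in mind that this is a theoretical half-precision peak, so it is not directly comparable to measured benchmark results.

```python
# Figures taken from the spec list above; names are illustrative only.
NODES = 720
GPUS_PER_NODE = 8
TFLOPS_FP16_PER_GPU = 312  # peak FP16 throughput per A100, as quoted above

TFLOPS_PER_EFLOPS = 1_000_000  # 1 EFLOPS = 10^18 FLOPS = 10^6 TFLOPS


def total_gpus(nodes, gpus_per_node):
    """Number of GPUs in the whole cluster."""
    return nodes * gpus_per_node


def peak_eflops(nodes, gpus_per_node, tflops_per_gpu):
    """Theoretical peak throughput of the cluster, in EFLOPS."""
    total_tflops = total_gpus(nodes, gpus_per_node) * tflops_per_gpu
    return total_tflops / TFLOPS_PER_EFLOPS


if __name__ == "__main__":
    gpus = total_gpus(NODES, GPUS_PER_NODE)
    eflops = peak_eflops(NODES, GPUS_PER_NODE, TFLOPS_FP16_PER_GPU)
    print(f"{gpus} GPUs, {eflops:.2f} EFLOPS peak FP16")
    # prints "5760 GPUs, 1.80 EFLOPS peak FP16"
```

The unrounded result is about 1.797 EFLOPS, which Tesla rounds up to 1.8.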
Karpathy commented on the effort:
“We have a neural net architecture network and we have a data set, a 1.5 petabytes data set that requires a huge amount of computing. So I wanted to give a plug to this insane supercomputer that we are building and using now. For us, computer vision is the bread and butter of what we do and what enables Autopilot. And for that to work really well, we need to master the data from the fleet, and train massive neural nets and experiment a lot. So we invested a lot into the compute. In this case, we have a cluster that we built with 720 nodes of 8x A100 of the 80GB version. So this is a massive supercomputer. I actually think that in terms of flops, it’s roughly the number 5 supercomputer in the world.”
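To put the 1.5-petabyte data set in perspective, here is a rough back-of-the-envelope sketch of how long a single full pass over it would take at the quoted storage speed. It assumes that “TBps” means terabytes per second, that the units are decimal (1 PB = 1,000 TB), and that the whole data set sits on the hot tier and can be read at the full 1.6 TBps; none of that is confirmed by Tesla, and the function name is ours.

```python
# Figures quoted in the article; everything else is an assumption.
DATASET_PB = 1.5          # data set size mentioned by Karpathy
BANDWIDTH_TB_PER_S = 1.6  # hot-tier storage bandwidth, assumed terabytes/second

TB_PER_PB = 1000  # decimal units assumed


def min_read_seconds(dataset_pb, bandwidth_tb_per_s):
    """Idealized time to stream the data set once, ignoring all overhead."""
    return dataset_pb * TB_PER_PB / bandwidth_tb_per_s


if __name__ == "__main__":
    seconds = min_read_seconds(DATASET_PB, BANDWIDTH_TB_PER_S)
    print(f"about {seconds / 60:.1f} minutes per full pass")
```

Under those assumptions, one pass comes out to a little under 16 minutes. That is a best-case lower bound, since real training pipelines rarely sustain peak storage bandwidth.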
The Tesla engineer didn’t want to elaborate on project Dojo, but he did say that it will be an even better supercomputer optimized for neural net training than Tesla’s current cluster.
Musk also previously said that Tesla plans to eventually make its supercomputers available to other companies in order for them to train their neural nets on it.
Here’s Karpathy’s presentation at CVPR 2021:
Author: Fred Lambert
Source: Electrek