Tesla's Dojo supercomputer is a billion-dollar bet to make AI better at driving than humans

More data means better neural net training, but it also means more cores


Tesla says it is spending upwards of $1 billion on its Dojo supercomputer between now and the end of 2024 to help develop autonomous vehicle software.

Dojo was first mentioned by CEO Elon Musk during a Tesla investor day in 2019. It was built specifically to train the machine learning models for video processing and recognition that the company's vehicles need to drive themselves.

During Tesla's Q2 earnings call this week, Musk said Tesla was not going to be "open loop" on its Dojo expenditure, but the sum involved would certainly be "north of a billion through the end of next year."

"In order to copy us, you would also need to spend billions of dollars on training compute," Musk claimed, saying that developing a reliable autonomous driving system is "one of the hottest problems ever."

"You need the data and you need the training computers, the things needed to actually achieve this at scale toward a generalized solution for autonomy."

Musk pointed out that training complex machine learning models requires huge volumes of data, and the more the better. That is exactly what Tesla has access to, thanks to the telemetry flowing back from its vehicles.

"With respect to Autopilot and Dojo, in order to build autonomy, we obviously need to train our neural net with data from millions of vehicles. This has been proven over and over again, the more training data you have, the better the results," he said.

"It barely works at 2 million [training examples]. At 3 million, it's like, wow, OK, we're seeing something. But then, you get to, like, 10 million training examples, it becomes incredible. So there's just no substitute for massive amount of data. And obviously, Tesla has more vehicles on the road collecting this data than all of the other companies combined. I think maybe even an order of magnitude," Musk claimed.

On the Dojo system itself, Musk said it was designed to significantly reduce the cost of neural net training, and has been "somewhat optimized" for the kind of training that Tesla requires, which is video training.

"We see a demand for really vast training resources. And we think we may reach in-house neural net training capability of 100 exaFLOPS by the end of next year," Musk claimed, which is quite a lot of compute power, to put it mildly.

Dojo is based largely on Tesla's own technology, starting with the D1 chip, which packs 354 custom CPU cores. Twenty-five of these D1 chips are interlinked in a 5x5 array inside a "training tile" module, building up to the base Dojo V1 configuration of 53,100 D1 cores, according to our colleagues at The Next Platform.
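
Those figures are internally consistent, as a quick sanity check shows. Here's a minimal sketch using only the numbers quoted above:

```python
# Quick sanity check of the Dojo V1 figures quoted above.
CORES_PER_D1 = 354        # custom CPU cores per D1 chip
CHIPS_PER_TILE = 5 * 5    # D1 chips in a 5x5 array per "training tile"

cores_per_tile = CORES_PER_D1 * CHIPS_PER_TILE
print(f"Cores per training tile: {cores_per_tile:,}")  # 8,850

DOJO_V1_CORES = 53_100    # base Dojo V1 configuration
print(f"Training tiles in base Dojo V1: {DOJO_V1_CORES / cores_per_tile:g}")  # 6
```

In other words, the base system works out to six training tiles of 8,850 cores apiece.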

Musk believes that with all of the training data and a "high-efficiency inference computer" in the car, Tesla's autonomous driving system will soon make its vehicles not just as proficient as a human driver, but eventually much better. When? He didn't say, and he has form when it comes to grand claims.

"To date, over 300 million miles have been driven using FSD [Full Self-Driving] Beta. That 300-million-mile number is going to seem very small, very quickly. And FSD will go from being as good as a human to then being vastly better than a human. We see a clear path to full self-driving being 10 times safer than the average human driver," he claimed.

This is important, Musk explained, because "right now, I believe there's something in the order of a million automotive deaths per year. And if you're 10 times better than a human, that would still mean 100,000 deaths. So, it's like, we'd rather be a hundred times better, and we want to achieve as perfect a safety as possible."

Dojo is not the only supercomputer Tesla has for video training. The company also built a compute cluster equipped with 5,760 Nvidia A100 GPUs, but Musk said Tesla simply couldn't get hold of enough GPUs for the task.
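
For a rough sense of the gap between that cluster and the 100 exaFLOPS target Musk floated, here's a back-of-the-envelope estimate. The per-GPU figure is our assumption, not Tesla's (Nvidia's quoted peak of roughly 312 teraFLOPS of dense BF16 per A100), and Musk didn't specify a precision for his exaFLOPS number, so treat this as illustrative only:

```python
# Rough comparison: Tesla's A100 cluster vs the stated 100 exaFLOPS target.
# Assumption (not from Tesla): ~312 TFLOPS peak dense BF16 per Nvidia A100.
A100_COUNT = 5_760
A100_BF16_TFLOPS = 312

cluster_eflops = A100_COUNT * A100_BF16_TFLOPS / 1e6   # TFLOPS -> exaFLOPS
print(f"A100 cluster peak: ~{cluster_eflops:.2f} exaFLOPS")  # ~1.80 exaFLOPS

TARGET_EFLOPS = 100
print(f"Gap to target: ~{TARGET_EFLOPS / cluster_eflops:.0f}x")  # ~56x
```

Crude as peak-FLOPS comparisons are, a roughly 50-fold shortfall goes some way to explaining why Tesla wants its own silicon rather than relying on GPU supply alone.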

"We'll actually take the hardware as fast as Nvidia will deliver it to us," he said, adding: "If they could deliver us enough GPUs, we might not need Dojo, but they can't because they've got so many customers." ®
