Tesla AI Day is meant to attract both hardware and software developers, which makes the event very heavy on technical detail. I have tried my best to simplify some of the concepts that Tesla announced during AI Day.
Hydranets
When you drive your Tesla on Autopilot, it sees things around itself and has to make decisions. There is object detection: other cars and people in the path. There are traffic lights and stop signs that it must obey. There are lane markings that it has to stay within. These are just some of the individual tasks that your car must handle.
Karpathy started by talking about Hydranets. A Hydranet has a shared backbone that does the common processing once, plus a separate head for each task the car needs to achieve, one head per task mentioned above. The heads share the backbone's features but can each be trained and fine-tuned on their own, which allows Tesla to do better and faster fine-tuning.
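To make that a bit more concrete, here is a minimal PyTorch-style sketch of the idea: one shared backbone, several small task-specific heads. All of the layer sizes and task names below are made up for illustration; this is not Tesla's actual architecture.

```python
import torch
import torch.nn as nn

class TinyHydraNet(nn.Module):
    """Toy multi-task network: one shared backbone, one head per task.
    Layer sizes and task names are invented for illustration only."""

    def __init__(self):
        super().__init__()
        # Shared backbone: runs once per image, its features are reused by every head.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads (the "hydra heads") that share the backbone features.
        self.heads = nn.ModuleDict({
            "traffic_lights": nn.Linear(32, 4),   # e.g. red / yellow / green / none
            "stop_signs":     nn.Linear(32, 2),   # stop sign present / absent
            "lane_markings":  nn.Linear(32, 10),  # some lane parameterization
        })

    def forward(self, image):
        features = self.backbone(image)            # shared computation
        return {name: head(features) for name, head in self.heads.items()}

model = TinyHydraNet()
outputs = model(torch.randn(1, 3, 64, 64))         # a fake camera image
print({name: out.shape for name, out in outputs.items()})
```

Because each head is its own module, one task can be fine-tuned on new data without retraining the others, which is the speed benefit Karpathy highlighted.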
Vector space
Tesla's cars have eight individual cameras that all collect data. Tesla combines all of the cameras to get a full view around the car. This isn't as simple as stitching the videos together: the cameras point in different directions and at different angles, which has to be taken into account when building a full 3D view. Once that is done, Tesla is able to determine where objects are at any given point around the car.
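If you like code, here is a toy sketch of that idea: take features from every camera and let a network learn to produce one top-down grid around the car. The camera count is the only number taken from Tesla; the sizes and the fusion method are invented for illustration and are not Tesla's actual design.

```python
import torch
import torch.nn as nn

class ToyBirdsEyeFusion(nn.Module):
    """Toy sketch: fuse features from several cameras into one top-down grid
    around the car ("vector space"). Sizes and fusion method are made up."""

    def __init__(self, num_cameras=8, grid_size=32):
        super().__init__()
        self.per_camera = nn.Sequential(           # runs on each camera image
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        # Map the combined camera features into a top-down occupancy grid;
        # the network has to learn the cameras' different angles implicitly.
        self.to_grid = nn.Linear(num_cameras * 8 * 4 * 4, grid_size * grid_size)
        self.grid_size = grid_size

    def forward(self, camera_images):              # list of (1, 3, H, W) tensors
        feats = torch.cat([self.per_camera(img) for img in camera_images], dim=1)
        grid = self.to_grid(feats).view(-1, self.grid_size, self.grid_size)
        return torch.sigmoid(grid)                 # "is something occupying this cell?"

fusion = ToyBirdsEyeFusion()
cameras = [torch.randn(1, 3, 64, 64) for _ in range(8)]   # eight fake camera frames
print(fusion(cameras).shape)                                # torch.Size([1, 32, 32])
```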
Video Module
When you are driving, you know where other cars are located around you. You also remember where road markings and signs are placed. However, it is hard for your car to remember all of these things all the time; there is a "memory" limit. To help with this, Tesla created a video module.
Instead of looking at one frame at a time, the video module caches what the cameras have seen over the last stretch of road, so the neural network can "remember" things like a sign the car just passed or a vehicle that is briefly hidden. Looking at video over time also improves the network's ability to estimate depth and velocity.
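Here is a toy sketch of what a short-term "video memory" could look like: keep the last few frames' features in a queue and summarize them with a small recurrent network. The sizes, queue length, and names are made up; Tesla's real video module is much more involved.

```python
from collections import deque
import torch
import torch.nn as nn

class ToyVideoMemory(nn.Module):
    """Toy sketch of a short-term video memory: keep the last N frames'
    features in a queue and summarize them with a small RNN."""

    def __init__(self, feature_dim=32, memory_len=20):
        super().__init__()
        self.memory = deque(maxlen=memory_len)     # old features fall out automatically
        self.rnn = nn.GRU(feature_dim, feature_dim, batch_first=True)

    def forward(self, frame_features):             # (1, feature_dim) for the newest frame
        self.memory.append(frame_features)
        sequence = torch.stack(list(self.memory), dim=1)   # (1, frames_so_far, feature_dim)
        _, summary = self.rnn(sequence)            # summary of the recent past
        return summary.squeeze(0)                  # (1, feature_dim)

memory = ToyVideoMemory()
for _ in range(5):                                 # pretend we see 5 consecutive frames
    out = memory(torch.randn(1, 32))
print(out.shape)                                   # torch.Size([1, 32])
```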
Predictions
All the data that your car collects can be sent through different modules in the neural network, and this helps your car make predictions. These predictions range from what other cars around you are doing to where your car should be on the road. Within the predictions is a "collision risk rate": as the car weighs each choice, it estimates the chance that an accident would happen.
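To give a feel for what a "collision risk rate" could mean in practice, here is a toy, plain-Python sketch of choosing between candidate maneuvers based on their estimated risk. The maneuvers, numbers, and threshold are all invented; the real planner is far more sophisticated.

```python
# Toy sketch: choose between candidate maneuvers using an estimated collision risk.
# The candidates, risk numbers, and threshold are invented for illustration.
candidates = {
    "stay_in_lane":      {"progress": 0.6, "collision_risk": 0.001},
    "change_lane_left":  {"progress": 0.9, "collision_risk": 0.020},
    "change_lane_right": {"progress": 0.8, "collision_risk": 0.150},
}

MAX_ACCEPTABLE_RISK = 0.05   # discard anything riskier than this

def pick_maneuver(options):
    safe = {name: o for name, o in options.items()
            if o["collision_risk"] <= MAX_ACCEPTABLE_RISK}
    # Among the safe options, prefer the one that makes the most progress.
    return max(safe, key=lambda name: safe[name]["progress"])

print(pick_maneuver(candidates))   # change_lane_left
```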
Labeling Data
Before we get into labeling data, let's cover a couple of concepts. A neural network needs data in order to work. It won't just instantly know what lane markings or stop signs are. It is like teaching your kid what a dog is compared to a cat: they need to see images (and real-life examples) of both before they can label each correctly. Tesla currently has 1,000 people in house labeling data.
Fun fact: "Google does use reCaptcha to teach its self-driving Waymo cars to label images so they can, for example, tell the back of an Escalade from an empty patch of asphalt". (Source). So there is a chance that you have helped label data!
Additionally, there are neural net nodes. Data is sent through these nodes, which determine what should happen. There are a lot of nodes, 288 to be exact. As data travels through, each node helps make the "prediction" of what to do better.
Tesla started out by labeling 2D images, so if your car drives by a traffic cone, they can label it accordingly. However, in a 3D world this was not efficient, so they moved to labeling data in vector space. That is just a fancy way of saying they label the data in full 3D, which made it much faster to use the data to train the neural network.
However, manual labeling is time consuming. In order to label at scale, they are using AI and the network's own predictions to label data. If you are interested, there is a really interesting white paper about Batch Active Learning at Scale.
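Very loosely, active learning means letting the model itself pick which examples are worth a human's time, usually the ones it is least confident about. Here is a toy sketch of that selection step; the clip names, batch size, and stand-in "model" are all made up, and the linked paper describes how this is done at real scale.

```python
import random

# Toy sketch of batch active learning: score unlabeled clips by how confident
# a model is about them, then send only the least confident batch to labelers.
# The "model" here is a stand-in that returns a random confidence score.
def model_confidence(clip):
    return random.random()          # pretend: 1.0 = very sure, 0.0 = no idea

unlabeled_clips = [f"clip_{i:04d}.mp4" for i in range(10_000)]
BATCH_SIZE = 100                    # how many clips the labeling team can handle

# Least-confident first: these are the clips a human label helps the most.
ranked = sorted(unlabeled_clips, key=model_confidence)
to_label = ranked[:BATCH_SIZE]
print(len(to_label), to_label[:3])
```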
Tesla can also use video clips of the same location from different cars. Each car sends data back, which helps with auto labeling. Having a large fleet that can send data back to Tesla makes it easy to collect the exact information they need at any given time. As labels are created over and over again, Tesla can reuse them for similar places a car has never driven before. While interesting, remember that this labeling does not equate to instant changes on your drive.
Simulation
Tesla also uses "Autopilot simulation". Simulated drives are more or less a Tesla driving around in a video game. Imagine having your Tesla dropped into Grand Theft Auto with FSD. It uses the neural network and learns to obey the laws (and not run people over). If the simulation is close enough to real driving, then the data Tesla can collect on a simulated drive is very useful.
Simulated data is good for:
- Situations that are hard to replicate in real-world data, like someone walking their frunkpuppy on the highway or a traffic accident. A simulation of both is much safer than trying to replicate the actual thing.
- Data that is really difficult to label. In the example Tesla shared, there were a ton of people walking by, and they labeled each and every one (see the little sketch below for why simulation makes that easy).
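Here is a tiny toy sketch of why simulation makes labeling trivial: since the simulator places every pedestrian itself, the "labels" come for free. The function and numbers are invented for illustration.

```python
import random

# Toy sketch: in a simulation you already know where everything is, so every
# rendered frame comes with perfect labels for free, no human labeling needed.
def simulate_frame(num_pedestrians):
    pedestrians = [{"x": random.uniform(-20, 20), "y": random.uniform(0, 50)}
                   for _ in range(num_pedestrians)]
    return {"rendered_image": "<pixels>", "labels": pedestrians}

frame = simulate_frame(num_pedestrians=200)   # a crowd that would be painful to hand-label
print(len(frame["labels"]))                   # 200 perfect labels, instantly
```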
Dojo Supercomputer
Tesla uses a 7-nanometer processor (the D1 chip) with 362 teraflops of processing power. This chip on its own is very fast. Tesla places 25 of these chips onto a single "training tile", and 120 of these tiles work together to provide over an exaflop of compute.
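Those figures multiply out neatly; here is the quick math using only the numbers from the presentation:

```python
# Rough math from the figures in the presentation.
tflops_per_chip = 362              # teraflops per chip
chips_per_tile = 25
tiles = 120

total_tflops = tflops_per_chip * chips_per_tile * tiles
print(total_tflops)                # 1086000 teraflops
print(total_tflops / 1_000_000)    # 1.086, i.e. just over an exaflop
```

That is where the "over an exaflop" claim comes from.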
These chips are used to train the neural network on data sent from your Tesla's video feeds. Having a computer this fast means Tesla can process more information, more quickly. "We should have Dojo operational next year," Elon Musk said.
Moving Forward
Tesla announced the new "Tesla Bot" that will soon write these articles. Okay, not really, but it is impressive to see how much data Tesla is able to collect and use for its neural network. There is a ton of data being fed in all at once. Tesla is constantly making minor changes to the neural network based on Hydranets, labeled data, simulation data, and so on, running roughly 1 million evaluations per week on those fine-tuning changes. That is a lot!
If reading all this got you wanting to watch the livestream (again), here you go.