Tesla held their AI Day 2022 event in Palo Alto, California earlier tonight, going into a lot of detail on their latest developments in the world of AI. Part of the discussion centered around Full Self-Driving (FSD). Elon Musk warned before the event that it was going to be very technical, and it did not disappoint.
Here we will look at the Full Self-Driving section of Tesla's AI Day 2022 while trying to simplify the concepts.
Tesla started the discussion with numbers. Tesla has so far created 35 releases with 281 different training models. The more interesting number shared is the total number of pull requests (the number of times code was merged together): 18,659.
FSD Beta has used a total of 4.8 million datasets.
FSD Blocks
Tesla then moved on to a flowchart showing how they would cover the different topics within the Full Self-Driving segment of Tesla AI Day 2022. Training data, which can be auto-labeled data, simulated data, or data from the data engine, is fed into different neural networks whose outputs go to planning.
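As a rough mental model, here is a toy sketch of that flow in Python. Every function name here is mine, not Tesla's, and each is just a stand-in for a far more complex system:

```python
# Toy sketch of the data flow Tesla described: three data sources feed
# neural networks whose outputs feed planning. All names are illustrative.

def auto_labeled_data():  return ["clip_a"]       # from the auto labeler
def simulated_data():     return ["sim_scene_b"]  # from simulation
def data_engine_data():   return ["hard_case_c"]  # from the data engine

def neural_networks(training_data):
    # Stand-in for the networks covering occupancy, lanes, moving objects.
    return {"occupancy": ..., "lanes": ..., "objects": ...}

def planning(network_outputs):
    # Stand-in for the planner that turns network outputs into a decision.
    return "trajectory"

outputs = neural_networks(auto_labeled_data() + simulated_data() + data_engine_data())
print(planning(outputs))
```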
Each block was covered in more depth later in the presentation.
Planning Section
This neural network decides things like gap control. Imagine you are making a left turn at an intersection while a pedestrian is walking across: when is it safe for the car to go? Think of planning as making decisions (there is much more to it, but we will simplify it).
Tesla uses something they call "interaction search". It looks at lanes, occupancy (what is happening in those lanes), and other moving objects. The first layer of the neural network looks at the lane: what is the lane like?
It then branches into unobstructed seeds (who is occupying the lane) and branches again into interactions within the lanes, such as pedestrians or objects.
It then scores how likely you would be to intervene. Checks occur along the way: will you get into a collision? Are there any conflicts with the data?
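To make the branching idea concrete, here is a toy version of such a search. The maneuvers, agents, and scoring functions are all invented for illustration; this is my sketch of the concept, not Tesla's algorithm:

```python
import itertools

# Toy "interaction search": branch over candidate maneuvers, then over how
# to interact with each agent (yield vs. go), score each leaf, keep the best.

MANEUVERS = ["turn_left_tight", "turn_left_wide"]
AGENTS = ["pedestrian_crossing", "oncoming_car"]

def collision_risk(interactions):
    # Hypothetical check: going while a pedestrian is crossing is unsafe.
    return 1.0 if interactions.get("pedestrian_crossing") == "go" else 0.1

def intervention_likelihood(maneuver):
    # Hypothetical score for "how likely would a human take over here?"
    return 0.3 if maneuver == "turn_left_wide" else 0.1

def interaction_search():
    best_plan, best_cost = None, float("inf")
    # Branch: every maneuver x every way of interacting with every agent.
    for maneuver in MANEUVERS:
        for choices in itertools.product(["yield", "go"], repeat=len(AGENTS)):
            interactions = dict(zip(AGENTS, choices))
            cost = collision_risk(interactions) + intervention_likelihood(maneuver)
            if cost < best_cost:
                best_plan, best_cost = (maneuver, interactions), cost
    return best_plan

print(interaction_search())
```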
Occupancy Network
This network figures out curbs, cars, debris on the road, and general predictions of where things are going. Instead of using just a basic object network, this network creates a drivable surface: where the car can drive.
They use raw camera data rather than standard RGB images. They extract the features from the photos and create a 3D model with spatial features. It then goes through "deconvolution" for a final output. Tesla wanted a higher-resolution output, so they are using "queryable outputs" and "NeRFs", which can create 3D environments from 2D images.
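Here is a minimal, hypothetical sketch of what a "queryable output" can look like in PyTorch: a small network that can be queried at arbitrary 3D points for an occupancy probability, NeRF-style, instead of emitting a fixed-resolution grid. The architecture and sizes are my own invention:

```python
import torch
import torch.nn as nn

# Queryable occupancy sketch: a small MLP answers "is this 3D point
# occupied?" for any query point, so output resolution is not fixed.

class QueryableOccupancy(nn.Module):
    def __init__(self, feature_dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 + feature_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),  # occupancy probability
        )

    def forward(self, xyz, scene_features):
        # xyz: (N, 3) query points; scene_features: (N, feature_dim)
        return self.head(torch.cat([xyz, scene_features], dim=-1))

model = QueryableOccupancy()
points = torch.rand(1000, 3)      # query at whatever resolution you want
feats = torch.rand(1000, 64)      # features pooled from the 3D volume
occupancy = model(points, feats)  # (1000, 1) probabilities
```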
Tesla also uses auto-labeled datasets. Every second they process 400,000 videos. Using a custom PyTorch (an extremely popular machine-learning framework) extension, data goes from storage to the GPU for training. (I am not well versed in hardware.) They verify against ground truth.
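For readers curious what storage-to-GPU streaming looks like in practice, here is a standard PyTorch data-loading sketch. Tesla's actual extension is custom and unpublished, so this only illustrates the general idea, with random tensors standing in for decoded video clips:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Generic sketch of streaming video clips from storage to the GPU.

class ClipDataset(Dataset):
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # In reality: read and decode a video clip from storage.
        return torch.rand(8, 3, 224, 224)  # 8 frames of 3x224x224

if __name__ == "__main__":
    loader = DataLoader(
        ClipDataset(),
        batch_size=4,
        num_workers=4,    # decode clips in parallel with training
        pin_memory=True,  # page-locked memory speeds host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for clips in loader:
        clips = clips.to(device, non_blocking=True)  # async copy to GPU
        break  # one batch is enough for the sketch
```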
Lane Prediction
They need to use predictions since sometimes you can't see what is on the other side of an intersection. There is a "vision component" which provides the input data. Tesla then adds a map component: road map data with topography information. They made sure to note that these are not HD maps, so, for example, the car will not know ahead of time where a lane ends. Tesla then adds something called the "language" component, which represents lane positions in 3D space. The language component has a prediction grid that maps out all the lanes in the given 3D space, and it runs over and over until the end of the segment. This lane prediction is needed because you and the Tesla might not always see the road clearly; you need predictions.
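My (very) rough illustration of treating lanes like language: a small model predicts lane tokens one at a time, autoregressively, the way a language model predicts the next word. The vocabulary, architecture, and decoding loop below are all invented:

```python
import torch
import torch.nn as nn

# Toy "language of lanes": each token could stand for a cell in a coarse
# 3D grid; the model predicts the next lane point given the ones so far.

VOCAB = 1000  # hypothetical grid cells plus special tokens
END = 0       # hypothetical end-of-lane token

class LaneLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        self.rnn = nn.GRU(64, 64, batch_first=True)
        self.out = nn.Linear(64, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h[:, -1])  # logits for the next lane token

model = LaneLanguageModel()
tokens = torch.tensor([[42]])  # start from some initial lane point
for _ in range(20):            # greedily decode the rest of the lane
    next_token = model(tokens).argmax(-1, keepdim=True)
    tokens = torch.cat([tokens, next_token], dim=1)
    if next_token.item() == END:
        break
print(tokens)
```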
Auto Labeling
Tesla is currently limited to a cache of 500k intersections per day (data storage limits). They want to get to 1 billion intersections. Obviously the math doesn't work out for their goal versus their limits: at 500k per day, 1 billion intersections would take 2,000 days, about five and a half years. They now use "reconstruction", which scales better and labels faster than their 2020 approach. They use auto labelers for almost all of their planning tasks, and the auto labeler even works in different weather conditions such as darkness, rain, or fog.
Simulation
There are situations where it is hard to get real-world data. This is where simulations are used. Imagine creating a scene in a video game and then using that data to help fix obscure things that occur in driving. The steps Tesla uses to create these simulations are:
- Ground truth labels -> creates lane data and slopes
- Then add lane paint details
- Spawn random characteristics
- Use map data to add traffic data
- Lane connectivity -> road signs
- Add random traffic
These steps are all automatic and can be set up in under 5 minutes. It allows Tesla to swap materials or change the location, creating an infinite number of scenes to make new ground truth, and to create new traffic flows that produce data that can't be captured from the real world. A toy version of this kind of pipeline is sketched below.
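Here each listed step becomes a stand-in function, and randomizing materials, locations, and traffic yields effectively unlimited scenes. Everything here is invented for illustration:

```python
import random

# Toy procedural scene generator mirroring the steps above.

def build_scene(ground_truth_labels, seed):
    rng = random.Random(seed)
    scene = {"roads": ground_truth_labels}  # lane data and slopes
    scene["lane_paint"] = rng.choice(["solid", "dashed", "worn"])
    scene["material"] = rng.choice(["asphalt", "concrete", "wet_asphalt"])
    scene["location"] = rng.choice(["urban", "suburban", "highway"])
    scene["signs"] = ["stop", "yield"]  # derived from lane connectivity
    scene["traffic"] = [f"car_{i}" for i in range(rng.randint(0, 12))]
    return scene

# Swapping materials/locations/traffic mints endless ground-truth scenes.
for s in range(3):
    print(build_scene(["left_lane", "right_lane"], seed=s))
```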
Data Engine
There are challenging cases in driving. Tesla can build an evaluation set of test videos, collect specific data from Teslas on the road that encounter the exact scenario they are working on, and then train, fix, and test.
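Conceptually, the data engine is a loop: find a failure case, mine the fleet for matching clips, retrain, and re-check against the evaluation set. Here is a toy sketch of that loop; the trigger name and all functions are hypothetical stand-ins:

```python
# Toy data-engine loop: mine clips for a failure case, "retrain", re-test.

def mine_fleet(trigger):
    # Stand-in for collecting clips from cars that hit the exact scenario.
    return [f"clip_{i}_{trigger}" for i in range(3)]

def retrain(model, clips):
    return model + len(clips)  # pretend training improves the model

def evaluate(model, eval_set):
    return model >= len(eval_set)  # pretend pass/fail check

model, eval_set = 0, ["test_video_1", "test_video_2"]
while not evaluate(model, eval_set):
    clips = mine_fleet("parked_car_with_open_door")  # hypothetical trigger
    model = retrain(model, clips)
print("challenge case fixed:", model)
```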
If you want all the details, you can watch the AI Day 2022 presentation below. The section on FSD begins at the 58-minute mark.