Project Goals
The goal of this project was to take the default racing map that Unreal Engine 5 offers and implement a vehicle AI that uses Reinforcement Learning (RL) to learn the optimal racing line, i.e. the line that minimizes its lap time around the track. The AI manipulates the racing line through a spline: the spline is built from a series of points that the AI can modify segment by segment, and the vehicle then follows the resulting spline. This is explained in more detail below. The project combines several challenges, including implementing driving physics, implementing a way to actually run a lap, and representing the whole task as an RL problem.
Project Structure
Actors
The first actor I worked on was AIController_Vehicle. This actor, together with its associated Blueprints and Blueprint Interfaces, was responsible for physically driving the vehicle. The primary functions that control this behavior are:
- Calculate Steering: This function finds the closest point on the spline, offset ahead by an amount that depends on the car's speed (the faster the car is moving, the further ahead it looks). It then uses quaternion math to work out the direction the car needs to head and how much steering input is required to get there without overcorrecting.
- Calculate Top Speed: This function is similar to the previous one, except that it calculates the maximum speed the car can carry given how much it has to turn to follow the spline.
- Calculate Throttle/Brake: This function determines how much throttle input (based on the previously calculated top speed) and how much brake input to apply so that the vehicle follows the spline.
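In the project these three functions are Blueprint graphs. As a rough illustration of the same pipeline, here is a minimal C++ sketch under simplifying assumptions: the track is treated as 2D, the spline is approximated by a polyline of points, the look-ahead distance grows linearly with speed, and a 2D heading angle stands in for the quaternion math. The top-speed step uses the pure-pursuit arc curvature as an assumed stand-in for however the Blueprint estimates the turn. All names and tuning constants are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec2 { double x = 0.0; double y = 0.0; };

struct DriveCommand {
    double steering = 0.0;  // -1 (full right) .. +1 (full left)
    double throttle = 0.0;  // 0 .. 1
    double brake = 0.0;     // 0 .. 1
    double topSpeed = 0.0;  // m/s the upcoming curve allows
};

// Hypothetical tuning constants, not values from the project.
constexpr double kLookAheadBase = 5.0;      // m of look-ahead at standstill
constexpr double kLookAheadPerMps = 0.5;    // extra m of look-ahead per m/s of speed
constexpr double kSteerGain = 2.0;          // steering input per radian of heading error
constexpr double kMaxLateralAccel = 9.0;    // m/s^2 the car is assumed to hold in a turn
constexpr double kAbsoluteTopSpeed = 60.0;  // m/s cap on straights
constexpr double kPedalGain = 0.2;          // pedal input per m/s of speed error

double Clamp(double v, double lo, double hi) { return std::max(lo, std::min(v, hi)); }

// Wrap an angle into [-pi, pi] so the car always turns the short way round.
double WrapAngle(double a) {
    const double pi = std::acos(-1.0);
    while (a > pi) a -= 2.0 * pi;
    while (a < -pi) a += 2.0 * pi;
    return a;
}

// Find the path point closest to the car, then walk forward along the (closed)
// path until the distance covered reaches a look-ahead that grows with speed.
Vec2 LookAheadPoint(const std::vector<Vec2>& path, Vec2 pos, double speed) {
    assert(!path.empty());
    std::size_t i = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < path.size(); ++k) {
        const double d = std::hypot(path[k].x - pos.x, path[k].y - pos.y);
        if (d < bestDist) { bestDist = d; i = k; }
    }
    const double lookAhead = kLookAheadBase + kLookAheadPerMps * speed;
    double covered = 0.0;
    for (std::size_t step = 0; step + 1 < path.size() && covered < lookAhead; ++step) {
        const std::size_t next = (i + 1) % path.size();
        covered += std::hypot(path[next].x - path[i].x, path[next].y - path[i].y);
        i = next;
    }
    return path[i];
}

DriveCommand ComputeDriveCommand(const std::vector<Vec2>& path, Vec2 pos,
                                 double headingRad, double speed) {
    DriveCommand cmd;
    const Vec2 target = LookAheadPoint(path, pos, speed);
    const double dx = target.x - pos.x;
    const double dy = target.y - pos.y;
    const double dist = std::max(std::hypot(dx, dy), 1e-6);
    const double alpha = WrapAngle(std::atan2(dy, dx) - headingRad);  // heading error

    // Steering: proportional to the heading error, clamped to full lock.
    cmd.steering = Clamp(kSteerGain * alpha, -1.0, 1.0);

    // Top speed: pure-pursuit arc curvature k = 2 sin(alpha) / d, then v = sqrt(a_lat / k).
    const double curvature = 2.0 * std::abs(std::sin(alpha)) / dist;
    cmd.topSpeed = curvature > 1e-6
        ? std::min(kAbsoluteTopSpeed, std::sqrt(kMaxLateralAccel / curvature))
        : kAbsoluteTopSpeed;

    // Throttle/brake: proportional on the speed error; never both at once.
    const double speedError = cmd.topSpeed - speed;
    if (speedError >= 0.0) cmd.throttle = Clamp(kPedalGain * speedError, 0.0, 1.0);
    else cmd.brake = Clamp(-kPedalGain * speedError, 0.0, 1.0);
    return cmd;
}
```

On a straight line the sketch steers straight and accelerates; if the car sits off the line while moving faster than the resulting arc allows, it steers back toward the line and brakes.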
In addition to these Blueprints, this actor had the ReinforcementLearningAI C++ class (discussed later) and the LapComponent C++ class. The LapComponent class controlled the run of a single lap of the AI vehicle: it timed the lap, teleported the car to the correct starting location, detected whether the AI had gone out of bounds, crashed, or taken too long to complete the lap, and scored the lap once it determined that the lap was over. The tricky part was determining when a lap started and when it ended; I created several enums and fairly complex functions to accomplish this. This class also does most of the visualization seen in the demo, including the bounding boxes.
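A minimal sketch of the lap start/end logic, assuming the car is teleported into a start/finish trigger zone: a flag records whether the car has left that zone, so the lap only counts as finished when the car re-enters it afterwards. The enums, the 120-second time limit, and the scoring formula are hypothetical and are not the project's actual LapComponent code.

```cpp
#include <cassert>

// Hypothetical enums; the project's real ones are not reproduced here.
enum class ELapState { WaitingForStart, InProgress, Finished, Failed };
enum class ELapEndReason { None, Completed, OutOfBounds, Crashed, TimedOut };

struct LapTracker {
    double timeLimit = 120.0;  // assumed cut-off in seconds
    ELapState state = ELapState::WaitingForStart;
    ELapEndReason reason = ELapEndReason::None;
    double elapsed = 0.0;
    bool leftStartZone = false;

    // Called after the car has been teleported onto the start line.
    void Start() {
        state = ELapState::InProgress;
        reason = ELapEndReason::None;
        elapsed = 0.0;
        leftStartZone = false;
    }

    // Called every frame with what the car currently overlaps.
    void Tick(double dt, bool inStartZone, bool inBounds, bool crashed) {
        assert(dt >= 0.0);
        if (state != ELapState::InProgress) return;
        elapsed += dt;
        if (!inBounds) { End(ELapEndReason::OutOfBounds); return; }
        if (crashed) { End(ELapEndReason::Crashed); return; }
        if (elapsed > timeLimit) { End(ELapEndReason::TimedOut); return; }
        // The car begins inside the start/finish zone, so being in it only
        // counts as finishing once the car has left it at least once.
        if (!inStartZone) leftStartZone = true;
        else if (leftStartZone) End(ELapEndReason::Completed);
    }

    // Faster completed laps score higher; any failure gets a fixed penalty.
    double Score() const {
        if (state == ELapState::Finished) return timeLimit - elapsed;
        if (state == ELapState::Failed) return -timeLimit;
        return 0.0;  // lap still running or never started
    }

private:
    void End(ELapEndReason why) {
        reason = why;
        state = (why == ELapEndReason::Completed) ? ELapState::Finished : ELapState::Failed;
    }
};
```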
The second actor I worked on was BP_AIPath. This actor had two components attached to it: the spline that the vehicle follows, and the SplineController, which generates the points that make up the spline. The SplineController component was implemented as a C++ class. Without going into too much detail, here is what it accomplishes. First, it needs to generate a list of points in 3D space that can make up the spline. This is no easy feat: I had to figure out how to grab the points from the LandScapeMesh that actually makes up the track, convert these points into a different world space, and then add new points so that each center point has 5 distinct points, giving 5 possible racing lines at each point. In addition, because the segments of the actual track are stored completely out of order, a manual process sorts them according to their order around the track. After all the points have been gathered, the component groups them by individual track segment, so the RL algorithm can work on smaller subsets of the track at once. Below is an image that shows the different segments and the segment groups (the RL algorithm works on just one segment group at a time to optimize that group's racing line).
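The following sketch shows two of these steps under stated assumptions: spreading five candidate points across the track at each center point, and chopping an already-sorted list of center points into segments and segment groups. It assumes Z is up, a known track half-width, and fixed segment sizes. The function and type names are hypothetical, and extracting the points from the landscape mesh and sorting the out-of-order segments are not shown.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x = 0.0; double y = 0.0; double z = 0.0; };

constexpr int kLinesPerPoint = 5;  // five candidate racing lines per center point

// Spread kLinesPerPoint points evenly across the track at one center point.
// Index 2 is the center line; 0 and 4 sit halfWidth to either side.
// Assumes Z is up, so the lateral direction is the tangent rotated 90 degrees in X/Y.
std::array<Vec3, kLinesPerPoint> CandidatePoints(Vec3 center, Vec3 tangent, double halfWidth) {
    const double len = std::hypot(tangent.x, tangent.y);
    assert(len > 0.0);
    const Vec3 lateral{-tangent.y / len, tangent.x / len, 0.0};
    std::array<Vec3, kLinesPerPoint> points{};
    for (int k = 0; k < kLinesPerPoint; ++k) {
        const double offset = halfWidth * (k - 2) / 2.0;  // -w, -w/2, 0, +w/2, +w
        points[k] = Vec3{center.x + lateral.x * offset, center.y + lateral.y * offset, center.z};
    }
    return points;
}

// A segment is an inclusive range of (already sorted) center-point indices.
struct Segment { std::size_t firstPoint; std::size_t lastPoint; };

// Chop the lap into fixed-size segments, then bundle consecutive segments into groups.
std::vector<std::vector<Segment>> BuildSegmentGroups(std::size_t numPoints,
                                                     std::size_t pointsPerSegment,
                                                     std::size_t segmentsPerGroup) {
    assert(pointsPerSegment > 0 && segmentsPerGroup > 0);
    std::vector<std::vector<Segment>> groups;
    for (std::size_t first = 0; first < numPoints; first += pointsPerSegment) {
        const Segment seg{first, std::min(first + pointsPerSegment, numPoints) - 1};
        if (groups.empty() || groups.back().size() == segmentsPerGroup) groups.emplace_back();
        groups.back().push_back(seg);
    }
    return groups;
}
```

With this layout, a racing line for one segment is just a choice of index 0 to 4, applied to every center point in that segment.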
The image below shows all the available points that the SplineController generates (in red).
Working out these points was painstaking and involved a lot of frustrating vector math and digging through the Unreal Engine documentation; this may have been the most time-consuming part of the project.
The third actor I worked on was the SimulationController, which has the SimulationControlScript attached to it. For the sake of brevity, I won't go over everything in this class, but I will explain its main purpose: it runs the entire simulation by calling and coordinating all of the other classes. It begins by calling the RL algorithm and giving it the segments it can modify in the first segment group. Once the RL algorithm has chosen a racing line for each segment in that group for the current epoch, it passes this back to the SimulationControlScript, which uses it to generate the spline for the lap. The script then tells the lap component to start the lap; because a lap runs asynchronously and the script cannot simply stop and wait, this is done through a complicated series of delegates. When the lap completes, the script passes the score back to the RL algorithm so it can update its Q table. After a fixed number of epochs for a segment group, the controller moves on to the next segment group, and the process repeats until all segment groups have been completed and a final racing line exists for the entire lap.
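The control flow of the SimulationControlScript can be sketched as a loop over segment groups and epochs. In this hypothetical C++ version, the RL algorithm and the lap run are injected as callbacks, the lap is treated as a synchronous call (the real project uses delegates because a lap runs asynchronously), and segments that have not been trained yet are assumed to default to the center line (index 2). None of these names come from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical hooks standing in for the RL algorithm and the lap run.
struct TrainingHooks {
    // RL picks one racing line (0-4) for each segment in the group.
    std::function<std::vector<int>(const std::vector<std::size_t>& group)> chooseLines;
    // Build the spline from a line choice for every segment, drive a lap, return its score.
    std::function<double(const std::vector<int>& lapLines)> runLap;
    // Feed the score back so the RL algorithm can update its Q table.
    std::function<void(const std::vector<std::size_t>& group,
                       const std::vector<int>& lines, double score)> learn;
    // Best line per segment once the group's epochs are done.
    std::function<std::vector<int>(const std::vector<std::size_t>& group)> bestLines;
};

// groups[g] lists the segment indices that belong to segment group g.
std::vector<int> TrainAllGroups(const std::vector<std::vector<std::size_t>>& groups,
                                std::size_t numSegments, int epochsPerGroup,
                                const TrainingHooks& hooks) {
    constexpr int kCenterLine = 2;  // assumed default for segments not yet trained
    std::vector<int> lapLines(numSegments, kCenterLine);
    for (const auto& group : groups) {
        for (int epoch = 0; epoch < epochsPerGroup; ++epoch) {
            const std::vector<int> chosen = hooks.chooseLines(group);
            assert(chosen.size() == group.size());
            std::vector<int> trial = lapLines;  // earlier groups keep their locked-in lines
            for (std::size_t i = 0; i < group.size(); ++i) trial[group[i]] = chosen[i];
            hooks.learn(group, chosen, hooks.runLap(trial));
        }
        // Lock in this group's best lines before moving on to the next group.
        const std::vector<int> best = hooks.bestLines(group);
        for (std::size_t i = 0; i < group.size(); ++i) lapLines[group[i]] = best[i];
    }
    return lapLines;
}
```

Optimizing one group at a time keeps each RL problem small, at the cost of never revisiting earlier groups once later ones change.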
Results
After adding some obstacles to make the track more interesting, here is the final racing line the algorithm chose:
Using this racing line, the final lap time was 11 seconds faster than simply following the center line, so the algorithm did beat the default. This result was generated with epsilon set to 0.5 (an equal chance of exploring a random racing line or exploiting the best one found so far) and 100 epochs per segment group. The total training time was around 7 hours. In the video attached to this project, you can see the debugging messages, which give a better sense of what is going on in each lap and of the overall training process.
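With epsilon = 0.5, epsilon-greedy selection picks a random racing line half of the time and the line with the highest Q-value so far the other half. The write-up does not spell out the Q-table update, so the sketch below assumes one row per segment, one column per candidate line, and a sample-average (1/n) update toward the lap score, which treats each segment as a separate multi-armed bandit. Untried lines start with a high value so greedy steps try every line at least once. The class and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Untried lines start very high so greedy steps try each line at least once.
constexpr double kUntriedValue = 1e9;

// One row of action values per segment, one column per candidate racing line.
class LineQTable {
public:
    LineQTable(std::size_t numSegments, int numLines, double epsilon, unsigned seed)
        : q_(numSegments, std::vector<double>(numLines, kUntriedValue)),
          visits_(numSegments, std::vector<int>(numLines, 0)),
          epsilon_(epsilon), rng_(seed) {}

    // Epsilon-greedy: with probability epsilon explore a random line, otherwise exploit.
    int Choose(std::size_t segment) {
        assert(segment < q_.size());
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng_) < epsilon_) {
            std::uniform_int_distribution<int> anyLine(0, static_cast<int>(q_[segment].size()) - 1);
            return anyLine(rng_);
        }
        return ArgMax(segment, /*triedOnly=*/false);
    }

    // Sample-average update: Q moves toward the lap score by a step of 1/n.
    void Update(std::size_t segment, int line, double score) {
        const int n = ++visits_[segment][line];
        q_[segment][line] += (score - q_[segment][line]) / n;
    }

    // Best line for a segment among those actually tried (-1 if none were).
    int Best(std::size_t segment) const { return ArgMax(segment, /*triedOnly=*/true); }

private:
    int ArgMax(std::size_t segment, bool triedOnly) const {
        int best = -1;
        for (int a = 0; a < static_cast<int>(q_[segment].size()); ++a) {
            if (triedOnly && visits_[segment][a] == 0) continue;
            if (best < 0 || q_[segment][a] > q_[segment][best]) best = a;
        }
        return best;
    }

    std::vector<std::vector<double>> q_;
    std::vector<std::vector<int>> visits_;
    double epsilon_;
    std::mt19937 rng_;
};
```

For example, `LineQTable table(numSegments, 5, 0.5, 1234u);` would correspond to the epsilon used in this run; the seed only makes the exploration reproducible.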
The results mattered less than simply getting the algorithm to work. Frankly, I didn't expect it to beat the center line at all, so I was pleasantly surprised.
Challenges/Future Work
Because I changed the project structure and the way the RL algorithm worked so many times, a lot of other code changed as well. As a result, there is repeated code, and some functions are no longer as optimized as they once were. I need to go back and clean all of this up (I simply ran out of time).
One of my biggest challenges was getting delegates to work. I couldn't just call a function and have it wait until a lap was completed, because that would freeze execution on this thread, so I had to set up delegates that notify each other once a lap is completed. This was hard.
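The delegate pattern can be illustrated in plain C++. This hypothetical sketch uses a small std::function-based stand-in for Unreal's multicast delegates (which in the engine are declared with the DECLARE_DYNAMIC_MULTICAST_DELEGATE family of macros and fired with Broadcast): StartLap returns immediately, the lap completes later during Tick, and the controller continues its loop from the completion callback instead of blocking. The class names, and the use of a fixed lap duration in place of real driving, are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A tiny stand-in for an Unreal multicast delegate: listeners register
// callbacks, and the owner broadcasts when the event happens.
template <typename... Args>
class MulticastDelegate {
public:
    void AddListener(std::function<void(Args...)> fn) { listeners_.push_back(std::move(fn)); }
    void Broadcast(Args... args) const {
        const auto snapshot = listeners_;  // a listener may add listeners while we iterate
        for (const auto& fn : snapshot) fn(args...);
    }
private:
    std::vector<std::function<void(Args...)>> listeners_;
};

// Hypothetical lap runner: StartLap returns immediately; the lap finishes later
// inside Tick, and only then does OnLapCompleted fire.
class LapRunner {
public:
    MulticastDelegate<double> OnLapCompleted;  // payload: lap time in seconds

    void StartLap(double lapDuration) {
        assert(lapDuration > 0.0);
        lapDuration_ = lapDuration;
        elapsed_ = 0.0;
        running_ = true;
    }

    void Tick(double dt) {
        if (!running_) return;
        elapsed_ += dt;
        if (elapsed_ >= lapDuration_) {
            running_ = false;  // clear state first: a listener may start the next lap
            OnLapCompleted.Broadcast(elapsed_);
        }
    }

private:
    double lapDuration_ = 0.0;
    double elapsed_ = 0.0;
    bool running_ = false;
};

// Hypothetical controller: never blocks waiting for a lap. It subscribes once
// and resumes the training loop from the completion callback.
class Controller {
public:
    Controller(LapRunner& runner, int epochs, double lapDuration)
        : runner_(runner), epochsLeft_(epochs), lapDuration_(lapDuration) {
        runner_.OnLapCompleted.AddListener([this](double lapTime) { HandleLapCompleted(lapTime); });
    }

    void Begin() { StartNextEpoch(); }

    std::vector<double> lapTimes;

private:
    void StartNextEpoch() {
        if (epochsLeft_ <= 0) return;
        --epochsLeft_;
        runner_.StartLap(lapDuration_);  // returns at once; the result arrives via the delegate
    }

    void HandleLapCompleted(double lapTime) {
        lapTimes.push_back(lapTime);  // scoring and the Q-table update would happen here
        StartNextEpoch();
    }

    LapRunner& runner_;
    int epochsLeft_;
    double lapDuration_;
};
```

Because the listener calls StartLap from inside Broadcast, LapRunner clears its running flag before broadcasting; clearing it afterwards would cancel the lap that the callback just started.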
In terms of future work, I want the RL algorithm to be able to modify the car's driving characteristics as well. All of the racing lines were generated with fixed values that determine how the car drives, so these could be optimized together with the racing line to get the car around the track even quicker. I also need to add getters/setters and make more of the methods protected. During development I made everything public so I could work with it from any class; this needs to change.
Lastly, regarding running the project: all parameters for the RL algorithm, including the number of epochs, can be tweaked in the code or in the editor. You might have to launch it twice to get past the warnings, but the final version committed to GitHub works.