The Jockey Club


Improving Handle: Using AI to Optimize Horse Racing Schedules

John Stewart, Chief Executive Officer / Cofounder, Fastbreak AI
Dr. Ryan Kelley, Principal Data Scientist and ML Engineer, Fastbreak AI
Stewart: My name is John Stewart. I’m the CEO and one of the co-founders of Fastbreak AI, along with Dr. Chris Groer and Dr. Tim Carnes. Three years ago, my partners and I founded Fastbreak AI to solve one of the most complex combinatorial problems in sports: optimizing schedules through advanced machine learning and data science techniques. Today our team includes nine experts with PhDs in mathematics, operations research, particle physics, and data science, making us the deepest bench of sports scheduling specialists in the world. Fastbreak AI is a sports technology, data science, and analytics provider for every level of sports, from professional to youth. Our clients include the NFL, NBA, NHL, MLB, MLS, and many other professional, collegiate, and elite youth travel organizations and tournaments around the world. About three months ago, The Jockey Club approached us to evaluate whether our expertise could be applied to horse race scheduling with the objective of increasing handle. Today, Dr. Ryan Kelley will present the team’s findings to the members of this esteemed body. We believe our AI models reveal compelling opportunities to have a significant financial impact on horse racing through a scheduling platform that is tailored to your industry’s needs and that facilitates cooperation among the racetracks. The primary focus of this study was to optimize post times to minimize overlap and maximize handle. However, we believe our expertise could also be used to improve race-day scheduling, the sequence of races, and perhaps even the scheduling of some of your most important events. Now, before I turn the presentation over to Ryan to discuss the feasibility study and our findings, I’d like to thank The Jockey Club board of stewards for the opportunity to study the problem and participate in your round table. We’d also like to thank all of you for your attention today. 
In particular, we’d like to thank Everett and Jim for engaging Fastbreak, educating our team on your magnificent sport, and providing the guidance and data that made our work possible. And now I will turn the presentation over to Ryan.
Kelley: Thank you, John. This has been a fascinating project. We’ve been working on it for about three months. Here at Fastbreak we have the expertise in scheduling software for sports, but you in this room are the experts on horse racing. We don’t know the field like you do, so we’re going to rely on your expertise to help us build a better schedule. The goal of this feasibility study was to capture the industry’s intuition scientifically and fold it into building an optimized schedule that increases handle.
Here’s what we did. We want to increase handle, but many things influence it, such as the track, the purse, the sequencing of races, and so on. For this study we focused on the one thing we could change in the data: the schedule itself. We built a machine learning model to estimate those influences, folded it into the optimizer, and ran the study on the 20 years of race-level data that The Jockey Club provided us. What we demonstrated is that, with our models, we can predict a lift in total handle across all the tracks.
To model this influence, start with the visualization on the right: a Gantt chart of 20 races from a random race day in 2022. The vertical axis represents an individual race, the horizontal axis represents time, and each marker is a race. Viewed this way, it becomes obvious that many races are run very close together or even overlap. We wanted to capture that characteristic scientifically, so in the visualization on the left we define what we mean by overlap.
The blue race, with a $1,200 purse, is considered overlapping with these other two races because they start within X minutes of it. X is a tunable parameter; for this study we set it to five minutes. So in this case the $1,200 race overlaps the $500 race and the $1,000 race, giving two overlapping races. The $700 race starts much later, so it stands alone and doesn’t overlap any other race. Rather than simply taking the fraction of races that overlap, we weighted that fraction by purse, because the purse is a proxy for the importance of a race. That gives us a variable we call overlap, which we fold into the optimizer so that we can reduce overlap and increase the attention on any given race.
Continuing with this concept, we plotted overlap against the average handle per race, shown in the histogram on the right. The $1,200 race has a higher overlap, so it falls further right on the plot and has a lower handle, while the standalone $700 race falls to the left and has a higher handle. The main idea of this study is that if we reduce overlap, handle increases.
To quantify that, we built a multivariate nonlinear regression model to predict handle. Key features include the track, the purse, and the number of runners; the full list is on the slide, in order of importance to the model, and one of those features is overlap. Notice that overlap is not the most important feature. We still focus on it because we didn’t turn the knobs on any of the other features: we held them all fixed and turned only the knob on overlap. We’re looking for a needle in a haystack, trying to squeeze out a little more handle.
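A minimal sketch of the overlap score, assuming one reading of “the fraction weighted by the purse”: for each race, add up the purses of the other races whose post times fall within X minutes of it (`window`, five minutes in this study), then divide by the total purse of all the other races on the day. The talk does not state the exact normalization Fastbreak AI used. `Race`, `overlap_scores`, and the post times below are hypothetical, chosen so that the $1,200 race overlaps the $500 and $1,000 races and the $700 race stands alone, as on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    name: str
    post: float   # post time in minutes from an arbitrary reference (hypothetical)
    purse: float  # purse in dollars


def overlap_scores(races, window=5.0):
    """Purse-weighted overlap per race (assumed definition).

    For race i: the sum of purses of other races whose post time is within
    `window` minutes of race i, divided by the total purse of all other races.
    """
    total = sum(r.purse for r in races)
    scores = {}
    for i, race in enumerate(races):
        others = total - race.purse
        overlapping = sum(
            other.purse
            for j, other in enumerate(races)
            if j != i and abs(other.post - race.post) <= window
        )
        scores[race.name] = overlapping / others if others else 0.0
    return scores


day = [
    Race("$1,200", 0.0, 1200.0),
    Race("$500", 4.0, 500.0),      # starts 4 minutes after the $1,200 race
    Race("$1,000", -4.0, 1000.0),  # starts 4 minutes before it (8 from the $500)
    Race("$700", 30.0, 700.0),     # starts much later: standalone
]
for name, score in overlap_scores(day).items():
    print(f"{name}: {score:.3f}")
```

With these made-up times, the $1,200 race gets the highest score (1,500 / 2,200, about 0.68) and the standalone $700 race scores 0, so they would land on the right and left of the histogram respectively.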
So even though overlap is not the most powerful feature, it is one we can control and tune in the schedule. This visualization shows the schedule on a particular day. Again, the markers represent races: the black markers are the original post times, and the red markers are the post times after we minimized overlap. You’ll notice that the character of the schedule didn’t change very much. However, by optimizing the post times, we reduced the overlap for these races by 20%, and when we ran the new schedule through the model, it predicted a 3% increase in handle for that day.
We then repeated the same exercise over all of 2024, comparing the model’s prediction for the schedule before and after optimization. The result was a predicted lift in total handle of 3%, which mapped to roughly $360 million. That is a fairly significant increase in handle from a simple tweak to the schedule: reducing overlap.
A lot of consideration already goes into post times and the schedule, but things come up on race day. In the toy example on this slide, everything to the left of the red line shows the original post times (black) and off times (blue). At track D we see a 25-minute delay, for whatever reason: it could have been weather or an issue with the gate. In real time, in only a few seconds, you can reoptimize the schedule, and everything to the right of the line shows the optimized schedule in red. When we compare the predicted handle for the original off times against the optimized times, we see an increase in handle not just for track D but for all the tracks.
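A minimal sketch of the kind of post-time optimization described above, under assumptions the talk does not specify. The objective here is total purse-weighted overlap (each race’s purse times its overlap share, where the share is the purse of other races starting within `window` minutes divided by the purse of all other races). Post times move in whole-minute steps by at most `max_shift` minutes, each track keeps its running order with at least `min_gap` minutes between races, and the search is a simple greedy coordinate descent. Fastbreak AI’s actual model and solver were not described, so `Race`, `total_overlap`, `optimize`, and the default values are hypothetical. The `frozen` argument holds chosen races at their current post times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    race_id: str
    track: str
    post: float   # post time in minutes from an arbitrary reference
    purse: float  # purse in dollars


def total_overlap(posts, purses, window):
    """Assumed objective: sum over races of purse * purse-weighted overlap share."""
    total = sum(purses)
    score = 0.0
    for i, (t_i, p_i) in enumerate(zip(posts, purses)):
        others = total - p_i
        hit = sum(p_j for j, (t_j, p_j) in enumerate(zip(posts, purses))
                  if j != i and abs(t_j - t_i) <= window)
        score += p_i * (hit / others if others else 0.0)
    return score


def optimize(races, window=5.0, max_shift=10, min_gap=20.0, frozen=frozenset()):
    """Greedy coordinate search over whole-minute shifts (hypothetical solver).

    Each race not in `frozen` may move at most `max_shift` minutes from its
    original post time; each track keeps its running order with at least
    `min_gap` minutes between consecutive races.
    """
    base = [r.post for r in races]
    purses = [r.purse for r in races]
    posts = list(base)

    # Indices of each track's races in their original running order.
    by_track = {}
    for i in sorted(range(len(races)), key=lambda k: base[k]):
        by_track.setdefault(races[i].track, []).append(i)

    def feasible(p):
        return all(p[b] - p[a] >= min_gap
                   for idx in by_track.values() for a, b in zip(idx, idx[1:]))

    best = total_overlap(posts, purses, window)
    improved = True
    while improved:  # ends: each accepted move strictly lowers the objective on a finite grid
        improved = False
        for i, race in enumerate(races):
            if race.race_id in frozen:
                continue
            for shift in range(-max_shift, max_shift + 1):
                trial = list(posts)
                trial[i] = base[i] + shift
                if not feasible(trial):
                    continue
                score = total_overlap(trial, purses, window)
                if score < best - 1e-9:
                    posts, best, improved = trial, score, True
    return posts, best


# Hypothetical card: three equal-purse races at three tracks, one minute apart.
day = [
    Race("A1", "A", 0.0, 1000.0),
    Race("B1", "B", 1.0, 1000.0),
    Race("C1", "C", 2.0, 1000.0),
]
before = total_overlap([r.post for r in day], [r.purse for r in day], 5.0)
after_posts, after = optimize(day, max_shift=4)
# Track B keeps its own post time (declines the recommended schedule).
held_posts, held = optimize(day, max_shift=4, frozen={"B1"})
print(before, after, held)
```

On this toy card the search lowers the objective from 3000 to 1000, while freezing track B’s race leaves it at 2000. That is the same qualitative pattern as the abstaining-track scenario, though these numbers say nothing about real handle. The same `frozen` set could hold races that have already gone off and a delayed race at its new post time before the rest of the card is reoptimized. In the talk, predicted handle comes from running the optimized post times back through the regression model; that step is not shown. A greedy coordinate search can stall in a local minimum, so treat it as an illustration rather than a production solver.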
This represents one of the main use cases: we can keep the schedule optimal throughout a race day.
Our final scenario reflects the fact that tracks are under no obligation to use this recommended schedule. We ran an exercise in which one high-profile track abstained from using the schedule. In both cases there is a lift in handle, but the lift is lower when that track abstains. This demonstrates that all tracks can be affected when one or more tracks abstain from using the schedule.
This slide shows a mockup of a platform: a real-time race scheduling platform that would provide an optimized schedule and automatically ingest off times, so that we could keep the schedule optimal throughout the day and react to transient events like the one I just showed on the previous slide. It also provides a centralized view, so you can see what’s going on across all the tracks simultaneously. In this example, the race in red was delayed, and all the races after it, in orange, were moved later to keep the schedule optimal.
In summary, many races overlap, and this causes contention for eyeballs and thus lowers potential handle. This experiment shows that if we optimize the schedule and reduce overlap, handle should increase. Here at Fastbreak AI, we have the expertise to take this concept and fold it into an optimizer so that we can maximize handle across all the tracks in the industry. Thank you very much.
VOG: Thank you, John and Dr. Ryan. Please welcome our next speaker, Steve Kornacki, chief data analyst for NBC News.