Analytics in Sport
Michael Lopez, Director of Football Data and Analytics, National Football League

STUART S. JANNEY III: I agree. That was a wonderful discussion.

Our final speaker this morning is Mike Lopez. As a former statistics professor right here at Skidmore College, Mike is not a stranger to this area. Today Mike is the director of football data and analytics for the NFL, and he is going to discuss how analytics can benefit the sport of horse racing.

So welcome, Mike.

MICHAEL LOPEZ: Thanks, Stuart. It’s interesting you flip through the program and you see all these successful trainers and bettors, and, I mean, Lindsay’s husband even won a trifecta. Doesn’t show that my NYRA account is at 66 cents.

Last March in a meeting with the NFL’s competition committee, I was tasked with explaining somewhat of a niche adjustment to the punt play. Specific suggestions dealt with where punts would be placed after touchbacks. It was one of the ideas that our data group had come up with.

I had four or five minutes to make the case why this would benefit the NFL through competitiveness, health, and safety impact, and make it easier on our officials.

When I was done, I turned it over to the chairman, Rich McKay, to add anything, and he said, well, committee, what do we think? There was kind of an awkward pause, and I was anxious to hear the committee’s response, because, again, this was our idea.

Everyone was looking around. Titans Coach Mike Vrabel broke the silence. He said, well, this is not the worst idea I ever heard of.

Here I am, purveyor of ideas that will not be the worst out there.

On that note, thanks to The Jockey Club for the invitation to come here and talk analytics. This is a spot I’ve been in before. Like Stuart said, I taught up the road at Skidmore.

I have also been here before in thinking about how analytical tools can help decision makers in sports. My job at the NFL league office is pretty awesome: Use data to enhance the game. Football has long been my passion. I grew up the son of a longtime high school football coach. Played collegiately as well.

You can have hobbies, and for the last decade I count horse racing as one of mine. As Saratoga residents, my wife and I had summer passes to the track. We would pick out shaded picnic tables along the walking path with the pink tablecloth and bet the 8 horse because it had a pink saddle.

My oldest daughter’s name is Lila, and that’s relevant because our biggest score came on the Labor Day finale back in 2016. Brad Cox had a filly named Sassy Little Lila. I thought I was done betting for the season, but of course Lila is a real name and I’m a pushover, so here we were at the track on the final race of the season. Sassy Little Lila won, and the exacta paid a couple hundred bucks, and the Lopez Family had a new favorite horse.

With football as my passion and horse racing as a moderately expensive hobby, my goal today is to share some of the ways the NFL is using data and analytics to make better decisions.

Before starting into that, a little bit of background might help. Football and horse racing are pretty similar, in that for years the data of the sport did not change much. In the NFL that means about 150 rows of data, one for each play. For example, in the ’70s you would have Terry Bradshaw threw a deep pass to Lynn Swann for a touchdown.

In 2010, you would have Tom Brady threw a pass to Rob Gronkowski for a touchdown. You knew that Bradshaw or Brady threw the pass and that Gronkowski or Lynn Swann caught it, but you didn’t know much more than that.

To be honest, a lot of times you didn’t even know the other 20 players on the field. For me as an offensive lineman that was kind of a pain in the neck. I would play 75 plays in a game perfectly, and the only time I would show up is if I recovered a fumble.

Without much to glean from box scores, coaches would rely on film. They would watch film before practice. They would watch after practice. They would watch film of themselves, film of their opponents, film of themselves last year against their opponents.

From that film they would collect loads of charting data, nearly identical to how horse racers would watch video or collect things at the track. They chart who was on the field, formations, tendencies, splits, personnel evaluations. You name it, they were writing it down.

When scouting players, every scout would give a player a grade on things like stamina or change in direction or a player’s closing speed. When they would run practice they would have someone running the scout team that would draw lines about where each player would go.

So, you would have an assistant coach that had to take an Expo marker and write down what the opposing team was going to do. When the sport started changing and we got more data, the core scouting and core player evaluation has remained somewhat consistent. Coaches are still asking the same questions they were decades ago.

But here is the big change: we’re no longer limited to the same data. The biggest jump came around 2015, when the NFL took a big step towards modernizing how teams and the league itself can operate via Next Gen Stats player tracking.

Each player is now equipped with a pair of RFID tags — they are roughly the shape of your thumbnail — that fit in each of their right and left shoulder pads. That chip turns on when they walk on the practice or game field. It emits their location, their speed, their direction and orientation at 10 frames per second.

So as a rough estimate, where we had about 150 rows of data for ’70s, ’80s, ’90s, and 2000s games, we’re now closer to somewhere around 300,000 for each player in a game. It’s a big difference in terms of the size and the scope of what we’re able to learn.

What we’re trying to do is still the same. That metric that the scout would give for player stamina, you can get that using the player tracking. Change of direction? Yep, we’re going to measure that, too. Same with closing speed.

The opponent tendencies that a coach used to write down with an Expo marker? Most clubs have automated that process and will print out hundreds of plays in a matter of minutes.

So, the goals of the coaches and the scouts are the same, but they are now able to do things faster in large part due to the new data. That data created new opportunity. I’m pretty sure it is the reason I have a job. When the league realized everybody else was going to be using this tracking data, they wanted someone to analyze it as well.

When I started in 2018, we kept a spreadsheet of all the data science and analytics staffers on NFL team staffs. There were somewhere around 60 or 65 at that point. That was five years ago. That list is now up to 140.

So, in roughly five years, NFL teams have doubled their investment in folks that have backgrounds like myself. Titles like director of football analytics, senior data scientist are now routinely popping up on NFL teams.

At the league office it helps change how decisions are made. I will share a few examples. Our most important area of work comes with dealing with the Competition Committee. Whenever a rules process comes up and we’re analyzing a new rule, our group spends some time analyzing many of the factors involved in that decision.

We’ll start with competitiveness. What is the competitive impact of this rule? How will it impact scoring? Will our games stay close? Will it impact the way that the teams have a chance of making the playoffs in terms of keeping all teams alive?

Next is officiating. Can we come up with a rule that our officials can officiate? One of the quotes that the NFL likes to use is that we don’t have a rulebook for each official, we don’t have a rulebook for a Thursday game. We have an NFL rulebook that is equivalent across all games.

Last year when we were sharing some of our officiating data, we had some official-specific plots that we put up there, and we maybe suggested that some of the officials had certain tendencies. It was a little bit more information than we were used to sharing, and Coach Tomlin of the Steelers responded, man, none of this shit is new to us. We know this is going on already.

So, again, kind of just sharing stuff that I think people are always asking about, but now we’re putting numbers to it.

Next is health and safety. What will be the impact of this rule on player health and safety, both short- and long-term?

Pace of play is another factor that we will analyze. In other words, we have a 3:05 game now. If we come up with this rule change, maybe it will add more time on replay, and maybe that will have a negative impact.

Last two would be simplicity and tradition. Can we get fans to understand what we’re doing, and how will that impact the long-term tendencies of the league.

One example of how we’ve used these factors is the NFL’s overtime format, which, for most folks that are familiar with football, is something you will probably have an opinion on. Two years ago, there was a rules change that dealt with whether or not both teams would possess the ball, and how we balanced it was with these six levers.

When people ask me what is my favorite overtime, I tell them, I don’t have a right answer. The right answer depends on these six things. If you want to prioritize competitiveness, that’s going to be one answer. If you want to prioritize health and safety, that’s going to be another one.

So, how you adjust each of those levers will lead you to the right solution on overtime.

Said committee chairman Rich McKay of our research: We now have really good discussion, and when I say in the old days, maybe that was even just 10 years ago, we would have less statistics, no analytics, and we probably would’ve just watched tape. Today’s environment you get so much more information and it gives you a better understanding of what the issue truly is.

Whatever horse racing’s priorities are, close races, safety of the horses and jockeys, the time in between races, et cetera, analytics will help address all of those problems.

On top of those old questions there are new ones to ask and answer. One large data investment area for the league office over the last decade has been in player health and safety. All NFL player injuries are, as part of the collective bargaining agreement with the NFLPA, tracked and monitored by a third party that’s employed by the league office.

Each trainer, club trainer, enters every NFL player injury, and that allows for an independent voice to analyze the impact of rules changes, equipment modifications, and changes to the game.

It’s not simply counting pulled hamstrings and sprained ankles. It is also about establishing tools to make better decisions in the future. At each practice, each club’s players are required to wear tracking devices that give insight into a player’s load, their distance traveled, their speed, which allows sports scientists to evaluate performance, identify if players need to tone it down, or if a player is at heightened risk of injury.

And more importantly, they’re required to share that information with a third party to analyze independently. More recently, all NFL players are also wearing those same chips that were in their shoulder pads; they are now in their helmets and in their cleats.

The burden is heavy. This data is messy and the tracking cumbersome, but the idea of a cleat specific to a running back on turf or a helmet specific to a quarterback who likes to scramble gives the league plenty to work on for the future.
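
For readers who want to see what measures like distance traveled and speed look like in practice, here is a minimal sketch in Python. It is an illustration only, not the league's method: it assumes each player's path arrives as a list of (x, y) positions sampled at 10 frames per second, and the function name and the yard units are hypothetical.

```python
import math

FRAMES_PER_SECOND = 10  # sampling rate mentioned in the talk


def distance_and_top_speed(path, frames_per_second=FRAMES_PER_SECOND):
    """Return (total distance, top speed) for one player's tracked path.

    `path` is a list of (x, y) positions, one per frame. Units follow the
    coordinates (assumed here to be yards), so speed is yards per second.
    """
    # Straight-line distance covered between each pair of consecutive frames.
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    ]
    total_distance = sum(steps)
    # Largest single-frame step, converted to a per-second speed.
    top_speed = max(steps, default=0.0) * frames_per_second
    return total_distance, top_speed
```

Real tracking data is noisier than this, so a production version would likely smooth the positions before taking frame-to-frame differences.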

Another area where the league is improving is in the replay process. The 2022 NFL season had both the fewest replay reviews and the least time spent on replay in over two decades. Most probably wouldn’t know that, because most people don’t really love when a replay happens. For us it’s important that games move fast and that we don’t spend a lot of time wondering what the decision makers are thinking.

Our group hand times every replay. So, we can talk about fancy data, but at the end of the day, we literally have a guy or gal with a stopwatch that is hand timing the ins and outs of every replay process.

We want to ensure, for the sake of competitive equity, that a coach’s challenge in Los Angeles on a Thursday night primetime game is handled identically to one in a 1:00 game at Tennessee.

The horse racing corollary is obvious. When, how, and why stewards decide for or against disqualification should be the same at Del Mar as it is at Saratoga.

Again, questions that are always asked but often hard to put numbers to. This data revolution that has come for baseball, basketball, and soccer eventually came to football. If you can’t tell, and based on Kyle’s comments earlier, it’s coming for horse racing, too. That should excite you. The questions you always wondered, always asked as you make decisions, are ones now that you can answer perhaps more quickly.

What horses have the most stamina? What’s the best drafting style? What horses close the best? What is the optimal amount of rest in between races? What is the effect when you ship a horse east, west, or overseas? What are the optimal training timelines to avoid injury and maximize performance? Blinkers on or off? And I listen to a few too many horse racing podcasts, what does it actually mean when a horse gallops out?

While it’s an exciting time for sports analytics, not everything is fun and games. First, tracking data is, pardon the expression, a pain in the butt. That 150 rows of play-by-play data that football started with could fit in an Excel sheet across a couple seasons.

At three hundred thousand rows for player tracking data, you might be okay with a laptop, but once you get to full seasons you would probably need a server or something like that. For folks that know Microsoft Excel, it’s no longer useful.

There are two popular and free programming languages called R and Python, and if you’re interested in data science, you’ll know what those two are. Those are all but necessities at this point as folks try to analyze these hordes of data.

On top of that software, you also need servers to store the data and engineers to help extract and transform it. You also need time. The typical workflow isn’t, go find me the wide receiver that has the fastest change of direction. It’s one, how do you quantify change of direction? Two, how do you write the code to do that on a single player on a single play? Once you have that you still need to figure out how to write the code for all players on all plays.
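
The three steps above can be sketched in a few lines of Python. This is only one possible answer to "how do you quantify change of direction?" and not the NFL's metric: it assumes each player's path on a play is a list of (x, y) positions at 10 frames per second, and every function name here is hypothetical.

```python
import math

FRAMES_PER_SECOND = 10  # sampling rate mentioned in the talk


def headings(path):
    """Direction of travel, in degrees, between consecutive (x, y) samples."""
    return [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    ]


def max_turn_rate(path, frames_per_second=FRAMES_PER_SECOND):
    """Steps one and two: one player, one play.

    Defines change of direction (an assumption) as the largest
    frame-to-frame change in heading, in degrees per second.
    """
    hs = headings(path)
    turns = []
    for h0, h1 in zip(hs, hs[1:]):
        diff = abs(h1 - h0) % 360.0
        turns.append(min(diff, 360.0 - diff))  # a 350-degree swing is a 10-degree turn
    return max(turns, default=0.0) * frames_per_second


def rank_players(plays):
    """Step three: apply the single-play metric to all players on all plays.

    `plays` is a list of dicts mapping a player name to that player's path.
    Returns (player, best turn rate) pairs, sharpest turner first.
    """
    best = {}
    for play in plays:
        for player, path in play.items():
            best[player] = max(best.get(player, 0.0), max_turn_rate(path))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Even this toy version shows where the time goes: the definition in `max_turn_rate` is a judgment call, and a different definition could easily produce a different ranking.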

More data also means more problems. One of the most exciting aspects of tracking data is that whenever you try and answer a question, you’re very likely the very first person to ever ask that exact question. There is no textbook. There is no right answer.

That is the scariest part, and it also makes it easy to screw up. Of course, it makes the insights all the more valuable.

I’ll close with my favorite use of analytical tools to grow the sport. Each fall our group runs a data science competition called the Big Data Bowl. You can see it pictured here. I believe we have the same club DJ operating our Big Data Bowl as the one doing today’s event.

As part of this competition we release data to the public, shared for free on a public website. This is data that the NFL could monetize, but they’re choosing not to. If you remember the Field of Dreams quote: If you build it, they will come. The sports data science corollary is obvious: If you share it, they will analyze it.

There is a good example from 2020 where our group was tasked with trying to predict the number of yards that a running back would gain on a running play. So, you can see here Leonard Fournette gets the ball against the Packers in the NFC title game.

The question is, given the amount of space available to Fournette, what would we expect him to gain on the play? I have a group of four or five master’s-level data scientists and I have a PhD in statistics. We couldn’t figure this out.

That is, when we would look at a play we knew what was wrong. We would come up with a prediction at two yards and show it to a coach and be like, no, no, that’s not the right number. Or we would come up with 10 yards, no, that’s also not the right number. So, we weren’t able to come up with it ourselves.

The NFL actually employed two different vendors to try and solve that same question. That is, we knew this was an important metric that we would be able to put on air. We knew fans and coaches would be interested in it.

Like I said, if we share it, they will analyze it. As part of the 2020 competition that we had, we opened up this data, we shared the player tracking data, and we challenged data scientists around the world. You try and come up with a better algorithm than we did. They did. The winning algorithm for this competition was created by a pair of Austrian data scientists who literally had never watched football.

That is, a metric that you see on air that came from our competition was done and finalized and written into production by our team, but it was created by people who didn’t watch football and actually applied concepts from physics and chemistry into their algorithm.

In this past year’s competition, a husband-and-wife team co-authored a paper together. So, we’re bringing football fans together but we are also bringing marriages together, too. That’s even cooler when you find out that that couple had three young kids. As somebody who came to Saratoga this weekend with three young kids, I empathize.

This couple were also not football fans before. They finished as one of our eight finalists and they’re from Japan. Our participants and the folks interested in analyzing football or horse racing data aren’t just in the U.S. They’re all over the globe.

That means we’re helping grow our fan base simply by putting our data out there and asking interesting questions. This competition has grown into the pre-eminent hiring pipeline in sports. Roughly 40 participants have been hired by NFL teams or vendors from this competition, and roughly half of the NFL’s 32 teams have hired someone from it.

So, on top of creating new metrics for fans, we’re growing the infrastructure available to our group, in this case our teams, our owners, for them to do their own analysis.

High school and college students can now go online to identify new football trends, make new charts, and follow their favorite athletes. The point that was made earlier about the connection with horses, this allows you to do that as well.

Thanks to Joe Applebaum and NYRA, horse racing went down this track last year with the Big Data Derby. An exciting sign of things to come.

I would certainly encourage more versions of similar competitions and more reach for the inevitable data influx that’s coming to horse racing. NFL football looks like it always has. The scouts are still grading stamina and change of direction. Coaches are still trying to come up with plays to scout their opponents on.

But our stats and understanding of the game and the use of data has never been higher and better. Horse racing will get there soon.

The thrill of Jace’s Road or Arcangelo or Sassy Little Lila crossing the finish line won’t change, but perhaps we’ll know a little bit more about how those horses got there.

Thank you.

