NFL Big Data Bowl

The NFL has used AWS cloud computing technology for Next Gen Stats, its advanced analytics platform, since 2015, and added machine learning to the platform in 2017. Next Gen Stats tracks complex data coming from the field and deploys machine learning models to enable new ways to visualize the action, including Completion Probability, Expected Yards After Catch, and Fastest Ball Carriers, to name a few. These advanced stats have been instrumental in enhancing the fan experience, giving football fans more real-time data on the talent of the players on the field and the decision-making during games.

Even with more than 200 AWS-powered Next Gen Stats in every play, only a handful of advanced statistics have captured pieces of the running game. Until now. This season, the NFL will introduce several new Next Gen Stats using AWS technology, and one of them, "Expected Rushing Yards," is meant to help showcase a running back's abilities, including his speed and elusiveness. Expected Rushing Yards is designed to show how many rushing yards a ball carrier is expected to gain on a given carry based on the relative location, speed, and direction of blockers and defenders.

This new stat is the result of the Big Data Bowl, a data analytics competition powered by AWS. The Big Data Bowl, held at the end of February, is an opportunity for data scientists from around the world to come together and explore how to contribute to the continued evolution of the NFL's advanced statistics. This year's Big Data Bowl focused on the question: When an NFL ball carrier takes a handoff, how many yards should we expect him to gain on the play?

More than 2,000 people participated in the open-source contest, and a two-person team from Austria, with no exposure to American football prior to the competition, came away as the clear victors. Philipp Singer and Dmitry Gordeev, who went by the team name "The Zoo," used their expertise in machine learning to build a convolutional neural network to develop the new statistic.

The two team members broke down how they built the model that eventually evolved into the new advanced stat in a discussion board following the Big Data Bowl:

"If we focus on the rusher and remove other offense team players, it looks like a simple game where one player tries to run away and 11 others try to catch him. We assume that as soon as the rushing play starts, every defender regardless of the position, will focus on stopping the rusher asap and every defender has a chance to do it. The chances of a defender to tackle the rusher (as well as estimated location of the tackle) depend on their relative location, speed and direction of movements."

The NFL's Next Gen Stats team then spent the off-season implementing The Zoo's modeling architecture using Amazon SageMaker—a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly—to create the new rushing metric. Now, with Expected Rushing Yards ready to roll out at the start of the season, teams, players, broadcasters, and fans can see more than the traditional stats surrounding rushing yards.

The new model yields a number of primary metrics: Expected Rushing Yards, Rushing Yards Over Expected, Rushing Yards Over Expected per Attempt, Rush Percentage Over Expected, First Down Probability, and Touchdown Probability. Several of them can be illustrated in one particular play from last season. The Cleveland Browns were leading the Baltimore Ravens, 24-18, with 9:47 left in the game and facing a 1st and 15 from their own 12-yard line. Running back Nick Chubb took a handoff and went 88 yards for the touchdown. This resulted in +81 Rushing Yards Over Expected, a 12.4% First Down Probability, and a Touchdown Probability of less than 1%.
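
The arithmetic behind these figures can be sketched in a few lines. The function names below are hypothetical, and two points are assumptions rather than details confirmed in this article: that Rushing Yards Over Expected is simply actual yards minus expected yards, and that Rush Percentage Over Expected is the share of carries on which the runner gained more than expected. Under the first assumption, an 88-yard run at +81 implies an expected gain of roughly 7 yards.

```python
def rushing_yards_over_expected(actual_yards, expected_yards):
    """Yards gained beyond the model's expectation on a single carry."""
    return actual_yards - expected_yards


def ryoe_per_attempt(carries):
    """Average yards over expected across carries, given (actual, expected) pairs."""
    if not carries:
        raise ValueError("at least one carry is required")
    total = sum(actual - expected for actual, expected in carries)
    return total / len(carries)


def rush_pct_over_expected(carries):
    """Percentage of carries that gained more than expected (assumed definition)."""
    if not carries:
        raise ValueError("at least one carry is required")
    beat_expectation = sum(1 for actual, expected in carries if actual > expected)
    return 100.0 * beat_expectation / len(carries)


# The Chubb run: 88 actual yards against an implied expectation of about 7.
chubb_ryoe = rushing_yards_over_expected(88, 7)  # 81
```

The per-attempt and percentage versions only need a list of (actual, expected) pairs for a runner's carries, so the same expected-yards output can feed all three metrics.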

"Similar to how we’ve been able to better understand and appreciate quarterback and receiver performance with Completion Probability, we can now use these advanced stats to better quantify the unique abilities of running backs," said Matt Swensson, NFL VP of emerging products and technology. “The solution for Expected Rushing Yards is a unique approach that we can also leverage for future stats and potentially improve upon existing stats such as our Expected Yards After Catch metric. With AWS powering our Next Gen Stats platform, we continue to uncover new aspects of our game that we have never been able to see before."