Win Probability for NFL Games Needs to Be Re-Assessed, Risks Being Marginalized


You have probably seen NFL win probability charts and estimates over the last few years. You usually only see them when something crazy happens, or when a team makes a big comeback. “They only had a 1% chance of winning!”

We saw it again during Super Bowl LI, when the New England Patriots were down 28-3 and came back to win. Going by the win probability after each play, the Falcons hit 99.8% after the touchdown that put them up 28-3. It went to 99.9% when Julian Edelman threw incomplete to Dion Lewis on 3rd and 3, with the Patriots still trailing by 25, before a slight rebound on the fourth-down conversion. It went back to 99.9% after Gostkowski missed the extra point and the Falcons recovered an onside kick and immediately got to the New England 32. Even with Atlanta faltering, it stayed above 99% until Matt Ryan’s fumble with 8:31 remaining. It didn’t dip below 96% until James White caught the two-point conversion to make it an 8-point game.

Now, the Super Bowl was a truly rare game, an instant classic. The Falcons were indeed heavy favorites up by 25 late in the third quarter. But I fear the win probability estimates are in danger of being viewed like “The Boy Who Cried Wolf.” If you tell people that everything was near-impossible, they won’t take you seriously.

I say this as someone who believes in studying such things and thinks the concept is valuable. But it might be time to re-calibrate things.

I went through the 2016 season and pulled the win probability estimate with 10 minutes remaining in the fourth quarter, for every game involving at least one eventual playoff team, whether in the regular season or playoffs. That represents 173 games from this past season through the Super Bowl.

Here’s a summary of the results:

So yes, most games, according to win probability, were near-decided with 10 minutes left. 110 of the 173 had a win probability estimate of greater than 90% for the team in the lead. The team with the lead was expected to win 98.2% of those games; they actually won 92.7%.

You would have expected only two comebacks based on the win probability projections. There were three times that many.
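Could six comebacks just be bad luck for the model? Here is a rough check, sketched in Python. It assumes every one of the 110 games carried the same 1.8% comeback chance (the real estimates varied game to game) and treats the games as independent, so read it as a back-of-the-envelope binomial calculation rather than a formal test. The function name is mine, chosen for illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


n_games = 110             # games where the leader was above 90%
p_comeback = 1 - 0.982    # average comeback chance implied by the estimates
observed = 6              # three times the two expected comebacks

expected = n_games * p_comeback
tail = prob_at_least(observed, n_games, p_comeback)

print(f"expected comebacks: {expected:.2f}")
print(f"chance of {observed} or more if the estimates were right: {tail:.3f}")
```

Under those simplifying assumptions the chance of seeing six or more comebacks comes out well under 5%, which is hard to wave away as a fluke of one season.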

In addition to the Super Bowl, the following games in the sample saw a big reversal, where one team was given less than a 10% chance with 10 minutes remaining:

  • Kansas City vs. San Diego, Week 1: Chiefs were given a 1.3% chance of winning with the ball at the San Diego 29, down 17, with 10 minutes left. They scored two plays later and won in overtime.
  • Oakland vs. New Orleans, Week 1: Raiders given only a 3.2% chance of winning as they trailed by 5, with the Saints having the ball inside the Oakland 10. Oakland held New Orleans to a field goal, tied the game with a TD and two-point conversion less than two minutes later, and later won on a two-point attempt in regulation.
  • Houston vs. Indianapolis, Week 6: Texans given only a 0.7% chance of winning with the Colts having the ball at the Texans 31, already up 11. It would drop to 0.2% after a Vinatieri field goal made it a 14-point lead with 7:12 remaining. Houston tied it anyway and won in overtime.
  • Dallas vs. Philadelphia, Week 8: Cowboys given a 5.5% chance to win with the Eagles picking up a first down into Dallas territory, up 7. Eagles would eventually fall on a fumble for a big loss and punt it, and Dallas tied it on the next possession and won in OT.
  • Miami at Los Angeles, Week 11: Dolphins were given a 1.5% chance, as the Rams moved into Miami territory already up 10-0. Greg Zuerlein would miss a field goal and Dolphins would score two TDs to win in regulation.

Some of those seem a lot more likely than the estimates allowed. Dallas only having a 5.5% chance at home when they trailed by exactly 7, and the Eagles weren’t right by the goal line? The Raiders only down 5?

Win probability models are based on past events. The data can be thin for any specific game situation, so the models pull in “similar” situations, and then it’s a question of how similar. To get enough data, you also have to use multiple years. But with the offensive and passing efficiency explosions of the last two years, it feels like the estimates haven’t caught up with the effect of those increases. For example, in the last two years, there have been 70 touchdowns scored in the last two minutes by a team trailing by one score. A decade ago (2005-2006 seasons), it was 40.
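To make the “how similar” question concrete, here is a minimal Python sketch of a lookup-style estimate. It is not how any particular published model works; the `Situation` record, the tolerance parameters, and the six rows of history are all made up for illustration. The point is only that loosening the definition of “similar” buys more data and can change the answer.

```python
from typing import NamedTuple


class Situation(NamedTuple):
    margin: int         # leading team's lead, in points
    seconds_left: int   # seconds remaining in regulation
    leader_won: bool    # did the team in front go on to win?


def empirical_win_prob(history, margin, seconds_left, margin_tol=1, time_tol=60):
    """Share of 'similar' past situations the leader won.

    Returns (estimate, sample_size); estimate is None when nothing matches.
    """
    similar = [
        s for s in history
        if abs(s.margin - margin) <= margin_tol
        and abs(s.seconds_left - seconds_left) <= time_tol
    ]
    if not similar:
        return None, 0
    wins = sum(s.leader_won for s in similar)
    return wins / len(similar), len(similar)


# Toy history, invented for this example (not real game data).
history = [
    Situation(7, 600, True),
    Situation(7, 630, True),
    Situation(8, 580, False),
    Situation(6, 590, True),
    Situation(10, 600, True),
    Situation(7, 300, False),
]

# Strict match: exactly a 7-point lead, within 30 seconds of the 10:00 mark.
tight = empirical_win_prob(history, 7, 600, margin_tol=0, time_tol=30)
# Looser match: leads of 6 to 8 points, within a minute of the 10:00 mark.
loose = empirical_win_prob(history, 7, 600, margin_tol=1, time_tol=60)

print("tight:", tight)
print("loose:", loose)
```

With this toy history the strict match finds two games and calls it a lock, while the looser match finds four and drops the estimate to 75%. A real model faces the same tradeoff at much larger scale, and pooling across seasons is one more way of loosening “similar.”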

If a model relies on old data while conditions are changing, it will be too confident in the team with the lead. Scoring per drive is up. That puts pressure on leading teams, who can’t just run out the clock by handing it off three times and punting. That, in turn, slightly raises the chances of turnovers or errors.

The results this year show that the win probability estimates for teams above 90% were overconfident by about 5-6 percentage points (98.2% expected versus 92.7% actual). The estimates are still *mostly* right, but people notice when the rare event happens, and it happens a little more frequently than win probability would suggest.
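For anyone who wants to run the same check on their own numbers, the gap is just the average predicted win probability minus the share of games the leader actually won. A small Python sketch, with a hypothetical function name and made-up sample games standing in for real data:

```python
def overconfidence(games, floor=0.90):
    """games: iterable of (leader_win_prob, leader_won) pairs.

    Looks only at games where the leader's estimate was above `floor`.
    Returns (mean_predicted, actual_win_rate, gap).
    """
    picked = [(p, won) for p, won in games if p > floor]
    if not picked:
        raise ValueError("no games above the floor")
    mean_predicted = sum(p for p, _ in picked) / len(picked)
    actual = sum(1 for _, won in picked if won) / len(picked)
    return mean_predicted, actual, mean_predicted - actual


# Made-up sample: four games above 90%, one below (ignored by the filter).
sample = [(0.99, True), (0.97, True), (0.95, False), (0.99, True), (0.60, True)]

mean_predicted, actual, gap = overconfidence(sample)
print(f"predicted {mean_predicted:.3f}, actual {actual:.3f}, gap {gap:.3f}")
```

A positive gap means the estimates were too confident in the leader; for the 110 games above, the same subtraction gives 98.2% minus 92.7%.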