Thursday, September 29, 2011

N.F.L. Game Probabilities Are Back, With One Adjustment

Game predictions are back for 2011. As always, the probability model is based on passing, running, turnover and penalty efficiency. But now, running is represented by Success Rate (SR) rather than Yards Per Carry (YPC). SR is the percentage of runs on which a team’s point expectancy (based on down, distance and yard line) improves.
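To make the definition concrete, here is a minimal sketch of the SR calculation. It assumes each run has already been tagged with a point expectancy before and after the play; the `Run` class and the sample values are hypothetical and are not taken from the actual model.

```python
from dataclasses import dataclass


@dataclass
class Run:
    ep_before: float  # point expectancy before the snap (hypothetical value)
    ep_after: float   # point expectancy after the play (hypothetical value)


def success_rate(runs):
    """Return the fraction of runs on which point expectancy improved."""
    if not runs:
        raise ValueError("need at least one run")
    successes = sum(1 for r in runs if r.ep_after > r.ep_before)
    return successes / len(runs)


# Four made-up carries: two improve the team's situation, two do not.
runs = [Run(0.9, 1.3), Run(1.3, 1.1), Run(2.0, 2.6), Run(0.4, 0.1)]
rate = success_rate(runs)
print(f"Run success rate: {rate:.0%}")
```

Note that a long breakaway run counts the same as a modest gain here, which is why SR is less sensitive than YPC to a few rare big plays.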

SR correlates far better than YPC with winning games. YPC is too susceptible to a handful of relatively rare breakaway runs. I think of running as a jab and passing as a cross or uppercut. The jab is a low-risk punch that doesn’t expose your defenses, keeps your opponent off balance and guessing, and keeps him from purely defending against your cross. A good jab is a prerequisite, but the cross is what scores points and wins bouts.

Running can be more than that, of course. It’s essential in short yardage and inside the red zone. And when a team has a lead, it burns clock and helps keep the ball out of an opponent’s hands in the fourth quarter. I believe the revised model captures this aspect of the running game and better reflects the true inner workings of the sport.

For new readers, here is a refresher on how the model works. A logistic regression is fed net Yards Per Attempt (YPA), run SR, and interception rates on both offense and defense, plus offensive fumble rate. Team penalty rates (penalty yards per play) and home field advantage are also included. These particular statistics are selected because they are predictive of future outcomes, not because they explain past wins. This is a distinction overlooked by nearly all experts. Sometimes less really is more when it comes to predictions.
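For readers who like to see the mechanics, the sketch below shows how a logistic model of this general shape turns team statistics into a win probability. The weights, the home-field term, the feature names, and the choice to use home-minus-away differences are all assumptions made for illustration; they are not the model's actual coefficients or structure.

```python
import math

# Hypothetical weights, for illustration only. Positive means "more is better".
WEIGHTS = {
    "off_net_ypa": 0.45,
    "def_net_ypa": -0.45,
    "off_run_sr": 3.0,
    "def_run_sr": -3.0,
    "off_int_rate": -20.0,
    "def_int_rate": 20.0,
    "off_fumble_rate": -20.0,
    "penalty_rate": -1.5,
}
HOME_EDGE = 0.3  # hypothetical home-field term, in logit units


def win_probability(home, away, home_edge=HOME_EDGE):
    """Probability that the home team wins under the assumed weights."""
    logit = home_edge
    for stat, weight in WEIGHTS.items():
        logit += weight * (home[stat] - away[stat])
    # The logistic function maps any real number to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))


# A made-up league-average team and a copy with a better passing offense.
average = {
    "off_net_ypa": 6.2, "def_net_ypa": 6.2,
    "off_run_sr": 0.40, "def_run_sr": 0.40,
    "off_int_rate": 0.030, "def_int_rate": 0.030,
    "off_fumble_rate": 0.025, "penalty_rate": 0.40,
}
better_passer = dict(average, off_net_ypa=7.2)

print(f"{win_probability(better_passer, average):.3f}")
```

With two identical teams and no home-field term, the logit is zero and the probability is exactly 50 percent, which is a handy sanity check on any model of this form.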

For example, turnover rates explain past outcomes very well, but only a relatively small part of a team’s turnover rate carries forward and predicts future outcomes. If a team has a very low interception rate of 1.2 percent, how likely is it to continue the season with so few interceptions? Chances are it will remain better than average, but not nearly as low as 1.2 percent. This concept is known as regression to the mean, and it’s essential for good predictions.
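One simple way to picture regression to the mean is to pull an observed rate part of the way back toward the league average. The sketch below does this for the 1.2 percent example; the 3.0 percent league average and the 0.25 reliability weight are hypothetical numbers chosen for illustration, not values from the model.

```python
def regress_to_mean(observed, league_mean, reliability):
    """Shrink an observed rate toward the league mean.

    reliability is the share of a team's deviation from average that is
    assumed to carry forward: 0 means none of it, 1 means all of it.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return league_mean + reliability * (observed - league_mean)


# Observed 1.2% interception rate, assumed 3.0% league average,
# assumed 25% of the deviation persisting.
estimate = regress_to_mean(observed=0.012, league_mean=0.030, reliability=0.25)
print(f"Expected interception rate going forward: {estimate:.2%}")
```

Under these assumed inputs the team is still projected to be better than average, just not nearly as good as its 1.2 percent to date. A more consistent statistic would get a reliability closer to 1 and be shrunk less.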

“Logistic regression” might sound like just mathy mumbo-jumbo, but don’t let it scare you off. The regression uses data from recent N.F.L. seasons to tell us how each facet of team performance is best weighted to predict which team will win a game. Each team variable is regressed again to account for how reliable each particular facet is throughout a season. In other words, the stats vary in terms of how consistent they are from game to game. For example, offensive passing efficiency is most consistent, and turnover rates are least consistent.

Lastly, the model adjusts for each team’s previous opponent strength. This is an especially important consideration early in the season, when some teams have only played weaker teams, and some have had to struggle against solid opponents. As the season wears on, strength of schedule will tend to even out, but never completely.

The predictions sometimes challenge our preconceived intuitions about games. If you knew nothing else about N.F.L. teams except for how well they’ve played so far this season, this is how you would want to handicap the games. Of course, we do know more than what we’ve seen the past three weeks. Sometimes that’s helpful, but at least as often it clouds our minds with bias.

For instance, this week the model says Oakland should be favored to beat New England. Do I personally believe the perennial laughingstock Raiders are really better than the perennial A.F.C. East champion Patriots? Maybe not. But the model is telling us something to pay attention to. It’s saying: “Look deeper at the numbers that really matter. The Raiders might just be for real. The Patriots’ defense is just as vulnerable as they say.” So if you’re in an office pick ’em league and have to pick an upset, maybe this is the week to put your chips on Oakland.

An explanation of the principles behind the model and a detailed example of how it is calculated can be found here.

And now here are the game probabilities for Week 4:

Brian Burke, a former Navy pilot who has taken up the less dangerous hobby of N.F.L. statistical analysis, operates Advanced NFL Stats, a blog about football, math and human behavior.
