Friday, March 14, 2014

Team Scoring Runs in College Basketball -- Revisited

ESPN The Magazine's annual "Analytics Issue" (March 3, 2014) includes an article by Ken Pomeroy on when team scoring runs are most likely to occur in college basketball. Pomeroy focuses on runs of at least 10-0 (i.e., one team scoring 10 straight points without any scoring by the opponent), although other analysts might differ on the minimum number of points the "hot" team needs to constitute a run, or on whether a shutout is necessary at all (some would count outscoring a team 15-2 during a stretch as a run, for example).
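
To make the 10-0 criterion concrete, here is a minimal Python sketch that scans a game's scoring plays for unanswered streaks. It assumes a hypothetical play-by-play format: a list of (team, points) tuples in game order, with non-scoring plays already removed. The function name shutout_runs and its min_points parameter are illustrative only, not taken from Pomeroy's work.

```python
from typing import List, Optional, Tuple

# Hypothetical play-by-play format: one (team, points) tuple per scoring play,
# in game order. Non-scoring plays are assumed to be filtered out already.
ScoringPlay = Tuple[str, int]


def shutout_runs(plays: List[ScoringPlay], min_points: int = 10) -> List[Tuple[str, int]]:
    """Return (team, points) for each unanswered streak worth at least min_points.

    A streak is a maximal stretch of consecutive scoring plays by one team
    with no points by the opponent in between (the 10-0 criterion).
    """
    runs: List[Tuple[str, int]] = []
    current_team: Optional[str] = None
    current_points = 0
    for team, points in plays:
        if team == current_team:
            current_points += points  # same team scores again: streak continues
        else:
            # the other team scored, so the previous streak is over
            if current_team is not None and current_points >= min_points:
                runs.append((current_team, current_points))
            current_team, current_points = team, points
    # close out the streak that is still open after the final scoring play
    if current_team is not None and current_points >= min_points:
        runs.append((current_team, current_points))
    return runs
```

For example, with plays = [("A", 2), ("B", 3), ("A", 3), ("A", 2), ("A", 2), ("A", 3), ("B", 2)], shutout_runs(plays) returns [("A", 10)]. A 12-0 streak is reported once as a 12-point run rather than as several overlapping 10-0 runs, which is one possible convention among several.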

Using the 10-0 criterion and voluminous data from recent seasons, Pomeroy examined, among other things, the probability of teams going on a run, depending on whether they were winning or losing (and by how much) or tied. He found small but steady differences that together form an unmistakable trend. The more a team was behind, the higher its probability of going on a 10-0 run, and the more a team was ahead, the lower its probability. A team trailing by 10 points had approximately a 1.86% chance of going on such a run, a team trailing by 9 points had roughly a 1.76% chance, a team 8 points behind had roughly a 1.72% chance, and so forth (these percentages are approximate because the exact values are not listed and I am estimating them visually from the heights of bars on a graph). In a tie game, a team had about a 1.24% chance of a 10-0 run. A team with a 1-point lead had around a 1.20% chance, one with a 2-point lead had roughly a 1.16% chance, and so forth. Finally, a team up 10 had approximately a 0.88% chance.

In my 2012 book Hot Hand, I also examined team runs, in this case in the 2004 NCAA men's basketball tournament. My aim at the time was simply to document the number of runs in the tourney, using the criterion of a 10-point margin, but not requiring a shutout during the run. A margin such as 12-2 or 16-3 would have sufficed, for example. As I wrote in the book, "Nearly three-quarters of the games (47 out of 64) featured at least one major run" (p. 27). Many of the games included multiple runs, so the number of total runs was 67. Unlike Pomeroy, I did not initially seek to correlate the occurrence of runs with whether the team that went on the run was ahead or behind (and by how much) at the time of the run. However, play-by-play sheets from the 2004 tournament are still available online (by going to a given team's schedule page at ESPN.com and then selecting 2003-04). Thus, I could go back and try to replicate Pomeroy's analysis.*
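
For the margin-based criterion (a 10-point edge, no shutout required), one way to locate a run is to treat each scoring play as a net gain or loss for one team and search for the contiguous stretch with the largest net total. That is the classic maximum-subarray problem, solved here with Kadane's algorithm. This is a sketch under that assumption, not the procedure used for the book (which drew partly on game articles); the (team, points) format and the function name largest_margin_stretch are hypothetical.

```python
from typing import List, Tuple

# Hypothetical play-by-play format: one (team, points) tuple per scoring play.
ScoringPlay = Tuple[str, int]


def largest_margin_stretch(plays: List[ScoringPlay], team: str) -> Tuple[int, int, int, int, int]:
    """Find the contiguous stretch of scoring plays in which `team` has its
    largest net margin, using Kadane's maximum-subarray algorithm.

    Returns (margin, start_index, end_index, team_points, opp_points),
    with inclusive indices. If no stretch has a positive margin, returns
    (0, -1, -1, 0, 0).
    """
    best = (0, -1, -1, 0, 0)
    run_margin, run_start, run_team, run_opp = 0, 0, 0, 0
    for i, (scorer, points) in enumerate(plays):
        if run_margin <= 0:
            # a stretch with a non-positive net can only hurt; restart here
            run_margin, run_start, run_team, run_opp = 0, i, 0, 0
        if scorer == team:
            run_margin += points
            run_team += points
        else:
            run_margin -= points
            run_opp += points
        if run_margin > best[0]:
            best = (run_margin, run_start, i, run_team, run_opp)
    return best
```

For example, with plays = [("B", 2), ("A", 3), ("B", 2), ("A", 2), ("A", 3), ("A", 2), ("B", 1), ("A", 3), ("B", 5)], largest_margin_stretch(plays, "A") returns (10, 1, 7, 13, 3): a 13-3 stretch from the second through the eighth scoring play. The sketch finds only the single largest stretch per team; counting several separate runs in one game, as the 67-run total does, would need an extra rule for splitting stretches, which is not attempted here.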

Whereas my horizontal axis was structured the same as Pomeroy's, depicting how many points behind (negative values) or ahead (positive values) a team was right before launching its run, I used a different measure of run intensity on the vertical axis. As noted above, he looked at the probability of a 10-0 run. Instead, I plotted the margin of each run (e.g., outscoring an opponent 16-3 during a run would be recorded as +13). The graph appears below; its background ranges from darker red for larger deficits on the left to darker green for larger leads on the right.
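
Both coordinates of a dot can be computed from the score just before and just after a run. The sketch below assumes each score is given as a (running team, opponent) pair; run_point is a hypothetical helper name.

```python
from typing import Tuple

Score = Tuple[int, int]  # (running team's score, opponent's score)


def run_point(before: Score, after: Score) -> Tuple[int, int]:
    """Return (x, y) for one dot.

    x: the running team's lead just before the run (negative = deficit).
    y: the run margin, i.e. points scored minus points allowed during the run
       (a 16-3 run gives +13).
    """
    lead_before = before[0] - before[1]
    team_points = after[0] - before[0]
    opp_points = after[1] - before[1]
    return lead_before, team_points - opp_points
```

For the Kansas run against UAB (85-64 before, 97-64 after), run_point((85, 64), (97, 64)) gives (21, 12). For a made-up 16-3 run by a team trailing 40-50, run_point((40, 50), (56, 53)) gives (-10, 13).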


Each dot represents a particular scoring run. Three of the dots are annotated to provide examples of what all the dots represent. The downward-trending line, known as the best-fit (least-squares) line because it minimizes the sum of the squared vertical distances between the line and the dots, shows the same pattern as Pomeroy's findings: the farther behind a team was, the larger the margin by which it tended to outscore its opponent during a run. For those of you with some statistical training, the correlation between initial deficit/lead and run margin was r = -.24, just short of conventional statistical significance (p = .051).
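
For readers who want to see the arithmetic, here is a dependency-free Python sketch of the three quantities in the paragraph above: Pearson's r, the least-squares slope and intercept, and a two-tailed p-value for r from the standard t test with n - 2 degrees of freedom. The tail area is computed with Simpson's rule on the t density rather than with a statistics library, a choice made only to keep the example self-contained. Using n = 67 assumes that every run counted in the tournament appears as one dot; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def _centered_sums(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Return (Sxy, Sxx, Syy): sums of centered cross-products and squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs: List[float], ys: List[float]) -> float:
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Slope and intercept minimizing the sum of squared vertical distances."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept


def t_density(x: float, df: int) -> float:
    """Student's t probability density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_from_r(r: float, n: int, steps: int = 2000) -> float:
    """Two-tailed p for H0: rho = 0, using t = r * sqrt((n - 2) / (1 - r^2)).

    Requires |r| < 1 and n > 2. The area under the t density from 0 to |t|
    is found with Simpson's rule; p = 1 - 2 * that area.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area
```

With these definitions, p_value_from_r(-0.24, 67) lands just above .05, in line with the reported value (r itself is rounded to two decimals, so an exact match is not expected).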

Teams that were way ahead rarely went on a big scoring run. One exception, noted in the graph, is that Kansas, already leading 85-64, went on a 12-0 run against the University of Alabama-Birmingham to expand the Jayhawks' lead to 97-64. Presumably, one reason teams with big leads rarely go on new scoring runs is that they often take out their top players, both to rest them and to avoid the appearance of "running up the score" on the opponent.

A second, more statistical reason why trailing teams are more likely than leading teams to go on runs is regression toward the mean. Regression toward the mean tells us that, even absent any intervention, both extreme low performers (here, the trailing team) and extreme high performers (the leading team) tend to return toward more average performances. An extreme low performer has nowhere to go but up, and an extreme high performer has nowhere to go but down.
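
Regression toward the mean can be illustrated with a small simulation, offered as a generic demonstration of the concept rather than a model of basketball scoring. It assumes each hypothetical team has a fixed underlying ability and that each of two performance measurements (think of them as two stretches of play) adds independent random noise; the standard deviations, the 10% cutoff, and the function name regression_demo are arbitrary illustrative choices.

```python
import random
import statistics
from typing import Tuple


def regression_demo(n: int = 10_000, ability_sd: float = 1.0, noise_sd: float = 1.0,
                    seed: int = 42) -> Tuple[float, float, float, float]:
    """Simulate two noisy measurements of the same underlying abilities, select
    the bottom and top 10% on measurement 1, and compare group means.

    Returns (low_first, low_second, high_first, high_second).
    """
    rng = random.Random(seed)  # seeded so the demo is reproducible
    abilities = [rng.gauss(0.0, ability_sd) for _ in range(n)]
    first = [a + rng.gauss(0.0, noise_sd) for a in abilities]
    second = [a + rng.gauss(0.0, noise_sd) for a in abilities]  # fresh, independent noise

    # rank teams by the first measurement only
    order = sorted(range(n), key=lambda i: first[i])
    k = n // 10
    low, high = order[:k], order[-k:]
    return (statistics.mean(first[i] for i in low),
            statistics.mean(second[i] for i in low),
            statistics.mean(first[i] for i in high),
            statistics.mean(second[i] for i in high))
```

The group that looked worst on the first measurement averages noticeably better on the second, and the best group averages worse, even though nothing about the teams changed between measurements. With equal ability and noise spread, as set here, the extreme groups move roughly halfway back toward the mean.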

In conclusion, Pomeroy's investigation and mine both indicate that trailing teams are more likely to go on scoring runs than are leading teams. Psychological factors suggested by Pomeroy (e.g., motivation on the part of the trailing team and the desire to conserve energy by the leading team) and regression toward the mean are likely explanations of the basic finding, but it is difficult to know the relative importance of the two explanations.

---
*In revisiting the play-by-play sheets from the 2004 NCAA tournament, I noticed a few slight discrepancies with what appeared in my book. For example, a game article on the Mississippi State-Monmouth contest stated that, "The 15th-seeded [Monmouth] Hawks shot their way within four points late in the first half, but Mississippi State pulled away by controlling both ends of the floor. The Bulldogs tore off a 22-5 run in less than 10 minutes, and cruised to their largest margin of victory of the season." Based on the game article, I listed Mississippi State's run as 22-5 in my book. However, the article's qualifier "in less than 10 minutes" was more important than I realized at the time. If one looks at the play-by-play sheet, one sees that Mississippi State indeed outscored Monmouth 22-5 in the roughly 10-minute window from 4:47 left in the first half (MSU up 36-32) to 15:23 left in the second half (MSU up 58-37). However, what I did not notice until revisiting the play-by-play sheet for today's analysis is that Mississippi State added 6 more unanswered points beyond that window, making the full run really 28-5. Small discrepancies such as this were corrected for today's analysis.
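
The length of the 22-5 window can be checked from the two clock readings. This sketch assumes regulation NCAA men's play (two 20-minute halves, with the clock counting down the time left) and ignores overtime; elapsed_seconds and window_length are hypothetical helper names.

```python
from typing import Tuple

HALF_SECONDS = 20 * 60  # NCAA men's games: two 20-minute halves


def elapsed_seconds(half: int, clock: str) -> int:
    """Game seconds elapsed at a given half (1 or 2) and clock reading "M:SS",
    where the clock shows time remaining in that half."""
    minutes, seconds = (int(part) for part in clock.split(":"))
    remaining = minutes * 60 + seconds
    return (half - 1) * HALF_SECONDS + (HALF_SECONDS - remaining)


def window_length(start: Tuple[int, str], end: Tuple[int, str]) -> str:
    """Game time between two (half, clock) readings, formatted as M:SS."""
    secs = elapsed_seconds(*end) - elapsed_seconds(*start)
    return f"{secs // 60}:{secs % 60:02d}"
```

Here window_length((1, "4:47"), (2, "15:23")) returns "9:24", just under the 10 minutes the game article mentioned.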

3 comments:

Anonymous said...

Your regression toward the mean argument does not seem right to me.

It would only be true if you knew the final score of the game and you knew that both teams, the ones that were behind and the ones that were ahead, ended up scoring the same number of points, on the average.

In fact, from a purely statistical perspective, the teams that are behind should have less of a chance of going on a run, not more, because they are worse teams on the average.

alan said...

Thanks for the comment. I agree that trailing teams should tend to be worse than leading teams. However, even superior teams can fall behind and when they mount a come-from-behind run, the case for regression toward the mean is presumably stronger. I looked over the 2004 seedings and several teams that made comeback runs were the higher-seeded team in the game:

St. Joe's (East 1) falls behind 12-21 to Texas Tech (E8), before going on a 24-2 run.

St. Joe's (E1) trails 8-17 to Wake Forest (E4), then goes on 13-1 run.

Wisconsin (E6) falls behind 29-42 to Richmond (E11), then goes on 34-8 run.

Kentucky (Midwest 1), down 46-56 to UAB (MW9), goes on 11-0 run.

Gonzaga (MW2) trails 3-10 to Valparaiso (MW15), takes off on 12-2 run.

Georgia Tech (MW3), down 36-43 to Nevada (MW10), unleashes 17-6 run.

Xavier (South 7) falls behind 39-53 to Louisville (S10), before going on 36-10 run.

Michissippi said...

Alan, I too am unsure about the regression toward the mean argument. Let's say each team moves back toward their season averages in offensive production. If this tends to decrease the margin, that suggests that the leading team tends to be the luckier team, i.e., a team scoring above their season average. While this may be true, it seems it would be particularly true for games with small margins, and as the margin increased, the tendency for the leading team to be the luckier team would decrease since the worse team will require much more luck to take the lead. We actually see the opposite trend. This leads me to believe the psychological argument more, but obviously this whole topic could use more research. Thanks for the thought-provoking analysis!