Saturday, March 29, 2014

Michigan's 3PT Shooting: An Illustration of Regression to the Mean

Despite holding a 60-45 lead over Tennessee with 10:57 left in last night's NCAA Sweet Sixteen game, the Michigan men's basketball team had to sweat things out for a 73-71 win (play-by-play sheet). One reason the Wolverines were unable to coast to a blow-out win over the Volunteers was a drop in Michigan's three-point shooting percentage from .778 (7-of-9) in the first half to .364 (4-of-11) in the second.

Although there could be substantive reasons for the Wolverines' second-half decline from behind the arc (e.g., fatigue, better Tennessee defense), the phenomenon of regression toward the mean almost certainly contributed as well. Regression toward the mean refers to the tendency of performers who exhibit extreme values on an initial measurement -- on either the high or low end -- to perform closer to the average on later measurements. According to the Social Research Methods website, regression toward the mean:

will happen anytime you measure two measures! It will happen forwards in time (i.e., from pretest to posttest). It will happen backwards in time (i.e., from posttest to pretest)! It will happen across measures collected at the same time (e.g., height and weight)! It will happen even if you don't give your program or treatment. 

Using box scores from all of Michigan's 2013-14 games to date (contained in UM's game notes in advance of Sunday's Elite Eight match-up with Kentucky), I plotted the Wolverines' team three-point shooting percentages for each first-half and second-half played this season. Each line in the graph links the two halves of the same game, with the Tennessee game depicted in orange, as one example (there were too many games, 36, to label each line). You may click on the graph to enlarge it.

Regression to the mean is indicated by lines that slope from very high to the middle, and lines that slope from very low to the middle. Also shown in the graph is Michigan's .402 three-point success rate for the season to this point. The Wolverines' pattern is a textbook example of regression toward the mean, as can be seen by comparing the above graph to this diagram from a textbook (Campbell and Kenny's A Primer on Regression Artifacts).

When Michigan (or any team) hits close to 80% of its treys in a half of one game, it is unlikely that it can match or exceed that rate in the other half. It is also true that a team shooting .100 or worse for a half will rarely* match or drop below that level in the other half.
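To see why regression toward the mean is expected even when a team's true ability does not change, here is a minimal simulation sketch. It assumes (hypothetically) that every three-point attempt is an independent make-or-miss with the same probability in every half, set at Michigan's season rate of .402, and that a team takes 10 threes per half (an assumed round number; the Tennessee game had 9 and 11). The function name `simulate_halves` and its parameters are illustrative only.

```python
import random

def simulate_halves(n_games=20_000, p=0.402, attempts=10, seed=1):
    """Simulate (first-half, second-half) 3PT percentages when the team's true
    make probability is p in every half and every shot is independent."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        first = sum(rng.random() < p for _ in range(attempts)) / attempts
        second = sum(rng.random() < p for _ in range(attempts)) / attempts
        games.append((first, second))
    return games

def mean(values):
    return sum(values) / len(values)

games = simulate_halves()
hot = [g for g in games if g[0] >= 0.7]   # scorching first halves (.700 or better)
cold = [g for g in games if g[0] <= 0.1]  # ice-cold first halves (.100 or worse)

for label, group in (("hot", hot), ("cold", cold)):
    first_avg = mean([first for first, _ in group])
    second_avg = mean([second for _, second in group])
    print(f"{label}: {len(group)} games, first half {first_avg:.3f}, second half {second_avg:.3f}")
```

Because the two halves are drawn independently in this sketch, the second-half averages for both the hot and cold groups land near the true .402 rate. Real shooting may not be fully independent from half to half, so this illustrates the statistical mechanism rather than modeling Michigan's actual season.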

As noted above, regression to the mean is virtually certain to occur anytime multiple measurements are obtained. The above depiction for Michigan is probably more dramatic than would be the case for most other teams, as most teams presumably are not as capable as the Wolverines of exceeding three-point shooting percentages of .600 or .700 within a half. Out of 351 NCAA Division I men's basketball teams, Michigan finished the regular season tied for seventh nationally in three-point shooting percentage.

*I inadvertently omitted the word "rarely" from the original version of this posting.

Thursday, March 27, 2014

Note to 76ers' Fans: Losing Streaks Usually End Against Bad Teams

With college basketball's March Madness dominating the U.S. hoops scene, it may have escaped some that the Philadelphia 76ers are on the verge of tying and possibly breaking the NBA record for longest losing streak. As shown on Wikipedia's list of the longest NBA losing streaks, Philadelphia has been "deep-sixed" 25 straight times during its current streak, one loss shy of the record 26 consecutive defeats suffered by the 2010–11 Cleveland Cavaliers.

A record-tying 26th straight loss likely awaits the Sixers tonight at Houston. Even the disparity in the teams' records this season -- 48-22 for the Rockets, compared to 15-56 for Philly -- probably doesn't capture the full difference in the teams' abilities. After all, the Western Conference, in which Houston plays, has been much tougher this year than the Eastern Conference, in which the Sixers play, and teams play most of their games within their own conference. Also, after Philly recently traded Evan Turner, one of its better players, one article contended that:

On paper this is a bad deal for the Sixers, but they have no intention on trying to win games. The team is tanking like no other in hopes of winning the 2014 NBA Draft Lottery.

Getting back to tonight's game, Carl Bialik, the former Wall Street Journal "Numbers Guy" who now writes for the newly relaunched FiveThirtyEight, calculates only a 4% chance of the 76ers winning.

As the Sixers' losing streak was building in recent weeks, I began trying to come up with a statistical angle on it. One line of thinking is that, contrary to the idea of other teams taking the struggling team lightly, opponents will play even harder against a team in free-fall in an attempt to avoid being "that team" -- the one against whom the losing streak ended. Thus, for Philly, ending its losing streak against a strong team such as Houston, on the road no less, would seem unlikely.

The question then came to me: what is the profile of an opposing squad against which a team ends its long losing streak? Presumably, such an opponent is likely to be a bad team. If you lose to a team that has lost its last 20 or 25 games, you can't be that good yourself. Ultimately, though, it's an empirical question.

I consulted the aforementioned Wikipedia list of the longest losing streaks in NBA history. The list included 30 losing streaks: one each of 26, 25, and 24 straight losses, three of length 23, one of 21 games, four of length 20, seven of length 19, four of length 18, and eight 17-game losing streaks. The list also included the date and opponent when the streak ended. I then went to Basketball Reference, which has extensive season logs for all teams in NBA history. For example, seeing on the Wikipedia list that the 2010-11 Cleveland Cavaliers (holder of the league record) ended their 26-game losing streak on February 11, 2011 against the L.A. Clippers, I could go to the Clippers' log for that season and see that they brought a 20-32 (.385) record into the game with the Cavs. Taking advantage of this weak opposition, Cleveland ended its losing streak.

I tried to make the same inquiry into the ending of all 30 of the NBA's longest losing streaks. However, streaks that carried over from one season to the next often ended early the next season, when teams may have played only a few games. To ensure relatively large samples of games, therefore, I limited my analysis to situations in which teams against whom a long losing streak ended had played at least 20 games during the season. There were 18 such situations, which I depict in the following graph. Unless you have some unbelievably strong eyesight, you'll want to click on the graphic to enlarge it.

The data points, represented by little basketballs, are arranged left-to-right from lowest to highest opponents' winning percentages entering games in which long losing streaks ended. A description of each streak-ending appears vertically by each ball. On the far left, the game in question is one in which the 1997-98 Denver Nuggets ended their 23-game losing streak by beating a 10-32 (.238) Clippers outfit. In another seven games, a team ended a long losing streak by beating a team whose winning percentage was in the .300's entering the game.
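The bookkeeping described above (converting each opponent's incoming won-lost record to a winning percentage, keeping only opponents that had played at least 20 games, and ordering them from lowest to highest) can be sketched as follows. Only the five records quoted in this post are included; the other 13 of the 18 games are not listed here, so this is a partial illustration, and the tuple layout and the `win_pct` helper are arbitrary choices.

```python
# (streak that ended, opponent, opponent's wins, opponent's losses entering the game)
# Only the records quoted in this post; the remaining streak-enders are omitted.
streak_enders = [
    ("1964-65 S.F. Warriors (17 straight)", "Cincinnati Royals", 34, 16),
    ("1967-68 S.D. Rockets (17 straight)", "Philadelphia 76ers", 48, 16),
    ("1972-73 76ers (20 straight)", "Milwaukee Bucks", 42, 18),
    ("1997-98 Nuggets (23 straight)", "L.A. Clippers", 10, 32),
    ("2010-11 Cavaliers (26 straight)", "L.A. Clippers", 20, 32),
]

MIN_GAMES = 20  # opponent must have played at least this many games that season

def win_pct(wins, losses):
    return wins / (wins + losses)

# Drop small-sample opponents, then sort from weakest to strongest (left to right).
qualifying = [row for row in streak_enders if row[2] + row[3] >= MIN_GAMES]
qualifying.sort(key=lambda row: win_pct(row[2], row[3]))

for streak, opponent, w, l in qualifying:
    print(f"{win_pct(w, l):.3f}  {opponent} ({w}-{l}) ended {streak}")
```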

Contrary to my expectation that most games ending long losing streaks would feature a really weak opposing team, seven of the games featured opponents with incoming winning percentages from .483 to .614. And, most surprising of all, in three games, teams ended their long losing streaks against top-quality opposition (depicted in blue text on the graphic):
  • The 1964-65 then-San Francisco Warriors ended their 17-game losing streak by beating the 34-16 (.680) Cincinnati Royals (now the Sacramento Kings).
  • The 1972-73 Sixers, a squad that won only nine games all season, snapped their 20-game losing streak by beating the 42-18 (.700) Milwaukee Bucks. This was during the Kareem Abdul-Jabbar era in Milwaukee, in which the Bucks won the 1971 NBA title and lost a seven-game final in 1974. Kareem did miss the fourth quarter of the Sixers' streak-busting game, due to a back injury.
  • The 1967-68 then-San Diego Rockets ended their 17-game losing streak by beating the 48-16 (.750) 76ers. This was toward the end of Wilt Chamberlain's time in Philly, with the Sixers having won the 1967 NBA title.  
So yes, there is some precedent for teams ending their long losing streaks against opposing teams with winning percentages in the vicinity of .700. Perhaps you noticed another pattern, though. All three instances of teams ending their losing streaks against such lofty opposition occurred more than 40 years ago! It may be just a coincidence. However, another possibility is that the greater scrutiny of sports contests now than in the past (e.g., via the Internet, 24-hour sports cable networks, and radio talk shows) has made the top teams extra sensitive to becoming "that team" when they face an opponent on a long losing streak.

UPDATE: After losing to Houston to tie the NBA record of 26 straight losses, the 76ers beat Detroit to end the streak.

Friday, March 14, 2014

Team Scoring Runs in College Basketball -- Revisited

ESPN The Magazine's annual "Analytics Issue" (March 3, 2014) includes an article by Ken Pomeroy on when team scoring runs are most likely to occur in college basketball. Pomeroy focuses on runs of at least 10-0 (i.e., one team scoring 10 straight points without any scoring by the opponent), although other analysts might differ either on the minimum number of points by the "hot" team needed to constitute a run or on whether the shutout element is necessary for a run (i.e., some would consider outscoring a team by a margin of 15-2, for example, during a stretch to be a run).

Using the 10-0 criterion and voluminous data from recent seasons, Pomeroy examined, among other things, the probability of teams going on a run, depending on whether they were winning or losing (and by how much) or tied. He found small, but steady, differences that form an unmistakable trend: the further behind a team was, the higher its probability of going on a 10-0 run, and the further ahead a team was, the lower its probability. A team trailing by 10 points had approximately a 1.86% chance of going on such a run, a team trailing by 9 points had roughly a 1.76% chance, a team 8 points behind had roughly a 1.72% chance, and so forth (these percentages are approximate because the exact values are not listed, so I am estimating them visually from the heights of bars on a graph). In a tie game, a team had about a 1.24% chance of a 10-0 run. A team with a 1-point lead had around a 1.20% chance, one with a 2-point lead had roughly a 1.16% chance, and so forth. Finally, a team up 10 had approximately a 0.88% chance.
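Pomeroy's trend can be summarized with an ordinary least-squares line. The sketch below fits one to the approximate values quoted above; because those values are visual estimates from his bar graph, the result is rough. Negative margins mean the team trails, and the `least_squares` helper is written out by hand only to keep the example dependency-free.

```python
# Approximate values read off Pomeroy's bar graph (see text):
# score margin (negative = trailing) and chance (%) of going on a 10-0 run.
margins = [-10, -9, -8, 0, 1, 2, 10]
run_pct = [1.86, 1.76, 1.72, 1.24, 1.20, 1.16, 0.88]

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = least_squares(margins, run_pct)
print(f"slope = {slope:.3f} percentage points per point of lead; intercept = {intercept:.2f}%")
```

With these inputs the fitted slope is roughly -0.05, meaning each additional point of lead goes with about a 0.05-percentage-point lower chance of a 10-0 run.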

In my 2012 book Hot Hand, I also examined team runs, in this case in the 2004 NCAA men's basketball tournament. My aim at the time was simply to document the number of runs in the tourney, using the criterion of a 10-point margin, but not requiring a shutout during the run. A margin such as 12-2 or 16-3 would have sufficed, for example. As I wrote in the book, "Nearly three-quarters of the games (47 out of 64) featured at least one major run" (p. 27). Many of the games included multiple runs, so the number of total runs was 67. Unlike Pomeroy, I did not initially seek to correlate the occurrence of runs with whether the team that went on the run was ahead or behind (and by how much) at the time of the run. However, play-by-play sheets from the 2004 tournament are still available online (by going to a given team's schedule page, as in this example, and then selecting 2003-04). Thus, I could go back and try to replicate Pomeroy's analysis.*
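The 10-point, no-shutout-required run criterion can be sketched in code. This is a hypothetical illustration, not the procedure actually used for the book or the graph: it assumes a play-by-play reduced to a list of (team, points) scoring events, and for one team it finds the single stretch of consecutive scoring events with the widest margin (a maximum-subarray scan over the running score differential). It reports the team's lead or deficit just before that stretch and the run's margin (outscoring an opponent 16-3 counts as +13). The names `largest_run` and `RUN_MARGIN` are made up for this example, and a game with several runs would need a more elaborate search.

```python
RUN_MARGIN = 10  # outscore the opponent by at least 10; a shutout is not required

def largest_run(events, team):
    """Find the stretch of consecutive scoring events in which `team` outscored
    its opponent by the widest margin.

    events: list of (scoring_team, points) in game order.
    Returns (lead_before, points_for, points_against, margin)."""
    pf, pa, diffs = [0], [0], [0]  # cumulative points for/against, and their difference
    for scorer, pts in events:
        pf.append(pf[-1] + (pts if scorer == team else 0))
        pa.append(pa[-1] + (0 if scorer == team else pts))
        diffs.append(pf[-1] - pa[-1])

    best = (0, 0, 0, 0)
    low = 0  # index of the lowest differential seen so far (best place to start a run)
    for j in range(1, len(diffs)):
        margin = diffs[j] - diffs[low]
        if margin > best[3]:
            best = (diffs[low], pf[j] - pf[low], pa[j] - pa[low], margin)
        if diffs[j] < diffs[low]:
            low = j
    return best

# Hypothetical scoring sequence, for illustration only.
events = [("A", 2), ("B", 3), ("A", 2), ("A", 3), ("B", 2), ("A", 2), ("A", 2), ("A", 3), ("B", 2)]
lead_before, pts_for, pts_against, margin = largest_run(events, "A")
print(f"Team A, down {-lead_before}, went on a {pts_for}-{pts_against} run (+{margin});"
      f" counts as a run: {margin >= RUN_MARGIN}")
```

Here team A trails by 1, then outscores B 12-2, which meets the 10-point criterion even though B scored during the stretch.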

Whereas my horizontal axis was structured the same as Pomeroy's, depicting how many points behind (negative values) or ahead (positive values) a team was right before launching its run, I used a different measure of run intensity on the vertical axis. As I noted above, he looked at probability of a 10-0 run. Instead, I plotted the margin of a given run (e.g., outscoring an opponent 16-3 during a run would be recorded as +13). The graph appears below, the background ranging left-to-right from darker red for larger deficits to darker green for larger leads. You may click on the graph to enlarge it.

Each dot represents a particular scoring run. Three of the dots are annotated to provide examples of what all the dots represent. The downward-trending line, known as the best-fit (least-squares) line because it minimizes the sum of the squared vertical distances between the line and the dots, shows the same pattern as Pomeroy's findings. The further behind a team was, the larger the margin by which it tended to outscore the opponent during its run. For those of you with some statistical training, the correlation between initial deficit/lead and margin of outscoring the opponent during the run was r = -.24, on the cusp of statistical significance at p = .051.
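For readers who want to check the significance figure, here is a sketch of the standard t test for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. It assumes n = 67 (the number of runs counted in the 2004 tournament; the post does not state the n behind the correlation) and uses the rounded r = -.24, so its p-value comes out at about .05 rather than exactly .051. To stay within the standard library, it integrates the t density numerically with Simpson's rule instead of calling a statistics package; the function names are illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic (and degrees of freedom) for testing whether a Pearson r differs from zero."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: integrate Student's t density from 0 to |t| with
    Simpson's rule (steps must be even), then take 1 - 2 * area."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = density(0) + density(b)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area

t, df = t_from_r(-0.24, 67)  # assumption: n = 67 runs
p = two_tailed_p(t, df)
print(f"t({df}) = {t:.3f}, two-tailed p = {p:.3f}")
```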

Teams that were way ahead rarely went on a big scoring run. One exception, noted in the graph, is that Kansas, already leading 85-64, went on a 12-0 run against the University of Alabama at Birmingham to expand the Jayhawks' lead to 97-64. One reason teams with big leads rarely go on new scoring runs presumably is that they often take out their top players, both to rest them and to avoid the appearance of "running up the score" on the opponent.

A second, more statistical, reason that trailing teams are more likely than leading teams to go on runs is regression toward the mean. Regression toward the mean tells us that, even absent any intervention, both extreme low performers (i.e., the trailing team) and extreme high performers (i.e., the leading team) tend to return toward more average performances. An extreme low performer has nowhere to go but up, and an extreme high performer has nowhere to go but down.

In conclusion, Pomeroy's investigation and mine both show that trailing teams are more likely than leading teams to go on scoring runs. Psychological factors suggested by Pomeroy (e.g., motivation on the part of the trailing team and the desire to conserve energy by the leading team) and regression toward the mean are likely explanations of the basic finding, but it is difficult to know the relative importance of the two explanations.

*In revisiting the play-by-play sheets from the 2004 NCAA tournament, I noticed a few slight discrepancies with what appeared in my book. For example, a game article on the Mississippi State-Monmouth contest stated that, "The 15th-seeded [Monmouth] Hawks shot their way within four points late in the first half, but Mississippi State pulled away by controlling both ends of the floor. The Bulldogs tore off a 22-5 run in less than 10 minutes, and cruised to their largest margin of victory of the season." Based on the game article, I listed Mississippi State's run as 22-5 in my book. However, the article's qualifier "in less than 10 minutes" was more important than I realized at the time. If one looks at the play-by-play sheet, one sees that Mississippi State indeed outscored Monmouth 22-5 in the roughly 10:00 window of time from 4:47 left in the first half (MSU up 36-32) to 15:23 left in the second half (MSU up 58-37). However, what I did not notice until revisiting the play-by-play sheet for today's analysis is that Mississippi State added 6 more unanswered points beyond the 10-minute window, making the full run really 28-5. Small discrepancies such as this were corrected for today's analysis.