Saturday, August 01, 2015

New Study of NBA 3-PT Contest Heats Up Hot-Hand Debates

A new study of NBA All-Star Weekend three-point shooting contests by Joshua Miller and Adam Sanjurjo, posted to the Social Science Research Network (link), has re-ignited debates over the magnitude of hot-hand effects on basketball shooting. Miller and Sanjurjo have identified a bias in certain types of hot-hand calculations that appears to have led to underestimation of hot-hand effects in previous studies. While there appears to be a broad consensus (including the present writer) on the validity of Miller and Sanjurjo's point, numerous other issues are being debated among the lead writers and commentators on various sports blogs.

First off, let's review the aforementioned bias. Miller and Sanjurjo, like others before them, compared basketball shooters' hit rates when hot (in this case, following three straight made shots) to their hit rates following three-shot sequences other than three straight hits (when players are less hot or even cold). The authors' SSRN paper notes that distortion stems from the fact that "conditioning on a streak of three or more hits creates a selection bias in which these hits are removed from the sample, leaving a smaller fraction of hits, thus driving conditional performance on the subsequent shot below the base rate" (p. 9). Here's a concrete illustration. Using part of an example Miller shared in an e-mail, where H = hit and M = miss, the sequence [HHHMHHHM] would yield the not-so-hot result that the player was 0-for-2 on shots following three straight hits, even though the player's overall shooting (6-of-8) was very hot. Further, with a correction formula devised by Miller and Sanjurjo, hot-hand effects now appear to be larger than previously thought (at least within this type of analysis).
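To make the selection bias concrete, here is a minimal Python sketch. It is only an illustration, not code from Miller and Sanjurjo's paper, and the function names are hypothetical. The first function counts shots taken right after a streak of hits. The second averages each sequence's post-streak hit rate over every equally likely sequence for an imaginary 50% shooter, using 8 shots and streaks of 3 as arbitrary choices.

```python
from fractions import Fraction
from itertools import product


def after_streak(seq, k=3):
    """Return (hits, attempts) on shots that immediately follow k straight hits."""
    hits = attempts = 0
    for i in range(k, len(seq)):
        if all(s == "H" for s in seq[i - k:i]):
            attempts += 1
            hits += seq[i] == "H"
    return hits, attempts


def mean_rate_after_streak(n, k):
    """Average the per-sequence hit rate after k straight hits over all
    equally likely H/M sequences of length n (a 50% shooter), skipping
    sequences that never offer a shot after a k-hit streak."""
    rates = []
    for seq in product("HM", repeat=n):
        hits, attempts = after_streak(seq, k)
        if attempts:
            rates.append(Fraction(hits, attempts))
    return sum(rates, Fraction(0)) / len(rates)


# The e-mail example from the text: 0 hits on 2 post-streak attempts.
print(after_streak("HHHMHHHM"))

# Every shot is a coin flip, yet the average post-streak rate is below 0.5.
print(float(mean_rate_after_streak(8, 3)))
```

The second number coming out below one-half, with no streakiness built in at all, is the bias that the correction is meant to offset.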

This finding has sent statistically oriented bloggers to their keyboards with great urgency. Columbia University statistics professor Andrew Gelman headlined his July 9 posting "Hey -- guess what? There really is a hot hand!" The last I checked, Gelman's piece had received 105 comments! Then, on July 21, all-around sabermetrician Phil Birnbaum weighed in on his blog with a posting entitled "A 'hot hand' is found in the NBA three-point contest."

Though some of the commenters on these blogs have gone back and forth over the proper magnitude of the correction for the aforementioned bias and other methodological issues, I think it's easiest to take Miller and Sanjurjo's findings at face value. Table 1 of their paper is very informative, presenting results for 33 players, with and without bias-correction. The authors correctly note that, when averaging over players, those with negative results (shooting worse after a hot streak) can cancel out positive results. A simple look at the frequencies of different results therefore seems warranted, so I have summarized the results concisely from Miller and Sanjurjo's more-elaborate table. Even with the bias-correction (which enhances how streaky a player looks), here's how many players show different increases in shooting percentage conditional on making three straight shots:

Accuracy Gain After 3 Straight Hits ("Hotness")     No. of Contestants
.34                                                  1
.20 to .22                                           3
.14 to .18                                           4
.11 to .12                                           5
.05 to .07                                           5
.01 to .04                                          10
Negative                                             5

Miller and Sanjurjo's claim that some players exhibit quite appreciable streakiness is well-supported. What about the "typical" or "average" performance? The median for all players (which is unaffected by how far in a positive or negative direction the most extreme values sit) is a .05 or 5% improvement after making three straight shots (16 players above .05, 2 at .05, and 15 below it). This is indeed stronger evidence for basketball-shooting streakiness than we've seen before. For example, a Harvard study (Bocskocsky, Ezekowitz & Stein, 2014) found approximately a 2% hot-hand increase for NBA in-game shooting, using SportVU tracking-camera technology (here and here).

There are a couple of possible reasons why Miller and Sanjurjo's findings may overstate hot-hand effects. As Birnbaum notes, within the 25-shot sequence of the NBA three-point contest, there are five locations, from each of which the player attempts five straight shots. Thus, if a shooter hits his first shot from a given location, he can rely on the same motor/muscle memory in launching the next four shots.

Further, players invited to the NBA three-point-shooting contest are known to be great outside shooters, and players with high base rates of success appear more likely than those with lower base rates to go on hot streaks. Therefore, it would be interesting to see what would happen with a more representative cross-section of NBA players. (Both the motor/muscle-memory and base-rate issues are discussed in my book.)

Even if Miller and Sanjurjo's 5% median hot-hand effect is not inflated, it is still probably a smaller magnitude than most fans would associate with the term "hot hand," as the authors appear to acknowledge. The double-digit percentage-point increases some shooters exhibit after three straight hits, on the other hand, would seem to be closer to a lay characterization of a hot hand.

In addition, Miller, in comments on the Gelman blog, holds the Harvard study to a very high level of scrutiny; in my view, arguably too high. For example, Miller notes that it omitted some possible control variables, including "the quality and identity of the defender." However, the Harvard study did control for "Distance of Closest Defender, Angle of Closest Defender, Shooter-Defender Height Difference, and [whether the shooter was] Double Covered." Once all these facets of the defense are accounted for, I don't know how much incremental knowledge we gain from knowing the defensive efficiency of the player guarding the shooter. I therefore take the Harvard study's 2% estimate of a hot-hand magnitude as having probative value.

In the end, I come down closer to Birnbaum's relatively skeptical view -- including his point that Miller and Sanjurjo's finding should be described as "a" hot hand, rather than "the" hot hand, because, like all studies, it is context-dependent -- than Gelman's more accepting position. Miller and Sanjurjo's hotness magnitudes for the hottest-shooting players are higher than I would have guessed. But the magnitudes for median shooters are only slightly higher than what I would have imagined.

Thursday, June 25, 2015

76er Statistician Harvey Pollack, Who Had a Hand in Hot-Hand Research, Dies at 93

As reported in newspapers yesterday, Philadelphia 76er statistician extraordinaire Harvey Pollack has died at age 93. In fact, it was Pollack's knack for quirky and exotic basketball statistics that greatly aided the first scholarly publication on hot-hand research.

The publication I'm referring to, of course, is Tom Gilovich, Robert Vallone, and Amos Tversky's 1985 article "The Hot Hand in Basketball: On the Misperception of Random Sequences," which was published in Cognitive Psychology (the article is sometimes referred to as "GVT," after the authors' last initials). Pollack is thanked in GVT's author notes, and here's why.

The GVT article included statistical analysis of three basketball-shooting compilations: field-goal shooting in 1980-81 Philadelphia 76er home games; free-throw shooting by the Boston Celtics in 1980-81 and 1981-82; and controlled shooting sessions with Cornell University men's and women's players.

Of crucial importance is that, except for Pollack and the 76ers, no NBA team kept sequential shooting data on field-goal attempts. Box scores tell us each player's number of total field-goal attempts and made attempts, but not the player's sequence (e.g., hit-miss-miss-hit-hit...). Only because Pollack kept sequential field-goal data for players could GVT feasibly include this kind of analysis in their article.

Nowadays, play-by-play sheets are readily available on the Internet, which would make collecting player sequences reasonably doable. In the early 1980s, however, GVT's only alternative to Pollack's numbers would have been to watch large numbers of NBA games (not that they necessarily would have minded) and record sequential data on their own. As Gilovich confirmed for me yesterday, "Harvey was the guy without whom we could not have done our research. [I] owe him a lot!"

Sunday, April 19, 2015

Spurs' "Hotness" Entering NBA Playoffs

The San Antonio Spurs, owners of five NBA titles including last year's, were floundering for much of the current season, at least relative to their high standards. A four-game losing streak in late February put San Antonio at 34-23. Perhaps having the oldest roster in the NBA was starting to catch up with the Spurs. From that point on, however, Coach Gregg Popovich's crew went 21-4 to finish with a regular-season record of 55-27. And it wasn't just quantity of wins, but also quality, as the Spurs' hot streak included a March 22 win at Atlanta, an April 5 home win over Golden State, and a sweep of an April 8/10 home-and-home match-up vs. Houston.

Baseball-statistics maven Bill James has a statistic he calls "temperature" to assess how hot or cold an individual or team is at the moment. According to this article, the formula adds a standard value to a team's temperature for each win in a streak, regardless of the quality of opposition and other possible features of each win (e.g., home/away, margin of victory). James's temperature for individual baseball players' hotness puts greater weight on recent than distant performance, but it's not clear his team formulas do the same.

I started thinking about a temperature statistic for basketball teams, incorporating quality of opposition (with additional factors such as those listed above possibly being added later). The core concepts are that, against a tough opponent, a win should raise a team's temperature a lot, but a loss shouldn't hurt too much. Conversely, against a weak opponent, a loss should be damaging, but a win not very rewarding.

In my system, a team starts at the neutral point of 1.00. Then, after each game, the previous value is multiplied by an update factor. The multiplier after a win is (1 + opponent's winning percentage), so that the better the opponent, the larger the rise in temperature. The multiplier after a loss is just the opponent's winning percentage, which will drop the temperature (multiplying anything by a number greater than 1.00 increases value, whereas multiplying something by a number between 0.00 and 1.00 decreases value). The following graphic (on which you can click to enlarge) provides some examples.


The opponent's winning percentage (right before you've played them) appears on the horizontal axis, the red and blue lines are used after a win or loss, respectively, and the multiplier after a game appears on the vertical axis. As one example, suppose your opponent enters the game with a .750 winning percentage and you beat this opponent. The previous value of your "temperature" is then multiplied by 1.750; this is a bigger increase than if you beat a .600 team (which would result in a multiplier of 1.600). Conversely, losing to a .400 team requires you to multiply your previous temperature by .400, cutting value by more than half (e.g., a previous value of 10 would become 4). Losing to a .800 team, in contrast, doesn't hurt as much (multiplying the previous value by .800).

In order for a win and a loss to cancel each other out and return a team to the neutral point of 1.00, a more dramatic win, such as beating a .750 team, would be offset by losing to a not-quite-as-good team, in this case with a pre-game .571 win percentage, and vice-versa (1.750 x .571 = 1.00, within rounding). The following graph provides a general characterization of the relationship between win and loss multipliers in order to restore a team to 1.00 (neutrality), plus another example.
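For readers who prefer code to graphs, the update rule and the win/loss offset can be sketched in a few lines of Python. This is a minimal sketch of the rules as described in the text; the function names are hypothetical, and each game is assumed to be given as a pair of the opponent's pre-game winning percentage and whether the game was won.

```python
def update(temp, opp_pct, won):
    """Multiply by (1 + opp_pct) after a win, or by opp_pct after a loss."""
    return temp * ((1 + opp_pct) if won else opp_pct)


def temperature(games, start=1.00):
    """Run the update over a list of (opp_pct, won) games, oldest first."""
    temp = start
    for opp_pct, won in games:
        temp = update(temp, opp_pct, won)
    return temp


def offsetting_loss_pct(win_opp_pct):
    """Opponent winning percentage for a loss that cancels a given win,
    found by solving (1 + win_opp_pct) * loss_opp_pct = 1."""
    return 1 / (1 + win_opp_pct)


print(update(10, 0.400, won=False))                     # the 10-becomes-4 example
print(temperature([(0.750, True), (0.571, False)]))     # back near 1.00
print(round(offsetting_loss_pct(0.750), 3))             # the .571 from the text
```

Because the updates are pure multiplication, the order of the games does not change the final value; only the set of results and opponents matters.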


Enough formulas, let's get to some basketball! First, we see the Spurs' hotness for the final 10 games of each of the past four regular seasons (I think you'll need to click on this chart!).


San Antonio's hotness over its last 10 games of the 2014-15 season is 28.47, obtained by multiplying the automatic start value of 1.00 x 1.685 (for the win over Memphis) x 1.466 (for the win over Miami), and so forth. The season-ending loss to New Orleans (which entered the game with a .543 winning percentage) essentially halved the Spurs' hotness value (i.e., multiplying by .543) in one fell swoop.

The fact that the Spurs' hotness was right around the neutral point of 1.00 in both 2013-14, when they won the NBA championship, and in 2012-13, when only a statistically unlikely comeback by Miami in Game 6 of the finals prevented a San Antonio title, suggests hotness over the final 10 games is not important. Similar findings have been obtained for baseball.

In the lockout-shortened 2011-12 season, however, the Spurs followed up their 10-game winning streak to end the regular season (hotness = 46.75) with 10 straight wins to begin the playoffs, before being eliminated. San Antonio didn't win the title in 2011-12, but a 20-game winning streak spanning the regular season and playoffs is pretty good!

Let's look at some other teams that were hot over their final 10 regular-season games in recent years.


As shown in the top row, the Spurs' opponent in the opening round of this year's playoffs (getting underway tonight), the Los Angeles Clippers, are pretty hot at the moment, too. Both teams are 9-1 over their final 10 regular-season games, but San Antonio (28.47) is hotter than L.A. (18.47), due to the Spurs' higher-quality opposition. For what it's worth, however, the Clippers' 18.47 hotness exceeds the 2012-13 NBA champion Miami Heat's 15.14 in also going 9-1 over its final 10 regular-season games (second row).

Looking at teams with 8-2 records over their final 10 regular-season games this year, the Golden State Warriors, who had far-and-away the NBA's best record (67-15), had a hotness value of 9.76 (third row), and the Boston Celtics, who needed a feverish run just to make the playoffs, had a hotness of 9.65 (last row).

As I noted above, other factors could be added to the mix. Perhaps a team's hotness could be multiplied by bonus adjustment factors of 1.05 or 1.10 (or something else) for each road win or blowout win, or could be multiplied by a deflationary factor of .95 or .90 for a home loss. Recency of performance, which I don't think was a big issue here due to the focus just on teams' final 10 games, could also be taken into account by multiplying newer wins by greater enhancement factors than older wins. Finally, teams' records toward the end of the regular season can be misleading due to resting of players. That's another factor for which adjustments would be helpful. Please share any ideas you have for further refinements, in the Comments section.
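One way such adjustments might be layered on top of the basic multiplier is sketched below. Everything here is an assumption for discussion purposes: the function name, the parameter names, and the default factors (1.05 for a road win or a blowout win, .95 for a home loss) are simply picked from the ranges floated above, not settled values.

```python
def adjusted_multiplier(opp_pct, won, home, blowout=False,
                        road_win_bonus=1.05, blowout_bonus=1.05,
                        home_loss_penalty=0.95):
    """Basic multiplier, scaled by hypothetical bonus or penalty factors."""
    # Basic rule: (1 + opp_pct) after a win, opp_pct after a loss.
    mult = (1 + opp_pct) if won else opp_pct
    if won and not home:
        mult *= road_win_bonus       # reward winning on the road
    if won and blowout:
        mult *= blowout_bonus        # reward a lopsided win
    if not won and home:
        mult *= home_loss_penalty    # penalize losing at home
    return mult


# A road win over a .600 team: 1.600 * 1.05
print(adjusted_multiplier(0.600, won=True, home=False))
```

A recency weight could be handled the same way, as one more factor that grows for more recent games.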

Wednesday, April 15, 2015

Korver Faces Tough Odds to Reach 50/50/90 Level

The Atlanta Hawks' Kyle Korver should be familiar to aficionados of hot shooting. The 6-foot-7 shooting guard once had a streak, spanning the 2012-13 and 2013-14 seasons, of making at least one three-pointer in a record 127 straight games (I analyzed Korver's streak here, when it was at 98 games).

During the 2014-15 season, Korver has sought out further frontiers of shooting accuracy. As Ian Levy pointed out back on February 13, Korver was threatening to record the unprecedented feat of hitting 50 percent on all shots from the field, 50 percent from three-point land, and 90 percent on free throws, a so-called 50/50/90 season.

As the Hawks enter their regular-season finale tonight at Chicago, Korver is slightly below all three milestone levels, with a .487 field-goal percentage, .493 three-point percentage, and .897 free-throw percentage (Korver stats page).

It's not even clear how much -- if at all -- Korver will play tonight, as the Hawks rested Korver and other key players last Sunday at Washington, although he played 34 minutes Monday vs. New York. However, assuming he plays tonight, what kind of shooting numbers will he need to post to reach each of the three criteria?

I plotted some equations for how many shots without a miss Korver would need to make to reach .500 on overall field goals and treys, and .900 on free throws. Even if Korver missed a shot of a given type, it would be mathematically possible for him to still reach the milestone, but far more makes and attempts would be necessary than if he never missed.

Let's take three-point shooting, where he enters the game 219 out of 444 (.493). Assuming no missed shots, the number of attempts is equal to the number of makes. We can thus define the equation:

y = (219 + x) / (444 + x)

where y represents Korver's three-point shooting percentage and x represents each new attempt (which is always made). In other words, each new attempt raises his number of attempts beyond the current 444 and each new make raises his number of makes beyond the current 219. By how many attempts (and makes) must x rise to bring y to .500? One can type an equation, such as the one above, into Google, which will automatically generate a plot. Here are the resulting plots for Korver in all three shooting categories (you may click on the graphic to enlarge it).


We see in the upper-right graph that Korver's three-point shooting line (blue upward trend) crosses the .500 threshold (black horizontal line) at six attempts. Six more made threes (again, without a miss) would give him 225, which would be half the new number of attempts, 450. Alternatively, Korver could hit the .500 threshold with a 7-of-8 performance behind the arc, resulting in (226/452). As I said, each miss progressively increases the number of shots he would need to make. Making 6-of-6 on threes is not terribly likely. Given that his three-point percentage is very close to 50%, let's imagine coin-tossing. Korver would have to flip heads six times in a row, which has a probability of 1-in-64.

Finishing at .900 on free throws should be relatively easy. Korver just needs to make at least three free throws without a miss. If he's not perfect, he would have to make 12 of 13 to reach .900 (117/130).

Lastly, we have overall field-goal percentage. To reach .500, Korver would need a 16-for-16 night (resulting in 305/610) or, alternatively, 17-of-18 (306/612).
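The same answers can be reached without plotting by solving the inequality directly. Below is a minimal Python sketch; the function name is hypothetical. The three-point totals (219 of 444) come straight from the text, while the free-throw (105 of 117) and field-goal (289 of 594) starting totals are inferred by working backward from the 117/130 and 305/610 figures above, so treat them as assumptions.

```python
from fractions import Fraction
from math import ceil


def makes_needed(made, attempts, target, misses=0):
    """Smallest number of additional makes x such that
    (made + x) / (attempts + x + misses) >= target."""
    # Rearranging gives x >= (target * (attempts + misses) - made) / (1 - target).
    need = (target * (attempts + misses) - made) / (1 - target)
    return max(0, ceil(need))


half, ninety = Fraction(1, 2), Fraction(9, 10)  # exact fractions avoid rounding trouble

for label, made, attempts, target in [
    ("three-pointers", 219, 444, half),
    ("free throws", 105, 117, ninety),     # starting totals inferred, see above
    ("field goals", 289, 594, half),       # starting totals inferred, see above
]:
    perfect = makes_needed(made, attempts, target)
    one_miss = makes_needed(made, attempts, target, misses=1)
    print(f"{label}: {perfect}-for-{perfect}, or {one_miss}-of-{one_miss + 1}")

# Six straight makes at coin-flip odds, matching the 1-in-64 figure.
print(Fraction(1, 2) ** 6)
```

Raising `misses` shows how quickly the required number of makes grows with each miss.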

Clearly, Korver has his work cut out for him. At the college level, Christian Laettner's performance against Kentucky in the 1992 regional final comes to mind; not only did he hit the turnaround buzzer-beater, but he also hit 10-of-10 from the floor and 10-of-10 from the stripe. Also, Bill Walton hit on 21-of-22 field goals in the 1973 final. That's the kind of game Korver's looking at.

UPDATE: Korver went 3-of-6 from the floor at Chicago to finish the regular season with a .487 field-goal percentage; 2-of-5 on three-pointers for a final percentage of .492 beyond the arc; and 1-of-1 from the free-throw line to finish at .898 from the stripe (box score; final regular-season statistics).

Saturday, January 31, 2015

Serena Beats Sharapova 16th Straight Time, Wins Australian Open (with Updated Grand Slam/Age Chart)

Serena Williams defeated Maria Sharapova, 6-3, 7-6, in the Australian Open final, giving Williams her 19th Grand Slam singles title and 16th straight win over Sharapova in their head-to-head rivalry.

Back in 2013, on the eve of the French Open women's singles final (which Williams won, to increase her winning streak at the time to 31 matches), I created a chart of Grand Slam singles titles by age for several women's greats of the modern era: Serena and Venus Williams, Steffi Graf, Martina Navratilova, Chris Evert, Billie Jean King, and Margaret Court. The point I wanted to illustrate was that Williams appeared to be holding her own in Grand Slam tournaments beyond the age at which other greats began falling off.

Below, I have updated the chart. Williams's success in her thirties has continued, with her just-achieved Australian Open title being her third Grand Slam trophy beyond the 2013 French Open. You may click on the chart to enlarge it.



(I also corrected the ages for a few of Martina Navratilova's Grand Slam titles. Because the Australian Open has shifted in the past between November-December at the end of the year and January at the beginning of the year, seeing the year associated with the Australian Open is ambiguous at first glance. Examining the dates of each year's Australian Open during Navratilova's career, relative to her birthday, yielded some slight modifications as reflected above.)

Saturday, January 24, 2015

Greatest Shooting Quarter in NBA History: Klay Thompson

The Golden State Warriors' Klay Thompson last night had what I think can fairly be called the greatest single quarter of NBA shooting ever, as his team routed the visiting Sacramento Kings, 126-101. It's not just that Thompson set the Association record for most points in a quarter -- 37 in the third -- but how he did it.

He simply didn't miss. He hit all 9 three-pointers he tried, all 4 two-pointers, and both free throws. Based on this play-by-play sheet of the third quarter, I've graphed Thompson's shot sequence according to time left in the third quarter. You may click on the graphic to enlarge it.


As can be seen, Thompson didn't even begin his third-quarter scoring until 2:16 had elapsed (i.e., the 9:44 mark). Further, he scored 29 of the 37 points from the 6:03 mark in. Twenty-nine points in (roughly) six minutes would translate to 232 points in a 48-minute game! For the record, Thompson scored "only" 52 points on the night (including 11-of-15 from behind the arc). There were two short stretches (5:31, 4:56, and 4:18; 1:06 and 0:35) in which he was hitting a three approximately every 30 seconds.

Thompson entered the game with a three-point shooting percentage of .444 this season. The probability of a 9-for-9 quarter shooting threes would be .444 raised to the 9th power, which is .0007 or 7-in-10,000. However, we are asking this question after the fact, knowing the probability is likely to be very low. Using this binomial calculator, we can also examine the less extreme (although still post hoc) question of how likely a .444 shooter would be to hit 11 (or more) out of 15 three-point attempts in a game. The answer is .02 or 1-in-50.
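For anyone who would rather not rely on an online calculator, both probabilities can be reproduced with Python's standard library. This is a minimal sketch with a hypothetical helper name. Like the calculation above, it assumes every attempt is an independent shot with the same .444 chance of going in.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p = 0.444                                # Thompson's season three-point percentage
perfect_nine = p ** 9                    # 9-for-9 in the quarter
eleven_plus = prob_at_least(11, 15, p)   # 11 or more of 15 in the game

print(round(perfect_nine, 4))   # about .0007
print(round(eleven_plus, 2))    # about .02
```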

Tuesday, January 13, 2015

Hot-Shooting Guards Lead NC St. Over Previously Unbeaten Duke

One of the biggest upsets of the current men's college basketball season occurred this past Sunday, as unranked North Carolina State handed No. 2 Duke its first loss, 87-75. Key to the Wolfpack's win was the three-point shooting of two upper-year guards, senior Ralston Turner and junior Trevor Lacey.

Turner hit three three-pointers in a little over two minutes (between 16:58 and 14:48) during the second half, to help NC State maintain a narrow lead. A bit later, Lacey hit a couple of threes and a two during a 22-5 run as the Wolfpack expanded a 50-48 lead to a 72-53 advantage (play-by-play sheet). Duke rallied some, but couldn't catch up.

Turner ended up 4-of-7 from behind the arc during the day, whereas Lacey went 5-of-7. As it turns out, these guards' long-range success has been building over the past month. In the following graph, I show Turner and Lacey's three-point shooting percentage game-by-game this season, with the larger basketball data-points (orange for Turner, red for Lacey) reflecting greater numbers of three-point attempts in a given game. You may click on the graph to enlarge it.


The graph should be read from bottom to top, in thin slices. In the Wolfpack's season-opener against Jackson State (JSU), for example, Turner shot 1-of-4 (.250) from behind the arc, whereas Lacey shot 3-of-5 (.600).

If you're an NC State fan, what you want to see are large-sized basketball icons high up in the graph, meaning that a player launches a lot of three-point attempts and makes a healthy share of them. Turner has indeed provided several large orange basketballs, shown in the shaded area representing shooting percentages between .400 and .700. Against Tennessee, in fact, Turner made eight treys on an amazing 17 attempts, a .471 clip (click here for Turner's game-by-game log).

Lacey doesn't necessarily shoot many threes in a game -- indicated by the relative dearth of large red basketballs -- but when he fires from downtown, he frequently hits. In six of his last eight games, he has shot .500 or better from behind the arc (click here for Lacey's game-by-game log).

Two additional trends are worth noting. First, for the past month, cold-shooting games have been very rare for Turner and Lacey (see the blue notation on the graph). Second, the two players' shooting accuracy from game to game appears to be correlated. The good news is that, if one of them is shooting well in a game, the other tends to be, also. The bad news, however, is that if one is shooting poorly, so is the other likely to be. For those with some statistical training, the Pearson correlation between Turner and Lacey's three-point shooting percentage is .45 (where 1.00 is the maximum).
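For readers who want to run this kind of correlation on their own data, here is a minimal Python sketch of the Pearson formula. The function name is hypothetical, and the two short lists are made-up placeholder percentages, not Turner's and Lacey's actual game logs, so the printed value will not be the .45 reported above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder per-game three-point percentages for two hypothetical shooters.
shooter_a = [0.250, 0.500, 0.333, 0.600, 0.400]
shooter_b = [0.600, 0.400, 0.250, 0.714, 0.500]
print(round(pearson(shooter_a, shooter_b), 2))
```

Note that this treats every game equally; a game with one attempt counts as much as a game with ten.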

It's possible a sort of "contagion" operates between Turner and Lacey, where one player's shooting level in a game (good or bad) rubs off on the other. Another possible explanation is that good defensive teams shut down both Turner and Lacey, and bad ones let both of them shoot the deep ball well.