Monday, January 15, 2007

This past Saturday, senior guard Lee Humphrey of the defending NCAA men's basketball champion Florida went 7-for-8 on three-point attempts in the Gators' 84-50 rout of conference rival South Carolina.

As I've done many times before, I want to conduct an analysis of the form, How likely is it that a player with a long-term prior success rate of X percent will proceed to make Y out of Z attempts in his or her next game? However, I want to go into a little more depth this time.

For Humphrey's prior probability of making threes, let's use .45. Looking at his career stats (which, of course, include only part of the current season), we see that, with the exception of a .370 percentage from behind the arc his sophomore season, his other yearly percentages have clustered around .45 (.439, .459, and for this season so far, .452).

Humphrey's recent 7-of-8 performance from three-point land (.875) certainly exceeds .45. However, due to random sampling error, he is unlikely to hit at exactly a .45 clip in every game. The logic is the same as saying that, even though we know the probability of a tossed coin coming up heads is .50, repeated sets of ten tosses would likely yield something other than five heads and five tails for many of the sequences (sometimes more than five heads, sometimes fewer than five heads). The question then becomes, how incompatible is a 7-of-8 performance with an underlying .45 prior probability?
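To put a number on the coin-toss intuition, here is a minimal Python sketch (the variable names are purely illustrative) that computes the exact chance of getting five heads in ten tosses of a fair coin, assuming the tosses are independent.

```python
from math import comb

tosses = 10

# Ways to place exactly 5 heads among 10 tosses, divided by all 2**10 sequences
p_five_heads = comb(tosses, 5) / 2**tosses
p_other_split = 1 - p_five_heads

print(f"P(exactly 5 heads in 10) = {p_five_heads:.3f}")
print(f"P(any other split)       = {p_other_split:.3f}")
```

Under these assumptions, roughly three of every four sets of ten tosses would come out as something other than an even 5-5 split.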

At this time, I usually bring in an online binomial calculator from Vassar College, and I do so again. By plugging in just three values -- number of attempts, n; number of stipulated successes, k; and probability of a success, p -- we can answer questions such as what's the probability of a prior .45 shooter making exactly 7 three-point attempts out of 8, and what's the probability of him or her making 7 or more out of 8?

(Statisticians would generally be more interested in the latter type of question -- probability of a particular value or more extreme -- than the former, as chances tend to be very low for any single, particular number of successes. For purposes of our analyses, however, we will need to look at probabilities of particular numbers of successes.)

By plugging in the full range of values from 0 to 8 for k, we can see the probabilities of making exactly 0, 1, 2, 3, etc., three-point shots, up to 8. These probabilities, which must sum to 1.0, are illustrated in the figure below.



As can be seen, with a .45 prior shooting percentage from behind the arc and eight shots taken, the most likely outcomes would be either three or four made shots. However, more or fewer made shots than that also have some non-ignorable probabilities.
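For readers who would rather compute these values than look them up, the following Python sketch reproduces the distribution for eight attempts at a .45 success rate. The helper name binomial_pmf is my own, and the code assumes independent shots; it is not meant to represent how the Vassar calculator works internally.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.45  # eight attempts, .45 prior success rate

# One probability for each possible number of made shots, 0 through 8
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k} made: {prob:.4f}")

print(f"total: {sum(pmf):.4f}")
```

The printed values peak at three and four made shots (each a bit above .25) and taper off toward both ends.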

At the extremes, these probabilities are fairly simple to compute, but they get a bit more complicated in the middle of the distribution. A simple analogy is calculating the probability of double sixes on a roll of two dice by taking (1/6) squared, or 1/36. (All illustrations in this write-up assume independence of observations, as with dice, which has been shown to be a surprisingly reasonable assumption for sequential sports performances.)

For a perfect 8-of-8 successes, the probability is simply (.45)^8, where ^ signifies raising to a power. Raising .45 to the eighth power yields .0017.

The basic probability of an exactly 7-of-8 sequence is computed according to...

(.45)^7 X (.55), which equals .0021 (.45 gets multiplied by itself seven times to represent the made shots, whereas the .55 represents the missed shot).

There are, however, eight different ways to make 7-of-8 shots. The one miss can occur on either the first shot, the second shot, etc., up through the eighth shot. We thus multiply .0021 X 8, yielding .0164.

The probability of making 7 or more out of 8 is thus .0017 + .0164 = .0181, or nearly 2 percent (1 in 50). If a .45 three-point shooter can play around 130 games over a four-year collegiate career, as Humphrey seems on pace to do, and were to attempt eight threes in each of them, he or she might then be expected to have two or three games of making 7 or 8 threes in 8 attempts, purely on the basis of statistical fluctuation.
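The arithmetic in this step can be checked with a short Python sketch. It rests on two simplifying assumptions that should be read as such: shots are independent, and the shooter takes exactly eight threes in every one of roughly 130 career games.

```python
from math import comb

p, n = 0.45, 8   # prior success rate and attempts per game
games = 130      # rough career total (an estimate, not an exact count)

# P(exactly 7 of 8) + P(exactly 8 of 8)
p_7_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (7, 8))

# Expected number of such games, assuming 8 attempts in every game
expected_games = games * p_7_or_more

print(f"P(7 or more of 8) = {p_7_or_more:.4f}")
print(f"Expected games in {games}: {expected_games:.2f}")
```

Because the shooter will not attempt exactly eight threes every night, the expected count is best treated as a ballpark figure.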

The probability of making exactly 6 out of 8 is (.45)^6 X (.55)^2, multiplied by the number of ways to make six shots. The number of ways gets pretty large in a hurry: the two misses could come on shots 1 & 2, 1 & 3, etc., up through 1 & 8; on shots 2 & 3, 2 & 4, etc., up through 2 & 8; and so forth, for 28 possible pairings in all. Similar reasoning applies for calculating the probability of making 5 of 8, 4 of 8, etc. See my Intro Stats lecture on this topic for further detail.
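To make the counting concrete, this Python sketch lists every pair of shots on which the two misses could fall and then applies the same multiplication as before. The variable names are illustrative, and independence of shots is assumed throughout.

```python
from itertools import combinations
from math import comb

p, q = 0.45, 0.55  # probability of a make and of a miss

# Every distinct pair of shot numbers (1-8) on which the two misses could occur
miss_pairs = list(combinations(range(1, 9), 2))
n_ways = len(miss_pairs)

# The brute-force count agrees with the "8 choose 2" formula
assert n_ways == comb(8, 2)

# Probability of any one specific sequence, times the number of sequences
p_exactly_6 = n_ways * p**6 * q**2

print(f"First few pairings: {miss_pairs[:4]}")
print(f"Ways to miss 2 of 8: {n_ways}")
print(f"P(exactly 6 of 8) = {p_exactly_6:.4f}")
```

Swapping the 2 in combinations(range(1, 9), 2) for 3 or 4 gives the counts needed for the 5-of-8 and 4-of-8 cases.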

***

I also wanted to discuss, briefly, two other games from this past Saturday, one involving my undergraduate alma mater UCLA (vs. USC) and the other involving the university at which I'm on the faculty, Texas Tech (vs. Baylor).

In this year's first installment of the Battle of Los Angeles, USC got the ball with less than a minute remaining, trailing 63-57 (see play-by-play sheet). Under the most realistic scenario for the Trojans to tie the game, three things had to happen: they'd have to make a three, hold UCLA scoreless on its possession, then hit another three. Gabe Pruitt (whom we'll generously consider a .40 shooter from behind the arc, based mostly on previous seasons) and Nick Young (hitting about .45 from three-point land this season, but in the low .30s in previous years, so let's say .40 overall) did their part, hitting the two treys.

In between its two final possessions, USC fouled UCLA's Lorenzo Mata, a roughly .30 free-throw shooter this season, although a .50 and above shooter from the line in earlier seasons. Again for simplicity, let's assume a .40 FT% for Mata, which, conversely, is a .60 miss rate. There would thus be a .36 probability of Mata's missing both free throws. If you want to use .30 as his FT% and .70 as his miss rate, there would be a .49 probability of his missing both.

Mata indeed missed both free throws.

The probability of an 'SC three, Mata missing two from the stripe, and another 'SC three all happening in sequence would thus be .40 X .36 X .40 = .06 (or, if you prefer, .40 X .49 X .40 = .08).
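The same multiplication can be written as a small Python function so that other guesses for the shooting percentages can be tried. The function name and parameters are hypothetical, and the sketch assumes the three events are independent, as the calculation above does.

```python
def usc_tie_probability(p_first_three, p_second_three, opp_ft_pct):
    """Chance of: made three, opponent misses two free throws, made three."""
    p_miss_both = (1 - opp_ft_pct) ** 2
    return p_first_three * p_miss_both * p_second_three

# Mata treated as a .40 free-throw shooter, then as a .30 shooter
tie_if_ft_40 = usc_tie_probability(0.40, 0.40, 0.40)
tie_if_ft_30 = usc_tie_probability(0.40, 0.40, 0.30)

print(f"Mata at .40 FT: {tie_if_ft_40:.2f}")
print(f"Mata at .30 FT: {tie_if_ft_30:.2f}")
```

Either way, the sequence needed just to tie comes out at well under one chance in ten.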

There was one more "shoe to drop," however. Young was fouled on his three-point attempt and made the free throw for a rare four-point play, putting the Trojans up 64-63. I don't know the frequency of fouls on three-point attempts -- which would also have to be incorporated into the calculation -- but I would imagine it's pretty rare. Thus, until we find out how often fouls on three-point attempts occur, all we can say is that the probability of USC taking the lead was smaller still than the .06 (or .08) chance of tying the game.

Ultimately, the Bruins still had some time on the clock after falling behind by a point, and Arron Afflalo hit a Michael Jordan-esque clutch shot from near the top of the key with four seconds remaining, to give UCLA the win, 65-64.

Finally, a surprising offensive force for Texas Tech in its 73-70 loss to Baylor was 6-8 forward Jon Plefka, who had not made any more than four field goals in a game previously this season. In the second half of the Baylor game, he made seven straight field goal attempts, some from outside including a three (box score and play-by-play document). Plefka will probably be receiving more playing time, so we can track any tendency of his for streak shooting.

2 comments:

Anonymous said...

Alan,

A student asked me a question that I don't have a ready answer to: How do you reconcile the absence of a genuine "hot hand" effect with the presumed effect of positive motivation/optimism in enhancing athletic performance?

The only possibilities that I can think of (other than denying at least one of the two effects) are granularity or non-linearity.

Any help would be appreciated.

Barry Anderson

alan said...

Here are a couple of possibilities I've thought of:

1. If a basketball shooter's optimism gets too out of control after making a few in a row, he or she might start taking shots further and further away from the basket, which will increase the chance of a miss and thus short-circuit the streak of made baskets.

2. In an event such as the NBA all-star three-point shooting contest, making a few in a row may increase the player's optimism, but because the player must attend to other tasks (grabbing the basketballs off the rack, moving to different places around the arc, etc.), perhaps any effect of optimism gets overshadowed.