Wednesday, June 03, 2009

Today is the 20th anniversary of a 22-inning game between the Los Angeles Dodgers and Houston Astros (box score). Houston ultimately won, 5-4. From a streakiness perspective, the thing I've always remembered about this game is L.A.'s John Shelby going 0-for-10.

Shelby's career statistics show that he clearly had a bad year with the bat in 1989, hitting only .183 (63-for-345). A .183 batting average translates into a .817 (i.e., 1 - .183) failure rate on each at-bat. Raising .817 to the 10th power, for the probability of 10 successive failures (assuming independence of events), yields .133 as the likelihood of Shelby's going 0-for-10. We have not, of course, taken into account the quality of the opposing pitchers or any other factors, so this will have to be a rough estimate.
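For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation. The variable names are my own. It assumes, as above, that each at-bat is an independent event with the same chance of a hit. It computes the result twice, once from the rounded .183 average and once from the unrounded 63/345 ratio; both land at roughly .13.

```python
# Probability of 10 straight hitless at-bats, assuming independent at-bats
# that all share the same probability of a hit (a simplifying assumption).
hits, at_bats = 63, 345          # Shelby's 1989 season line
streak_length = 10               # at-bats in the 22-inning game

rounded_avg = 0.183              # batting average as usually reported
exact_avg = hits / at_bats       # unrounded average, about .1826

# Failure rate on one at-bat is 1 minus the batting average;
# raise it to the 10th power for 10 failures in a row.
p_hitless_rounded = (1 - rounded_avg) ** streak_length
p_hitless_exact = (1 - exact_avg) ** streak_length

print(f"Using .183:   {p_hitless_rounded:.4f}")
print(f"Using 63/345: {p_hitless_exact:.4f}")
```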

A 13% chance of Shelby going 0-for-10 is not astronomically small by any means. It's still fairly rare, however. The following figure shows Shelby's probabilities of getting 0, 1, 2, 3, etc., hits out of 10 at-bats. Not surprisingly, the likeliest scenarios were for him to get 1 or 2 hits. For the probability of 1 hit (and 9 failures), for example, we would take .817 to the 9th power, then multiply the result by .183, thus yielding .0297. There are 10 different ways to get exactly 1 hit out of 10 (i.e., in the first at-bat, or in the second, ... , or in the 10th), so we multiply .0297 X 10, yielding .297 (which is shown in the figure).


Shelby also had a higher probability of getting 3 hits in the game than 0 hits, but the chances start tailing off once we get to 4 hits. The above probabilities were obtained from the Vassar Binomial Calculator.
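The same probabilities can be reproduced with a few lines of Python. This is only a sketch of the standard binomial formula (the number of ways to arrange k hits among 10 at-bats, times the probability of any one such arrangement), not the Vassar calculator itself. The function name `binomial_pmf` is my own, and the independence assumption from above still applies.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    # comb(n, k) counts the different orderings of k hits in n at-bats;
    # p**k * (1 - p)**(n - k) is the probability of any single ordering.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_at_bats = 10
batting_avg = 0.183   # Shelby's 1989 average

# Probability of 0, 1, 2, ..., 10 hits in 10 at-bats.
probs = [binomial_pmf(k, n_at_bats, batting_avg) for k in range(n_at_bats + 1)]

for k, prob in enumerate(probs):
    print(f"{k:2d} hits: {prob:.3f}")
```

The 1-hit entry is the .817-to-the-9th-power, times .183, times 10 calculation described earlier, and the full list sums to 1.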

2 comments:

G Wolf said...

You mention he went 63-for-345 that year, but I think you need to take out his 0-for-10 performance before looking at his a priori batting average. In other words, without that 0-for-10 performance, looking at the rest of his season, what is his expected number of hits?

At least, I think that's how it should be done, right?

alan said...

Your larger point -- that how you define a player's base rate of success is a crucial part of calculating the probability of a given streak -- is well taken.

In the case of baseball hitting streaks (or slumps), should you use the player's season-to-date average, his full-season average, his career average, or something else?

There are at least two factors informing such decisions, in my view. One is the desire to capture a player's baseline ability around the time the streak (or slump) has started. Thus, if a player is at the twilight of a long career, a more recent average (e.g., past two years) would probably be more appropriate than career statistics.

Second, one wants the baseline average to be based on a large enough sample of at-bats to be meaningful. Thus, if a streak started one month into a season, I don't think you'd want to use the player's season-to-date batting average, as it would be based on relatively few at-bats.

Thanks for your thought-provoking question!