Sunday, August 14, 2011

Dan Uggla's hitting streak ended at 33 games this afternoon, as his Atlanta Braves fell to the Chicago Cubs, 6-5. Much was made of Uggla's low batting average prior to the streak and how unlikely it seemingly made the streak. In my view, judging the likelihood of Uggla's hitting streak is not so simple.

Let's start with a refresher on some principles of probability. Batting average represents a player's probability of getting a hit in any given official at-bat. Where consecutive-game hitting streaks are concerned, we're interested in the probability of a player getting at least one hit in a game. The latter will generally be higher than the batting average because the player usually will have multiple official at-bats in a game.

To estimate the probability of a player getting at least one hit in a game, statisticians typically assume a number of official at-bats per game for the player and further assume independence of outcomes (i.e., that what happened on one at-bat has no effect on a later at-bat). As of the conclusion of yesterday's play, Uggla was getting 3.76 official at-bats (AB) per game (448/119). Looking at Baseball Reference's wonderful game-by-game log for Uggla this season, he had a few games (mostly prior to the streak) with 0 or 1 plate appearances, suggesting he appeared as either a late-inning defensive replacement or pinch hitter in a few games. Assuming regular starts, which would be the case well into a hitting streak, we could estimate he'd have 4 AB per game.

Batting average (BA) is the probability of a success (hit) in a particular official at-bat, so the probability of failure in that at-bat is F = (1 - BA). Under the independence assumption, the probability of an all-failure (no hits) game with 4 AB is simply F raised to the 4th power. Getting at least one hit means avoiding an all-failure game, so the probability of getting at least one hit is: 1 - (F^4). To know F, we need to know BA, and that is where the difficulty arises with Uggla.
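As a minimal sketch of the formula above, the function below computes 1 - (F^AB) for a given batting average. The function name, the default of 4 AB, and the .300 hitter in the example are illustrative assumptions, not figures from Uggla's record.

```python
def p_hit_in_game(batting_avg, at_bats=4):
    """Probability of at least one hit in a game.

    Assumes every at-bat is independent and has the same chance of a hit
    (the simplifying assumptions described in the text).
    """
    if not 0.0 <= batting_avg <= 1.0:
        raise ValueError("batting_avg must be between 0 and 1")
    failure = 1.0 - batting_avg        # F = 1 - BA
    return 1.0 - failure ** at_bats    # 1 - F^AB


# Hypothetical .300 hitter with 4 AB: 1 - 0.7^4 = 1 - 0.2401
print(round(p_hit_in_game(0.300), 3))
```

Even a .300 hitter, in other words, goes hitless in roughly one game out of four under these assumptions.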

The day Uggla began his hitting streak (July 5), he woke up with a .173 BA. During the streak, he hit .377 (49/130). Upon completion of his last game during the streak (i.e., yesterday's), his season-to-date average sat at .232. And, while we're at it, his lifetime BA (excluding 2011) is .263. The question is, which batting average should we use to best capture his batting ability, let's say, midway through the hitting streak? Another way to think of the problem: Uggla's hitless game today notwithstanding, suppose we wanted to know what BA to use for him in predicting his chances of getting a hit in each of his next 23 games, which would tie Joe DiMaggio's record of 56 games.

The following table runs through the steps of transforming an Uggla batting average into his estimated probability of getting at least one hit in his next 23 games.

p(Hit in 1 AB) [Batting Avg] | p(Failure in 1 AB) | p(Failure in All 4 AB) | p(>/= 1 Hit in 4 AB) | p(Hit in All of Next 23 Games)
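The table's columns can be recomputed from the four batting averages quoted earlier. What follows is a rough sketch rather than the original table: the row labels, function name, and print formatting are my own assumptions, while the 4 AB per game and 23 games come from the text.

```python
# Batting averages quoted in the post; the labels are only for display.
BATTING_AVERAGES = {
    "before streak (July 5)": 0.173,
    "season-to-date": 0.232,
    "lifetime (excl. 2011)": 0.263,
    "during streak": 0.377,
}
AB_PER_GAME = 4     # assumed at-bats per game, as in the text
GAMES_NEEDED = 23   # games needed to go from 33 to DiMaggio's 56


def streak_table(averages, ab_per_game=AB_PER_GAME, games=GAMES_NEEDED):
    """Return one row per batting average, following the table's columns."""
    rows = []
    for label, ba in averages.items():
        fail_ab = 1.0 - ba                   # p(Failure in 1 AB)
        fail_game = fail_ab ** ab_per_game   # p(Failure in All 4 AB)
        hit_game = 1.0 - fail_game           # p(>= 1 Hit in 4 AB)
        hit_all = hit_game ** games          # p(Hit in All of Next 23 Games)
        rows.append((label, ba, fail_ab, fail_game, hit_game, hit_all))
    return rows


for label, ba, fail_ab, fail_game, hit_game, hit_all in streak_table(BATTING_AVERAGES):
    print(f"{label:<24} {ba:.3f}  {fail_ab:.3f}  {fail_game:.4f}  "
          f"{hit_game:.4f}  {hit_all:.6f}")
```

Raising the single-game probability to the 23rd power treats the games as independent of one another, the same simplification applied to at-bats within a game.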

Even under the most advantageous assumption for Uggla -- namely taking his batting average exclusively from his recent streak -- the chances of tying DiMaggio would be only about two percent. Still, which batting average should we use?

As shown in the book Scorecasting by Moskowitz and Wertheim, a baseball player's batting average over the past two seasons is a better predictor of success in the next at-bat than is batting average over the last five plate appearances, last five games, the last month, or season-to-date. Thus, going by the principle that large sample size trumps recency, Uggla's lifetime batting average would appear to be the best of the above options in predicting his future hitting streaks.

Another factor that helped Uggla in putting together the 33-game hitting streak was his low walk rate. At the close of yesterday's play, he had only 39 bases on balls, so his number of official AB (448) was not much lower than his total plate appearances (494). A tendency to draw a lot of walks can really short-circuit a hitting streak because a player may get only 1 or 2 official AB in a game, giving him few opportunities to get a hit (if a player walks in all of his plate appearances in a game, however, the hitting streak continues). As Joe D’Aniello wrote in the Baseball Research Journal (Vol. 32, 2003) in conjunction with his examination of DiMaggio’s hitting streak, a key reason why Ted Williams never contended for a long hitting streak was his propensity to draw walks.

David Rockoff and Phil Yates, writing in the Journal of Quantitative Analysis in Sports, identified a flaw in statistical formulations of hitting streaks: the assumption that a player gets the same number of at-bats in every game (as I assumed above in basing the calculations on 4 AB per game for Uggla). In real life, as noted above, a player may get only 1 or 2 AB in some games, thus harming his chances to extend a hitting streak. In Uggla's case, however, his rate of walks (and other plate appearances not resulting in official at-bats) is so low as to largely avoid the problem stated by Rockoff and Yates, in my view.
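To illustrate the fixed-AB issue, here is a small sketch that weights the single-game probability by a distribution of at-bats per game instead of fixing it at 4. This is not Rockoff and Yates's actual method, and the distribution below is entirely hypothetical (chosen only so that it averages 4 AB); it is not drawn from Uggla's game log.

```python
def p_hit_in_game_varied(batting_avg, ab_distribution):
    """Probability of at least one hit when at-bats per game vary.

    ab_distribution maps a number of at-bats to the probability of a game
    with that many at-bats; the probabilities must sum to 1.
    """
    total = sum(ab_distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("at-bat probabilities must sum to 1")
    failure = 1.0 - batting_avg
    return sum(p * (1.0 - failure ** ab) for ab, p in ab_distribution.items())


# HYPOTHETICAL distribution of at-bats per game, averaging exactly 4 AB.
HYPOTHETICAL_AB_DIST = {2: 0.05, 3: 0.20, 4: 0.50, 5: 0.20, 6: 0.05}

lifetime_ba = 0.263                           # Uggla's lifetime BA, from the text
fixed = 1.0 - (1.0 - lifetime_ba) ** 4        # fixed 4 AB every game
varied = p_hit_in_game_varied(lifetime_ba, HYPOTHETICAL_AB_DIST)

print(f"fixed 4 AB:  per game {fixed:.4f}, 23 straight games {fixed ** 23:.6f}")
print(f"varied AB:   per game {varied:.4f}, 23 straight games {varied ** 23:.6f}")
```

With the same average number of at-bats, the varied version comes out lower than the fixed version, because the games with few at-bats hurt more than the games with extra at-bats help. The narrower the spread of at-bats, as with a low-walk hitter, the smaller that gap.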
