Sunday, December 30, 2007

NFL 2007: Pats & Parity

The regular season of the National Football League is now over and, far and away, the biggest story of the year is the New England Patriots' perfect 16-0 record.

As nearly all football fans would know, the last perfect regular season was the 1972 Miami Dolphins' 14-0 campaign; the Dolphins also swept their three post-season games to finish at 17-0. Two Chicago Bears teams from much earlier in NFL history, 1934 and 1942, also won all their regular-season games, but neither squad won the league title.

Before trying to figure out, post hoc, what the probability was of New England completing a 16-0 regular season, however, a brief look at the overall structure of wins and losses in the NFL is in order. If we were simply to flip coins to determine each team's record (i.e., heads = a win; tails = a loss), an infinite number of simulated 16-toss "seasons" would yield the following distribution.
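For readers who would like to reproduce the coin-flip benchmark numerically, here is a minimal Python sketch. It is my own illustration, not the calculation behind the original graph. It assumes every game is an independent 50/50 toss and scales the resulting binomial probabilities by 32 (the number of NFL teams) to get the expected number of teams at each win total. The function and variable names are hypothetical.

```python
from math import comb

def coin_flip_season(games=16, p_win=0.5):
    """Probability of each possible win total if every game is a coin toss."""
    return [comb(games, w) * p_win**w * (1 - p_win)**(games - w)
            for w in range(games + 1)]

dist = coin_flip_season()
teams = 32  # number of NFL teams in 2007

# One row per possible record, from 0-16 up to 16-0.
for wins, prob in enumerate(dist):
    losses = 16 - wins
    print(f"{wins:2d}-{losses:<2d}  p = {prob:.5f}  expected teams = {teams * prob:.2f}")
```

Under these assumptions the distribution is symmetric around 8-8, and records at either extreme are vanishingly rare.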


As seen in the next graph, this past season featured a few really good teams (the Colts, Cowboys, and Packers, each at 13-3, were just a cut below the Patriots), one dreadful team (the 1-15 Dolphins), and a lot of mediocre teams clustered around a record of 8-8 (click here for the final standings).


The actual collection of NFL team records this past season was thus fairly close -- disturbingly close in many fans' minds -- to what would be expected from just flipping coins!

OK, so what was the probability of the Patriots' perfect 16-0 regular season?

Using an approach I developed for my 35th anniversary retrospective on the Los Angeles Lakers' 33-game basketball winning streak during the 1971-72 season, I first attempted to estimate the Pats' chance of winning each of their games this past season, individually, based on the difficulty of the opposition in a particular contest and whether the game was played at home or on the road.

After looking at the final win-loss records of New England's 13 unique opponents (the Pats played their three intra-divisional rivals twice), I grouped the opponents into five levels of difficulty (click here for New England's schedule):

A (hardest opponents) -- The Patriots faced the aforementioned Colts and Cowboys, each of whom won 13 games.

B -- Teams that won 10 or 11 games, comprising San Diego, Cleveland, Pittsburgh, and the New York Giants, formed the second-toughest tier of opponents.

C -- Teams that won from 7 to 9 games, comprising Buffalo, Cincinnati, Washington, and Philadelphia, were deemed to be "mediocre" opponents.

D -- Teams that won 4 or 5 games, specifically the New York Jets and Baltimore, were considered "weak" opponents. The Ravens actually gave the Patriots one of their biggest scares, but that's neither here nor there, given my system of basing opponents' strength on objective win-loss records.

E -- The aforementioned Dolphins were in a class by themselves, providing the Patriots with their easiest opposition.

For each combination of opponent strength and home/away status, I came up with the following (assumed) probabilities of the Patriots' winning any given game (the guidelines below are similar, but not identical, to those I developed for the 1971-72 Lakers). To avoid any confusion, "home" refers to a game at New England.

E opponent at home ---> .95
E opponent on the road ---> .90
D opponent at home ---> .85
D opponent on the road ---> .80
C opponent at home ---> .75
C on road or B at home ---> .70
B opponent on road ---> .65
A opponent at home ---> .60
A opponent on road ---> .55

The 16 individual game-specific Patriot-win probabilities were then multiplied together, according to what is known as the "multiplication rule." Multiplying these 16 probabilities together yielded .006; in other words, the chances of a perfect season like the Patriots' would be 6 out of 1,000 or roughly 1 in 167.
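The multiplication can be sketched in a few lines of Python. The win probabilities are the assumed values from the table above. The game-by-game home/road assignments are my own reconstruction of the Patriots' 2007 schedule and are not spelled out in this post, so treat them as an assumption to be checked against the linked schedule. All names in the code are hypothetical.

```python
from math import prod

# Assumed win probabilities from the table above, keyed by (tier, site).
WIN_PROB = {
    ("E", "home"): 0.95, ("E", "road"): 0.90,
    ("D", "home"): 0.85, ("D", "road"): 0.80,
    ("C", "home"): 0.75, ("C", "road"): 0.70,
    ("B", "home"): 0.70, ("B", "road"): 0.65,
    ("A", "home"): 0.60, ("A", "road"): 0.55,
}

# ASSUMPTION: home/road sites reconstructed from the 2007 schedule,
# listed in the order the games were played.
SCHEDULE = [
    ("Jets", "D", "road"), ("Chargers", "B", "home"), ("Bills", "C", "home"),
    ("Bengals", "C", "road"), ("Browns", "B", "home"), ("Cowboys", "A", "road"),
    ("Dolphins", "E", "road"), ("Redskins", "C", "home"), ("Colts", "A", "road"),
    ("Bills", "C", "road"), ("Eagles", "C", "home"), ("Ravens", "D", "road"),
    ("Steelers", "B", "home"), ("Jets", "D", "home"), ("Dolphins", "E", "home"),
    ("Giants", "B", "road"),
]

# Multiplication rule: under independence, P(16-0) is the product of
# the 16 single-game win probabilities.
game_probs = [WIN_PROB[(tier, site)] for _, tier, site in SCHEDULE]
p_perfect = prod(game_probs)
print(f"P(16-0) = {p_perfect:.4f}, or about 1 in {1 / p_perfect:.0f}")
```

With these inputs the unrounded product lands between .006 and .007, so the implied "1 in N" figure shifts somewhat depending on where one rounds.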

Again, this is just an estimate, based on some convenient assumptions, such as the outcomes of adjacent games being independent (i.e., winning one game does not affect the probability of winning the next game, beyond strength of opposition and game location). Were the distribution of win-loss records throughout the NFL in some future season to be substantially different from this year's (as graphed above), that would also affect the estimated probability, via the strength of opposition.

In any given season, just at a theoretical level, one or two teams (at most) might be expected to contend for 16-0. One must take into account the head-to-head aspect (especially within a conference); having teams play each other directly rules out the possibility of both going 16-0. To take the match-up in Super Bowl XIX (after the 1984 regular season) as an example, the NFC champion San Francisco 49ers had gone 15-1 in the regular season and the AFC champion Miami Dolphins had gone 14-2, so having a pair of teams threaten to go 16-0 in the same season is not totally farfetched.

Thus, if we held out the possibility of two teams per year possibly being able to go 16-0, then we would expect one team roughly every 85 years actually to do so (2 contenders per year X 85 years = 170, similar to the 1-in-167 figure I came up with, above). The 2007 NFL regular season was the 30th played under a 16-game format.
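The back-of-the-envelope arithmetic here can be written out as a short Python sketch. It takes the rounded .006 estimate and the two-contenders-per-year premise as given, and (as an added assumption of mine) treats seasons as independent, in order to ask how likely at least one 16-0 team would be over the 30 seasons of the 16-game era. Variable names are hypothetical.

```python
p_perfect = 0.006   # rounded per-contender estimate from this post
contenders = 2      # assumed number of 16-0-caliber teams per season
seasons = 30        # 16-game seasons played through 2007

# Chance that some contender goes 16-0 in a given season. The contenders
# are treated as mutually exclusive, so their probabilities simply add.
p_per_season = contenders * p_perfect

# Expected number of seasons between perfect records.
expected_wait = 1 / p_per_season

# ASSUMPTION: seasons are independent of one another.
p_at_least_one = 1 - (1 - p_per_season) ** seasons

print(f"Expected wait: about {expected_wait:.0f} seasons")
print(f"P(at least one 16-0 team in {seasons} seasons) = {p_at_least_one:.3f}")
```

Using .006 exactly gives a wait in the low 80s of seasons; the 85-year figure corresponds to rounding 1-in-167 up to 1-in-170.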

Next, we move on to see if New England can make it to -- and win -- the Super Bowl, and thus finish an unprecedented 19-0...

Saturday, December 29, 2007

The L.A. Lakers reeled off a 16-0 run in the first quarter of their game Friday night against the visiting Utah Jazz and never looked back, winning 123-109 (play-by-play sheet). The spurt turned a 13-12 Laker deficit into a 28-13 lead.

Kobe Bryant hit a pair of three-pointers during the run and finished 6-of-9 from behind the arc, for the game as a whole (box score).

Friday, December 28, 2007

Whatever it is that makes some teams get off to fast starts and others, to slow starts, was on display Thursday night at the Staples Center in Los Angeles, as the Phoenix Suns blew out the Clippers.

The Suns dominated the opening minutes of both the first half -- in which they jumped out to leads of 8-0 and 13-2 -- and second half -- in which they outscored the Clippers 18-1. ESPN.com's play-by-play sheet is available here.

Saturday, December 22, 2007

The NBA's Portland Trail Blazers, whose rebuilding project suffered a setback with star draft pick Greg Oden having to miss the season due to injury, are now playing at a level few would have expected. Friday night, Portland won its 10th game in a row, outlasting the Denver Nuggets, 99-96. After a 5-12 start, the Blazers are now 15-12.

In keeping the winning streak alive, Portland was aided by a hot stretch within the game. Trailing 76-69 after three quarters, the Blazers went on a 16-2 run to start the fourth (play-by-play sheet for the final period).

Thursday, December 20, 2007

There were some substantial team runs in a couple of high-profile basketball games tonight.

In the match-up of NBA superstars Kobe Bryant and LeBron James, the Cleveland Cavaliers used a 16-0 run, spanning the third and fourth quarters, to turn around their game against the L.A. Lakers (play-by-play sheet). The Cavs turned a 78-67 deficit into an 83-78 lead and ultimately won the game, 94-90.

And, in a battle of two undefeated, top 10, men's college teams, Pittsburgh unleashed a 12-0 run to get back into its game against Duke (game article). The game ended up going into overtime, where the Panthers were victorious, 65-64.

Tuesday, December 18, 2007

The New York Knicks and Coach/President of Basketball Operations Isiah Thomas, who've had their difficulties on and off the court, added to their woes Monday night, in a 119-92 loss to the Indiana Pacers. En route to the loss, the Knicks missed 20 straight field goal attempts in the second quarter. As seen in this play-by-play sheet for the period, many of the misses were from short distance, including layups and tip-in attempts.

Saturday, December 15, 2007

A couple of basketball items:

Friday night, with 3:29 left in the game, the visiting Los Angeles Lakers led the Golden State Warriors, 102-94. Golden State then outscored the Lakers 14-2 over roughly the next three minutes, culminating with Baron Davis's three-pointer on a hastily released shot with 0:16 remaining. That shot put the Warriors up 108-104, and they ultimately prevailed, 108-106 (fourth quarter play-by-play).

As discussed in recent postings, team scoring runs such as that pulled off by Golden State do occur fairly frequently. However, I don't recall seeing many that turned a game upside down in the last few minutes, like the one Friday. The win also ended the Warriors' nine-game losing streak to the Lakers.

Switching to the college level, I have noted the Texas Tech men's tendencies this season to shoot well from behind the three-point arc and/or allow their opponents to do the same. Butler, for example, hit 16-24 (.667) against the Red Raiders in the final of the Great Alaska Shootout.

Well, Saturday afternoon, Texas Tech was torched again from three-point land, as New Mexico hit a mind-boggling 9 of 11 (.818) treys in an 80-63 Lobo win. Quoting from New Mexico's press release prior to the Texas Tech game:

Before the 3-for-24 performance from long range against Southern Utah, the Lobos were among the nation’s early leaders in 3-point shooting. However, they still lead the league in 3s per game at 9.1 and in accuracy at 42.3%.

The 3s have dropped somewhat, though. Through the first six games, UNM was averaging 10.8 a game and shooting 47.8%, however, the past 4 contests, the Lobos are making an average of 6.5 treys and shooting 32.9% (26-79).


As I've discussed previously, a truly streaky team (or individual) will exhibit pronounced hot and cold spells over a season. It will be interesting to revisit this New Mexico squad at the end of the season to conduct systematic analyses from a larger sample of games.

Saturday, December 08, 2007

Now that we've been talking about team runs within basketball games, here are a few more from today's men's action...

The opening 12 minutes of the Arizona-Illinois game featured not so much the trading of baskets, as the trading of runs (ESPN.com play-by-play sheet).

The Illini jumped out to a 12-0 lead, only to see an 11-0 Wildcat run tie the game 16-16. Illinois then scored nine straight, to go up 25-16. Eventually, things settled down with Arizona fighting back via smaller runs. Ultimately, the Wildcats prevailed in overtime, 78-72.

The Purdue-Missouri game featured a big run by each team. As described on the BoilerStation blog:

The Tigers couldn't miss during the first 20 minutes and appeared on the verge of running away with this one, leading 44-36. Back came the Boilermakers, using a 16-3 run to take a 60-51 lead with 7:20 remaining.

Then it was Missouri's turn again. Purdue went ice cold from the field, and the Tigers outscored the Boilermakers 22-3 during the final 7:11.


The second-half play-by-play sheet is available here.

Out west, the UCLA men used runs of 19-6 and 8-0 in overcoming an early 18-point deficit and pulling away from Davidson.

Wednesday, December 05, 2007

[Update, December 6, 2007: I've now located a complete play-by-play sheet of the Louisiana Tech-Texas Tech game, which allows me to make one little correction from what I stated from memory last night.]

Funny that my previous posting was on team runs in men's college basketball. Tonight's Texas Tech home game against Louisiana Tech was broadcast locally, and I was periodically checking in to see how the game was going. Early on, it was relatively even, with Texas Tech leading 12-10, as I recall. When I checked back a little while later, Texas Tech had pulled out to a 30-12 lead (corrected from last night), and then a while after that, the Red Raiders' lead had grown to 50-13.

According to this wire-service story I found at ESPN.com:

Texas Tech (6-3) outscored the Bulldogs 46-3 [!!!] from midway through the first half to midway through the second half, going up by as many as 45 during the run.

Louisiana Tech (1-5) did not hit a single field goal for more than 19 minutes while turning the ball over 19 times.


[My emphasis added.]

For Louisiana Tech, which ultimately lost 86-31, the box score revealed these lowlights: 13-59 (.220) on field-goal attempts overall; 3-19 (.158) on three-pointers; 10-40 (.250) on two-pointers; and 2-6 (.333) on free throws.

While the game was still going on, after it had become evident that serious statistical analysis would be warranted, I went to Louisiana Tech's statistics page, so I could have the team's percentages entering the Texas Tech game. Such an a priori baseline provides a standard of comparison for how awry the Bulldogs' offense went against Texas Tech. Louisiana Tech's prior statistics were as follows:

Its field-goal shooting was 97-267 (.363) overall; 23-82 (.280) on three-point attempts; and therefore, 74-185 (.400) on two-point attempts. Even with these low prior shooting percentages, Louisiana Tech managed to shoot considerably below them against Texas Tech (since LT had so few free-throw attempts in Lubbock, I didn't bother with the prior percentage). The following figure conveys the above statistics in graphical form.



On Texas Tech's side of the ledger, the Raiders had an overall FG percentage of .545 (36-66) against LT, which is good, but not anything to make fans forget Villanova's shooting in the 1985 national championship game. Texas Tech's 3PT% of .333 (3-9), and FT% of .579 (11-19) were hardly spectacular, either.

There's a lot more that can be analyzed regarding Louisiana Tech's woeful outing in Lubbock, but given the lateness of the hour, that will have to wait... It's Thursday afternoon and I'm now back with more analyses, below...

Now that the play-by-play sheet is available, we can break down what happened to Louisiana Tech during the stretch in which it was outscored 46-3:

Of its 19 two-point attempts, the team made 1 basket, but missed 12 jumpers, 5 layups, and 1 dunk.

It went 0-9 on three-pointers.

It went 1-2 on free-throw attempts.


One way in which a cold streak can be self-perpetuating is that, as a team falls further and further behind, it starts jacking up three-point attempts in a feverish attempt to make a comeback. The Bulldogs exhibited some degree of this tendency, as in a sequence of six shots from right before to right after halftime, five were from behind the arc.

In closing, I want to go back to Louisiana Tech's overall shooting for the game. Using an online calculator, we can determine that under an independence assumption (i.e., one outcome having no bearing on the next, like coin-flipping)...

For a team coming in hitting .400 on two-point attempts (which LT was) to go 10-40 (or worse) in the Texas Tech game has a probability of .035.

And for a team coming in hitting .280 on three-pointers to go 3-19 (or worse) has a probability of .178.

Using the conventional .05 threshold of statistical testing, only the Bulldogs' two-point field-goal shooting (p = .035) was significantly worse in the Texas Tech game than their prior baseline; their three-point shooting (p = .178) was not.
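For anyone without an online calculator handy, the two tail probabilities above can be reproduced with a short Python sketch. It sums the binomial probabilities for "this many makes or fewer," under the same independence assumption. The function name is hypothetical, and this is a stand-in for the calculator used, not its actual method.

```python
from math import comb

def prob_at_most(makes, attempts, p):
    """P(X <= makes) for X ~ Binomial(attempts, p), i.e. independent shots."""
    return sum(comb(attempts, k) * p**k * (1 - p)**(attempts - k)
               for k in range(makes + 1))

# Louisiana Tech vs. Texas Tech, using the pre-game shooting baselines.
two_pt = prob_at_most(10, 40, 0.400)    # 10-40 or worse on two-pointers
three_pt = prob_at_most(3, 19, 0.280)   # 3-19 or worse on three-pointers

print(f"Two-pointers:   p = {two_pt:.3f}")
print(f"Three-pointers: p = {three_pt:.3f}")
```

These are one-tailed probabilities (shooting this badly or worse), which matches how the figures above are phrased.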

Monday, December 03, 2007

Team runs in basketball are when one team outscores the other by a substantial margin (say 10 or more points) within a relatively short timeframe, often (but not necessarily) shutting out the opponent in the process.

I kept a close eye out for notable team runs in the 2004 men's NCAA basketball tournament, and found at least one of them to occur in 75% of the games (47 out of the 63 games, excluding the play-in game). Examples included:

Texas Tech going on a 14-0 run to take control of its first-round game against Charlotte, but then getting eliminated in the second round by St. Joseph's, which reeled off its own 24-2 run against the Red Raiders.

Kansas and Pacific trading runs in their second-round contest, KU's 15-2 first-half spurt being countered by a 12-0 UOP run. Ultimately, a Jayhawks' 15-3 second-half run proved decisive.

UConn unleashing a 12-0 run down the stretch to overtake Duke in a memorable national semifinal contest.

(A full list is available upon request, by e-mailing me through the link to my faculty webpage.)

Though team runs appear to be fairly common, it is quite another matter for the No. 1 team in the country, playing at home, to suffer one. But that's exactly what happened to the top-ranked UCLA men yesterday against No. 8 Texas.

According to this game story:

"Texas outscored the Bruins 26-2, including 17 in a row, in the first half" to take a sizable lead. But, the Bruins came back as, "UCLA opened the second half on a 16-3 run that produced its first lead since early in the game." The game eventually reached an equilibrium, with the Longhorns pulling it out at the end.

I have not attempted to document the rate of team runs in the regular season, but even if they don't occur as frequently as in the NCAA tournament, they still probably occur often enough. Thus, whether your favorite team is leading or trailing in a game, as Yogi Berra said, "It ain't over, till it's over."

Sunday, November 25, 2007

Last night marked the conclusion of the Great Alaska Shootout, one of the many men's collegiate basketball tournaments taking place over the Thanksgiving weekend. The word Shootout was particularly apt, as the two finalists, champion Butler and runner-up Texas Tech, lit things up from behind the three-point arc all tournament long.

Here is Butler's report card on three-pointers:

Quarter-final vs. Michigan: 17-32, .531, with the top individual performances coming from Pete Campbell (6 of 11) and A.J. Graves (5 of 10).

Semi-final vs. Virginia Tech: 14-33, .424, including Campbell at 7-13 (Graves was a little off at 4-13).

Final vs. Texas Tech: 16-24, .667, with Campbell 4-7, Graves 6-8, and Mike Green 4-4.

Overall, the Bulldogs were thus 47-89 (.528) in the Alaska Shootout.

On Texas Tech's side:

Quarter-final vs. Alaska-Anchorage: 7-13, .538, largely based on John Roberson's 6 of 7.

Semi-final vs. Gonzaga: 8-18, .444, led by Alan Voskuil's 5-7.

Final vs. Butler: 6-12, .500, with Voskuil going a perfect 4-4.

Overall, the Red Raiders were thus 21-43 (.488).

Historically, I don't know how often two teams in the same tournament (of at least three rounds) have maintained three-point shooting percentages this high, but I doubt it's very often!

Hot three-point shooting to start the season is nothing new for the Red Raiders. Almost exactly one year ago, the Texas Tech men were leading the nation in three-point shooting percentage (.504). As I discussed back then, the team's three-point percentage was likely to go down over the course of the season, purely as a matter of two statistical principles, regression to the mean and the small sample size for early-season statistics.

Indeed, the Red Raiders ended the year with a .412 three-point percentage, for eighth in the nation (because other fast-starting teams were also susceptible to the same statistical considerations, I did not predict that Tech would necessarily fall out of first place, just that the team's absolute percentage would go down).

As of this moment, according to ESPN.com's statistics page, Butler is fourth in the nation in team three-point shooting, making 77-164, for a .470 percentage.

Last year's three-point leader was Northern Arizona at .426 and, as I documented in a posting last year, from 2002-2006 no team led the nation in three-point shooting with anything higher than a .440 percentage. Thus, we'll see if Butler can exceed that level for a full season.
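To put a rough number on the small-sample-size point, here is a minimal Python sketch that attaches a standard error to Butler's current 77-164 mark. It uses the usual normal approximation for a proportion and assumes each attempt is independent with the same underlying success rate; both are simplifying assumptions of mine, and the function name is hypothetical.

```python
from math import sqrt

def proportion_with_margin(makes, attempts, z=1.96):
    """Return the shooting percentage, its standard error, and a rough 95% interval."""
    p = makes / attempts
    se = sqrt(p * (1 - p) / attempts)   # binomial standard error
    return p, se, (p - z * se, p + z * se)

# Butler's three-point shooting to date, per the ESPN.com figures above.
p, se, (low, high) = proportion_with_margin(77, 164)
print(f"Butler: {p:.3f} +/- {z_margin:.3f}" if (z_margin := 1.96 * se) else "")
print(f"Rough 95% interval: {low:.3f} to {high:.3f}")
```

With only 164 attempts, the interval is wide enough to include the .440 ceiling from recent seasons, which is why an early-season percentage like this one can easily drift downward.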

Texas Tech currently sits tied for 53rd in the nation from behind the arc (35-86, .407), due to some poor pre-Alaska shooting. From the Red Raiders' perspective, we'll see if their hot shooting in Alaska was a temporary blip or part of an incipient trend...

Saturday, November 24, 2007

The story coming out of the University of Nebraska volleyball building tonight, quoting from this news release, is that:

Tracy Stalls tied an NCAA record by putting down 13 kills on 13 swings for a perfect 1.000 attack percentage, as the Husker volleyball team sent Stalls and NU's three other seniors out in style Saturday night with a 30-18, 30-10, 30-11 sweep of Texas Tech.

For those not all that familiar with volleyball statistics, what this means is that 13 times the ball was set up for Stalls to swing at, and all 13 times she delivered balls that the Red Raiders could not field. Stalls hit no balls into the net, nor out of bounds, had no balls blocked back in her face by Texas Tech, and did not even have any balls dug up by the Red Raiders in the backcourt.

Now, that's a hot hand!

[Cross-posted at my VolleyMetrics blog.]

Thursday, November 22, 2007

I hope everyone is having a great Thanksgiving! One person who definitely has had one is Green Bay Packers quarterback Brett Favre. The 38-year-old signal caller completed 20 consecutive passes, a team record and only two away from the NFL record, as the Packers knocked off the Detroit Lions.

Looking at the play-by-play sheet (the streak starts right after the two-minute warning before halftime), one can see that most of the passes were for short yardage, but even short passes can go awry.

The yardage gains, in the order the plays occurred, are as follows:

10, 13, 20*, 9, 7, 7, 10 (sets up field goal)

5, 7, 9, 4, 8, 43*, 4 (last one for touchdown)

24*, 0 [complete but no gain], [incompletion negated by defensive pass interference], 7, 3 (last one for touchdown)

2, 41^

*Play-by-play sheet lists pass as "deep," suggesting most of gain was through the air.

^Play-by-play sheet lists pass as "short," suggesting most of gain was via run after the catch.

Tuesday, November 20, 2007

I have been fortunate in recent months to have a couple of sports journalists take an interest in my “hot hand” research and in applying some fairly subtle statistical concepts to sports, more generally.

One of these journalists, Jerry Crasnick, contacted me in August about an article he was writing for one of the Major League Baseball post-season souvenir programs (which turned out to be for the World Series) and interviewed me for his piece on “Baseball’s Law of Averages.”

The other writer, Kenneth Shouler, contacted me back in the spring about an article he was writing for Cigar Aficionado magazine on NBA basketball players being “in the zone” when they make several shots in a row. This article is now on the newsstands in the December issue of Cigar Aficionado (I do not smoke, and my cooperation with the writer should not be seen as an endorsement of smoking).

If you would like a copy of one or both of these articles, just e-mail me through my faculty webpage (link in the upper right of this page) and I'll send you a copy.

Happy Thanksgiving to everyone!

Sunday, November 18, 2007

Texas A&M's 7-foot frosh DeAndre Jordan set a Big 12 men's basketball record Saturday night by making his 16th straight shot from the field. What's unusual about the record is that it took four games to reach it. According to the above-linked ESPN.com article:

Jordan hasn't missed a field goal since the first half of Texas A&M's opener against McNeese State. The prized recruit was 6-of-6 on Saturday, made his last two shots against McNeese, was 5-of-5 against Oral Roberts and 3-of-3 in a win over UTEP.

Thus, contrary to the popular image of a player just going wild in a single game, Jordan has methodically been making his shots -- in small quantities each game -- and building his streak incrementally.

The article alludes to a couple of his shots in Saturday's game being layups. Given Jordan's height, I would guess most -- if not all -- of his shots during the streak have been from close range. I can probably track down shot charts of the Aggies' games, but due to the late hour, I'll do that some other time.

Even if the shot attempts have been heavily or exclusively from near the basket, everybody suffers an unlucky bounce off the backboard and rim here and there, so Jordan's run of perfection is certainly to be commended.

Thursday, November 15, 2007

At the moment, it seems like pro sports teams in Boston can't lose, whereas those in Miami can't win (the respective NHL teams, excepted).

In the Boston area, the Patriots are 9-0, the Celtics are 7-0, and the World Series champion Red Sox, after falling behind 3-1 to Cleveland in the American League Championship Series, ran off seven straight wins to close out the play-offs.

Down in Miami, meanwhile, the Dolphins are 0-9 and the Heat has gotten off to a 1-7 start.

This morning, I was listening to the ESPN radio show "The Herd" with Colin Cowherd. Colin was giving a commentary about how success stories such as the Patriots and disasters such as the Dolphins don't happen by accident. Regarding the latter, years of poor drafting, a merry-go-round of coaches, and bad management decisions have taken their toll.

Cowherd's commentary reminded me of a book I read a while back, Confidence: How Winning Streaks & Losing Streaks Begin & End. Written by Harvard Business School professor Rosabeth Moss Kanter, the book presents several case studies from corporate America and the sports world (including the Patriots) and argues that success and failure are heavily rooted in organizational culture.

Getting back to the day-to-day sports world, the Miami Heat will take on the Celtics tomorrow night in Boston, so look for each of these teams' respective streaks to continue.

Friday, November 02, 2007

The National Basketball Association season is barely underway, and already we have a new league record for streak shooting, in this case of the cold variety. Playing at Boston, the Washington Wizards went 0-16 on three-point attempts, which according to this ESPN.com article is "an NBA record for most attempts without making one."

Thursday, October 25, 2007

Welcome World Series fans!

The big story of Game 1, to me at least, is the continuing run barrage of the Red Sox. Last night, they scored double-digit runs for the third straight game (post-season game-by-game log):

vs. Cleveland (Game 6) 12-2
vs. Cleveland (Game 7) 11-2
vs. Colorado (Game 1) 13-1 

I did a posting in late August about how the Red Sox had accomplished the extremely rare feat during the regular season of scoring double-digit runs in all games of a four-game series (against the Chicago White Sox). Thus, it seems the Red Sox are now up to their old tricks!

The pitching of Boston's Josh Beckett shouldn't be overlooked, either. As this ESPN.com game summary from last night notes:

Beckett also lowered his career postseason ERA to 1.73, placing him third behind Mariano Rivera (0.77 ERA) and Christy Mathewson (1.15 ERA) among pitchers who have thrown at least 70 postseason innings.

Tuesday, October 23, 2007

Welcome to visitors who've found their way here via Carl Bialik's "Numbers Guy" blog for the Wall Street Journal. I invite you to browse through my write-ups and the links section on the right. Feel free to add comments to my postings, if you'd like.

If you have no idea what I'm talking about, click here.

Sunday, October 21, 2007

The Tennessee Titans at Houston Texans game completed earlier this afternoon had two major streakiness story lines. Houston, trailing 32-7 entering the fourth quarter, went on a 29-3 burst in the final period to take a 36-35 lead with 57 seconds remaining. Tennessee moved the ball down the field in the closing moments, however, to set up kicker Rob Bironas for a game-winning 29-yard field goal as time ran out (ESPN.com game recap).

The other streaky element was the "hot foot" of Bironas. His winning kick was his eighth successful field goal of the game, which sets a new NFL record (he had no misses). The yardage distances of the field goals in the order in which they occurred are as follows:

52, 25, 21, 30, 28, 43, 29, 29

Bironas's career statistics from various distances are as follows (career stats offer a bigger sample size than just those from 2007). They appear to be from before the Houston game: shortly after the game, his distance-specific stats for this season hadn't been updated, so I would doubt his career ones had been.

20-29 yards 21/23 (.91)
30-39 yards 18/19 (.95)
40-49 yards 11/18 (.61)
50+ yards 3/7 (.43)

To estimate the probability of Bironas's making all eight field-goal attempts he took, given that he would be receiving these opportunities, we multiply the component probabilities together:

(.43) (.91) (.91) (.95) (.91) (.61) (.91) (.91)

which yields .155. If we also factored in the likelihood of an NFL team having so many drives stall in fairly close proximity to the goal line, the probability of Bironas's accomplishment would probably get even smaller.
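The same product can be sketched in Python by mapping each kick's distance to the corresponding career success rate. The rates are the rounded values from the table above; the helper function and variable names are hypothetical, and the calculation rests on the independence assumption discussed below.

```python
from math import prod

# Rounded career success rates by distance band, from the table above.
CAREER_RATE = {"20-29": 0.91, "30-39": 0.95, "40-49": 0.61, "50+": 0.43}

def distance_band(yards):
    """Map a field-goal distance to its band in CAREER_RATE."""
    if yards >= 50:
        return "50+"
    if yards >= 40:
        return "40-49"
    if yards >= 30:
        return "30-39"
    return "20-29"  # every kick in this game was from at least 20 yards

# The eight field goals, in the order they were kicked.
kicks = [52, 25, 21, 30, 28, 43, 29, 29]

# Multiplication rule: treat each kick as independent of the others.
p_all_eight = prod(CAREER_RATE[distance_band(y)] for y in kicks)
print(f"P(8 for 8) = {p_all_eight:.3f}")
```

Using the unrounded fractions (21/23, 18/19, and so on) instead of the two-decimal rates would nudge the result slightly, but not enough to change the overall picture.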

A couple of cautions are in order about this analysis. First, it is the unusual nature of the feat (or in this case, foot) that drew me to conduct the analysis; I did not seek a random cross-section of games. Second, the equation I used assumes independence of observations, that the outcome of any one kick had no impact on the next.

The independence assumption is typically associated with sequences of coin flips and dice rolls, which, unlike humans, cannot experience momentum and other associated psychological states. However, having conducted numerous analyses over the years for this website, I consider the independence assumption to hold pretty well for athletic performances, too.

As for Houston's team-comeback element, which unfortunately for Texans' fans did not hold up, I would direct you to my statistical analysis of a relatively recent, similar comeback by Texas Tech (where I'm on the faculty) against Minnesota in last December's Insight Bowl.

Monday, October 15, 2007

The Colorado Rockies have just swept the Arizona Diamondbacks in the National League Championship Series, four games to none. Building upon a sweep of Philadelphia in the opening round (which has a three-out-of-five format), Colorado is 7-0 in the post-season and, factoring in the close of the regular season, has won an amazing 21 of its last 22 games.

Since the advent of the three-round/wild-card play-off system in 1995, the most dominant post-season performance by a World Series champion is shared by the 2005 Chicago White Sox and 1999 New York Yankees, each with an 11-1 record. The Rockies will thus seek to become the first team to go 11-0.

The baseball media naturally have been abuzz with Rocky talk, including comparisons to other hot teams down the stretch in baseball history. Another team I heard about tonight was the 1977 edition of the Kansas City Royals. From August 31 to both games of a September 25 double-header, inclusive, the Royals won 24 out of 25. Ultimately, however, Kansas City lost a heartbreaking American League Championship Series to the Yankees.

Friday, October 12, 2007

Just a few brief notes on the Major League Baseball play-offs:

With their opening-game win over Arizona in the National League Championship Series, the Colorado Rockies have now won 18 of their last 19 games (which also includes a three-game sweep over Philadelphia in the opening round). A couple of entries down, I conducted an elaborate analysis of the Rockies in the regular season.

A streak-within-a-streak is that Colorado pitcher Jeff Francis improved to 5-0 lifetime at the Diamondbacks' Chase Field. I was pleased to see, via this game article, that Francis appears to have some statistical savvy:

"I really can't explain that," said Francis. "It's just a small sample size of me not being here that long and just having a good run against one particular team."

Over in the American League, the championship series between Cleveland and Boston starts tonight. As pointed out in an ESPNews graphic on television yesterday, Cleveland hit .444 (12-27) in two-out situations with runners in scoring position (RISP) in its opening-round win over the New York Yankees.

Tuesday, October 09, 2007

Alex Rodriguez's streaky stretches, both hot and cold, have been chronicled on this blog. With the Yankees' elimination from this year's MLB play-offs at the hands of Cleveland last night, here's an accounting of his post-season woes from a Yahoo! Sports article...

He is mired in an 8-for-59 (.136) playoff spiral dating to his Game 4 home run against Boston in the 2004 ALCS.

New York's biggest bopper is hitless in his last 18 playoff at-bats with runners in scoring position.

Rodriguez hit a solo homer [in the finale of the Cleveland series]... ending a streak of 57 postseason at-bats without an RBI...

Hitless in his last 27 postseason at-bats with any runners on base, A-Rod is certain to again face some criticism after his up-and-down postseason.

Tuesday, October 02, 2007

With last night's exciting, extra-inning, come-from-behind win over San Diego in the National League one-game tie-breaker for the wild-card play-off slot, the Colorado Rockies are riding a hot streak (14 wins in their last 15 games) into the first round of the post-season (Rockies' game-by-game log).

Colorado's first sign of streakiness this season came when it won 7 straight in late May after starting out 18-27. I have plotted a graph of the Rockies' cumulative winning percentage after each game, starting with the 7-game winning streak, as shown below (you can click on the graphic to enlarge it). The late ending of the Colorado-San Diego game, plus all the little embellishments I added to the chart, kept me up until 2:00 AM last night!


As it says in the caption, the Rockies' last 118 games of the season included a combination of streaks (both hot and cold) and relatively steady, incremental gains.

A statistical technique that's appropriate in this context is the runs test. A "run" is a stretch of all wins (without interruption by a loss) or all losses (uninterrupted by a win). The following hypothetical sequence [WWLWWWLLL] includes four runs.

Given that streakiness entails winning (or losing) games in bunches, and not merely alternating wins and losses, evidence for streakiness would come in the form of a team exhibiting fewer runs than would be expected by chance. During their last 118 games of the season (the part I'm focusing on), the Rockies indeed exhibited fewer runs (55) than would be expected (57), but the difference is not very large.
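For readers who want to count runs themselves, here is a minimal Python sketch using the hypothetical sequence above. The function name find_runs is my own label for illustration, not part of any statistics package.

```python
from itertools import groupby

def find_runs(results):
    """Split a win/loss string into runs, e.g. 'WWL' -> [('W', 2), ('L', 1)]."""
    return [(outcome, len(list(group))) for outcome, group in groupby(results)]

# The hypothetical sequence from the text.
runs = find_runs("WWLWWWLLL")
print(runs)       # each run's outcome and its length
print(len(runs))  # the number of runs
```

The same function would work on a team's full game log, typed in as one long string of W's and L's.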

A lot of teams (or individual players, when it comes to hitting or pitching) appear to be streaky performers. However, finding statistical evidence for such is more difficult than many fans would imagine.

For an earlier example of the runs test, where I went into greater detail, click here.

The Rockies' first-round opponent, the Phillies, have exhibited hot play, too, of late, though not quite as dramatically, closing out the season 13-4 (log). If both teams continue their hot offense, the scoreboard operators should get a real workout!

Thursday, September 27, 2007

A couple days ago, Sports Illustrated's Tom Verducci published a column on "Debunking the biggest myths of MLB's wild-card era" (which I learned about via the ESPN radio show, The Herd). Myth No. 2 was that, "The 'hot' teams -- the ones that play well down the stretch -- are the ones to fear in the postseason." Take a look at Verducci's evidence by clicking here.

Tuesday, September 25, 2007



This upcoming Saturday, September 29, will mark the 20th anniversary of a major article in the St. Louis Post-Dispatch's Science section, on whether there was any evidence of streakiness -- either in wins and losses, or in batting performance -- in the city's beloved baseball club, the Cardinals. The article was written by Charles Franklin, then a relatively new professor at St. Louis's Washington University.

My connection to Dr. Franklin -- including a span of 22 years between any in-person contact -- and how I obtained the images of his article interspersed throughout this write-up make for an interesting story, if I do say so myself. (By the way, you can click on any of the images to enlarge them and be able to read them more easily.)

As with many developments in my life, it all starts with the University of Michigan. During the summer of 1985, after I had completed my first year of social psychology grad school at UM, I took a statistics course (linear models) through the university's ICPSR program.

The instructor of that course was the aforementioned Charles Franklin, who had just completed (or was just completing) his Ph.D. in political science at Michigan and had come back from Wash U to teach the summer class.

After that class, roughly 20 years passed without Charles's and my paths crossing in any way. In 1992, Charles moved to the University of Wisconsin, Madison. Then, in 2005, he founded a blog called Political Arithmetik (yes, it ends with a "k"), which is devoted to quantitative expositions on public-opinion data.


Armed with his palette of graphing software, Charles might track, for example, presidential job-approval ratings over time, or systematic differences between survey firms in whether their polls tend to give higher or lower job-approval readings than other firms (known as "house effects"). Charles now also grinds out his analyses for the website Pollster.com, in collaboration with Mark Blumenthal, himself a Michigan undergraduate alumnus.

I don't remember exactly when I first discovered Charles's blog, but once I did, I e-mailed him about being in his class in 1985, and I've submitted comments on his postings from time to time.

This past summer 2007, I was fortunate enough to get the opportunity to teach a course at Wisconsin-Madison, as a visitor in human development and family studies (the same department I'm in at Texas Tech for my regular, full-time job). Here are some photos from my time in Madison.

Once I knew that I would be going up to Madison for a summer term, I contacted Charles about getting together, which would be our first visit in 22 years. He was agreeable, so we met in his office, just north of the campus's famous Bascom Hill. Charles told me that he had just returned from teaching in the Michigan summer stats program, and that he was calling it quits after 25 summers in Ann Arbor.

We chatted about Michigan, statistics, polling, and blogging, the latter of which led to my mentioning the Hot Hand page. As if we didn't have enough connections between Michigan and all the statistical stuff, Charles then told me about his 1987 Cardinal streakiness article for the St. Louis Post-Dispatch, of which I was completely unaware.


He didn't have any copies around. However, compounding our coincidences in a manner worthy of a Seinfeld episode, I was heading to St. Louis over an upcoming weekend to attend the annual SABR conference, and it seemed likely I could find a microfilm of Charles's article at the downtown St. Louis public library.

I did indeed find the microfilm of Charles's article, and you're now seeing some excerpts of my discovery. The staff members in the microfilm room were extremely helpful, for which I thank them.

As you can glean from the inserted newspaper images, Charles didn't find any evidence of streakiness on the part of the Cardinals.

Thursday, September 20, 2007

Matt Holliday of the Colorado Rockies is currently on a home-run explosion, having hit 11 in his last 12 games. Holliday is known for the gaudy distances of some of his homers, as immortalized in this 2006 blast I found on YouTube.

It took Holliday until September 2 to get his 25th homer of the 2007 season. He's now, of course, up to 36 homers, a 44% increase from when he was at 25 (11/25) in less than three weeks.

Holliday's streak has prompted me to seek out other similar ones.

The Yankees' Alex Rodriguez, whose tendencies to hit homers in bunches I analyzed in an earlier posting, began the 2007 season by hitting 12 homers in 15 games.

Another seemingly good place to look was at players who had set (or come close to) single-season records. During Barry Bonds's 73-homer season in 2001, his most scorching stretch appears to have taken place from May 17-22, during which he hit 9 homers in 6 games (game-by-game log).

Mark McGwire and Sammy Sosa in 1998 also seemed worth looking at. A few years ago, I found a copy of Race for the Record: The Great Home Run Chase of 1998 (a fancy magazine-type volume with side-binding) on sale for $2.99, so I was able to consult the charts within. In the eight games from May 18-25, McGwire had 9 homers. Sosa, of course, had the 20-homer month of June; at his hottest during that month, he hit 11 dingers from June 15-25.

By focusing only on big-name home-run hitters in the last decade, I'm sure I'm missing other great homer binges. I invite readers to add other big homer stretches (with documentation please) via the Comments link, below.

Wednesday, September 12, 2007

Pete Ridges just sent a message to the SABR e-mail discussion list, pointing out that this past Sunday, while playing at Cincinnati, Milwaukee became the first team in Major League Baseball history to start off a game by hitting three consecutive home runs (box score and play-by-play sheet).

Ridges offered the opinion that:

Unusually, some reports have undersold this, by saying that the Brewers were the third team to start their first inning with 3 HR. However, the other two cases came in the bottom of the first...

The other trifectas were by San Diego in 1987 and Atlanta in 2003.

Offensively, of course, only a visiting team can start off a game. From this perspective, Milwaukee's feat is technically unique. However, for a home team to lead off its half of the first inning with three straight homers is pretty darn impressive, too.

Whether any given reader considers the Brewers to be in a class by themselves or to share the record with two other teams, Ridges's conclusion helps put everything in context:

By my addition there had been 188,835 major league games through Sunday, so I was extremely impressed by this.

Sunday, September 02, 2007

Charlotte (NC) Independence High School has just had its 109-game football winning streak come to an end -- but it took an out-of-state opponent to do it.

As part of former Ohio State quarterback Kirk Herbstreit's Ohio vs. The USA Challenge, Charlotte Independence ventured to take on Cincinnati Elder in the latter's home city, and dropped a 41-34 overtime decision.

As noted in the above-linked ESPN.com article, "Most of the wins weren't close. Independence had beaten opponents during its win streak by an average of nearly 35 points per game entering the 2007 season."

Independence's situation appears to fit a very simple "theory" of super-long streaks. A team (or individual) is physically superior to its competition, thus winning most of its games in dominant fashion. Then, in the rare circumstance of a tight game, the team with the winning streak benefits from good luck to keep the streak going, until the luck runs out.

One recent memorable example, from college football, was USC's 2005 win at Notre Dame to extend the Trojans' winning streak to 28, a victory that required some favorable bounces of the ball at the end.

When one thinks of other historical streaks, such as Joe DiMaggio's getting a hit in 56 straight games, the UCLA men's basketball team winning 88 straight games, and Tiger Woods making the cut at 142 straight PGA golf tournaments, it should not be surprising that the teams and individuals who accumulated these streaks were already at the top of their crafts.

Sunday, August 26, 2007

It was a Sox vs. Sox weekend, with Boston visiting Chicago for a Friday doubleheader and single games Saturday and Sunday. In the end, one team did a lot more "socking" of the ball than the other, as revealed in the following scores:

Red Sox 11, White Sox 3
Red Sox 10, White Sox 1
Red Sox 14, White Sox 2
Red Sox 11, White Sox 1

According to the Sunday game article, for a team to put up double-digit run totals in each game represented:

...only the fourth time that has happened in a four-game series since 1900, according to the Elias Sports Bureau. It's the first time it has happened in the American League in 85 years.

Thursday, August 23, 2007

Here are a couple of noteworthy hot-hand phenomena from last night's baseball action:

For the first story, the ESPN.com article says it all:

The Texas Rangers... became the first team in 110 years to score 30 runs in a game, setting an American League record Wednesday in a 30-3 rout of the Baltimore Orioles.

Elsewhere, a first-inning Milwaukee run ended Arizona pitcher Brandon Webb's consecutive scoreless innings streak at 42. Webb had been within reasonable striking distance of former L.A. Dodger Orel Hershiser's record of 59 straight scoreless innings, set in 1988. Hershiser himself had edged out another Dodger great, Don Drysdale, who had blanked opponents for 58 2/3 innings in 1968.

A nice compilation of statistical data on pitchers' scoreless-inning streaks is available here.

Friday, August 10, 2007

Chicago White Sox closer Bobby Jenks tied a league record tonight for most consecutive batters retired. According to this ESPN.com article, Jenks "has retired 38 straight batters, tying David Wells' American League record set in 1998 with the New York Yankees. It's the fourth-longest streak in major league history."

Update 1: The streak is now at 41 straight batters retired, tying the major-league record.

Update 2: Brought in to close out the ninth inning of the White Sox' August 20 contest against Kansas City, Jenks was greeted with a lead-off single by the Royals' Joey Gathright (article).

Jenks thus joins -- but doesn't exceed -- former San Francisco Giant pitcher Jim Barr in retiring a major-league record 41 consecutive batters.

Monday, August 06, 2007

In baseball action tonight, the St. Louis Cardinals tied a major-league record by getting hits in 10 straight at-bats (official at-bats, that is, as one batter walked in between the first eight and last two hits of the streak).

Much of the oddity centered around St. Louis starting pitcher Braden Looper, a converted reliever (I mention that, as relief pitchers would probably have among the fewest at-bats of any National League players and thus little opportunity to gain hitting experience).

For one thing, Cardinal manager Tony LaRussa had Looper batting eighth in the order, a ploy LaRussa tries with his pitcher from time to time. And, more amazingly still, Looper (who, as the above-linked game article noted, "began the game batting .161") got two of the hits in the Cards' barrage (one of them a bunt single).

The comprehensive, batter-by-batter play-by-play sheet from ESPN.com is available here; your attention should be directed to the Cardinals' at-bats in the bottom of the fifth inning.

Due to the lateness of the hour (1:37 AM Central), I won't attempt any statistical analyses at the moment. I'll probably revisit the matter, though.

Update: One basic kind of analysis that can be done is to take the pre-August 6 batting average for each hitter who took part in the streak and multiply these together to obtain the overall probability of the Cardinals' accomplishing what they did.

As an analogy, if one wants to know the probability of rolling double-sixes with a pair of dice, one multiplies the chances of a six on each die together, (1/6) X (1/6), to obtain 1/36. This multiplication procedure assumes independence of events (i.e., no effect of one event on the other), an assumption that seems to work pretty well for athletic performance data.

Here are the St. Louis hitters who got at least one hit in the streak, along with the type of hit(s), and their batting averages prior to August 6 (for the position players, these averages are taken from the August 5 box score):

Looper (2: single, bunt single) .161 (use twice in multiplication)
Miles (2: infield single, single) .283 (use twice in multiplication)
Eckstein (single) .286
Taguchi (single) .298 (didn't play August 5, so taken from August 4)
Pujols (single) .316
Encarnacion (single) .289
Rolen (homer) .270
Ludwick (homer) .251

Thus, we're left with:

.161 X .283 X .286 X .298 X .316 X .289 X .270 X .251 X .161 X .283 = .000001

which, as an estimate at least, is 1 in a million.
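The multiplication can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are mine, and the averages are the ones listed in this post.

```python
import math

# Pre-August 6 batting averages as listed above. Looper (.161) and
# Miles (.283) each appear twice because each had two hits in the streak.
averages = [.161, .283, .286, .298, .316, .289, .270, .251, .161, .283]

# Multiplying the averages assumes the at-bats are independent events.
streak_probability = math.prod(averages)

print(f"{streak_probability:.8f}")
print(f"roughly 1 in {1 / streak_probability:,.0f}")
```

The unrounded product comes out a little above one in a million, which is why I call the 1-in-a-million figure an estimate.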

Of course, each time a player makes an out in a game, his team has a new chance to start a hitting streak. Taking into account the large number of games each team plays in a year and the even larger number of outs it makes, those million opportunities probably come up every several years. Indeed, as alluded to above, there are other teams who share the record of 10 straight hits with the Cardinals.
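To get a rough feel for "every several years," here is a back-of-the-envelope sketch. Every input is a simplifying assumption of mine: 30 teams, 162 games each, 27 outs per game (ignoring extra innings and unplayed half-innings), and the same 1-in-a-million chance applied to every opportunity, even though the real figure depends on which hitters are coming up.

```python
# Rough assumptions for illustration only (not figures from the game article).
TEAMS = 30                  # major-league teams in 2007
GAMES_PER_TEAM = 162        # regular-season schedule
OUTS_PER_GAME = 27          # nine innings of three outs each
STREAK_PROBABILITY = 1e-6   # the 1-in-a-million estimate from above

# Each out is treated as one fresh chance to begin a 10-hit streak.
opportunities_per_year = TEAMS * GAMES_PER_TEAM * OUTS_PER_GAME
years_per_streak = 1 / (STREAK_PROBABILITY * opportunities_per_year)

print(opportunities_per_year)
print(round(years_per_streak, 1))
```

Under these assumptions, a streak of this kind would be expected somewhere in the majors a little less often than once every seven years, which is at least in the same ballpark as the historical record.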

Thursday, August 02, 2007

A month ago, I conducted runs-test statistical analyses of Alex Rodriguez's alleged tendency to hit homers in bunches. I concluded that there was "very modest evidence" of A-Rod's "being a streaky home-run hitter..."

In the short time since that write-up, Rodriguez has unveiled a new batting stretch -- this time of the cold variety -- to further his credentials as a streaky hitter.

As reported in the ESPN.com article on this afternoon's Yankees loss to the White Sox, A-Rod “ended a career-high hitless streak at 22 at-bats when he singled in the second” (the grey summary box above the article refers to an “0-21 skid,” but I believe 22 is the correct number of at-bats).

His pre-slump batting average was .312 (116/372). This translates into a pre-slump failure rate = 1 - .312 = .688. Raising the latter figure to the 22nd power (for 22 straight at-bats) yields a probability of .0003 (3-in-10,000) of A-Rod having such a drought.

In a bizarre coincidence, the statistical figures of Rodriguez's cold stretch almost exactly parallel those of a 2005 slump by Ichiro Suzuki. As I previously reported:

Seattle’s Ichiro Suzuki, who in 2004 set the single-season record for most hits, suffered through a 0-for-22 slump (longest of his career) in early August 2005. The mid-season Sports Weekly listed him as batting .311 (a failure rate of .689), so the probability of Ichiro’s going hitless in 22 straight official at-bats is .689^22 = .0003.

For the Ichiro analysis, I didn't have his batting average at the exact moment before his slump; I therefore used his average at (roughly) the halfway point of the 2005 season, which would have been a few weeks before his cold spell.
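Both slump calculations follow the same recipe, which can be sketched in Python as follows. The function name is hypothetical, and the calculation assumes each at-bat is independent with a constant chance of a hit.

```python
def hitless_streak_probability(batting_average, at_bats):
    """Chance of going hitless in a given stretch of consecutive at-bats."""
    failure_rate = 1 - batting_average
    return failure_rate ** at_bats

a_rod = hitless_streak_probability(116 / 372, 22)  # pre-slump average of .312
ichiro = hitless_streak_probability(.311, 22)      # mid-season 2005 average

print(f"A-Rod:  {a_rod:.4f}")
print(f"Ichiro: {ichiro:.4f}")
```

Both values round to .0003, or about 3 in 10,000.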

Going back to today's Yankee-White Sox game, another numerical oddity was that, after a scoreless first inning, Chicago scored eight runs in the top of the second, only to have New York put up eight of its own to tie the game. The Yankees clearly would have seemed to have the momentum, but in fact, the White Sox dominated the rest of the way, winning 13-9 (see above-linked article).

Sunday, July 22, 2007

Today's finish to golf's British Open (or "The Open" as the hosts call it) will probably be remembered primarily for the play of Padraig Harrington and Sergio Garcia on the 18th hole in regulation and then the four-hole play-off between the two, won by Harrington.

For sheer hot streaks, though, the Sunday round of Andres Romero, the third-place finisher, would be hard to top. He made 10 birdies for the day, including a stretch of 6-out-of-7 holes in the latter half of the round.

As pointed out during the ABC television broadcast (and as can be seen on Romero's scorecard), he had bested par only nine times during the first three days (54 holes) of the tournament (8 birdies and 1 eagle).

Statistical tests on one athlete in one event are always dicey because of the relatively small sample size. However, with the ready availability of online statistical calculators -- in this case, for chi-square -- let's go for it!

We start with a basic 2 X 2 contingency table, with the values referring to numbers of holes (the dashes have been inserted to make sure the spacing comes out right):

----------Below par-----Par or above
Day 1-3--------9------------45------
Day 4---------10-------------8------

The calculator site I used offers three different versions of the chi-square test. Regardless of which one is used, the obtained difference in Romero's percentage of below-par holes between the first three days and the final day would be expected to come up purely by chance less than .005 of the time (5 in 1,000 or 1 in 200). We thus conclude that he performed significantly better on Sunday than during the three previous days.
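For anyone who would rather not rely on an online calculator, the test can be sketched in plain Python. I am assuming here that the ordinary Pearson chi-square and the Yates continuity-corrected version are among the versions the calculator offers; the function name is my own.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic and p value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Days 1-3, Day 4. Columns: below par, par or above.
pearson = chi_square_2x2(9, 45, 10, 8)
corrected = chi_square_2x2(9, 45, 10, 8, yates=True)

print("Pearson:   chi2 = %.2f, p = %.4f" % pearson)
print("Corrected: chi2 = %.2f, p = %.4f" % corrected)
```

Both versions give a p value below .005, in line with the calculator's results.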

Of course, the usual cautions apply: I was drawn to doing this analysis by the unusual nature of Romero's spectacular round, I did not test a random cross-section of golfers, and in the aggregate "big picture" of all golfers in all major tournaments, a round like his may not occur any more often than would be expected by chance.

Saturday, July 21, 2007

The Kansas City Royals have been playing some good baseball of late, at least relative to what we'd expect from recent years' incarnations of the team. According to a blog that follows the Royals:

Their June record of 15-12 is their first winning month since July of 2003... realize this team went 22 months with a sub-.500 record. Incredible.

Further, the Royals are 8-6 thus far in July, despite playing against some of the top teams in the American League recently. Here are KC's 2007 game-by-game logs from ESPN.com (first half of season, second half).

Friday, July 20, 2007

With their 6-2 victory over the Arizona Diamondbacks this afternoon, the Chicago Cubs have now won 19 of their last 24 games. An ESPN television graphic showed that the Cubs have steadily increased their month-specific winning percentages from April to July (thus far in the month). As of about a month ago, the Cubs were 32-39 (.451). Their game-by-game log for this season is available here.

Saturday, July 14, 2007

Just a couple of brief items:

The Chicago Cubs ended their 10-game drought without a home run, as Alfonso Soriano belted one today in the North Siders' 9-3 romp over Houston. Chicago did win 6 of the 10 games, though. It was the Cubs' longest homer-free stretch since 1988...

In the kind of start any golfer would dream about, In-Kyung Kim birdied her first seven holes in today's round of the LPGA's Jamie Farr Classic. Kim was quoted in the linked article as follows:

"That was my first time to birdie seven holes in a row, so that was pretty cool," she said. "I kept making birdies. I thought maybe I could shoot 57 today! I made seven birdies in a row, like, what's going on?"

Despite the early hot putter, Kim is in third place heading into the final round, five strokes behind leader Se Ri Pak.

Sunday, July 08, 2007

Earlier today, tennis great Roger Federer won his fifth straight Wimbledon men's singles title, defeating Rafael Nadal in five sets.

With the win, Federer tied Bjorn Borg for the modern record of five consecutive men's singles titles; Borg was in attendance to witness the match. The all-time record is six, held by William Renshaw (1881-1886).

Whereas the five straight titles might be viewed as a "macro" streak, Federer also came up with a key "micro" streak in the fifth set to transform a tight situation in which the momentum seemed to be going against him into a set (and match) that he won going away. As NBC announcer Ted Robinson noted, Federer was able to "flip the switch" to raise his game to a higher intensity.

Specifically, serving at 2-2 in the fifth set, Federer trailed 15-40. He then hit two service winners (balls that Nadal was able to get a racquet on, but not send back over the net in fair territory) to erase the break points, and went on to win the next two points (four in a row, all told) to hold serve for 3-2.

Federer then broke Nadal -- for only the second time in the match -- by winning four out of five points. Federer then held at love to increase his lead to 5-2. Thus, during this stretch, Federer won 12 out of 13 points!

Federer then won again on Nadal's serve, in a lengthy game that went to deuce a few times, to prevail 6-2 in the fifth.

Tennis is one of the few sports in which an academic statistical study has found evidence of streakiness (non-independence) of winning points. For further information, see the following article, available via Franc Klaassen's faculty webpage.

Klaassen, F.J.G.M. & Magnus, J.R. (2001). Are points in tennis independent and identically distributed? Evidence from a dynamic binary panel data model. Journal of the American Statistical Association, 96, 500-509.

Wednesday, July 04, 2007

Having reached (roughly) the halfway point of the baseball season, with the All-Star Game this coming Tuesday, now seems like a good time to reflect on the home run-hitting performance of the Yankees' Alex Rodriguez this season. As of this writing, A-Rod leads all of Major League Baseball with 28 homers, one ahead of National League leader Prince Fielder and eight ahead of the nearest American League rival, Justin Morneau (ESPN.com MLB stats page).

It's not merely the number of homers hit by Rodriguez, but also the seemingly clustered nature of his blasts. Thus far this season (and, as I've come to learn, in earlier years), he has seemed to hit home runs in bunches, separated by pronounced cold streaks where he's been unable to "touch 'em all."

In preparing to do this write-up, I did a lot of web-searching on Rodriguez and his home run-hitting prowess. In the process, I found a spectacular visual display of A-Rod's sequences of home-run and non-home-run games, not just for 2007, but for his entire career (done by Ryan Armbrust at a blog called "The Pastime"). One apparent typo is that the year labeled "1995" in the display is really 1996 (compare with Rodriguez's career stats).

With reference to his visual display, Armbrust writes of A-Rod, "He’s been a streaky home run hitter his entire career, as shown by the sparklines below."

However, as social psychologist David Myers notes in his book Intuition: Its Powers and Perils, "Random sequences seldom look random, because they contain more streaks than people expect" (p. 134). Any interested readers of this blog can demonstrate this for themselves by following Myers's example and flipping coins for a while. Every so often, you'll get streaks of several heads or several tails in a row.
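If you don't have a coin handy, the demonstration can be simulated. The sketch below flips 100 virtual coins and reports the longest streak; the seed value and the number of flips are arbitrary choices of mine.

```python
import random

def longest_streak(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    longest = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        longest = max(longest, current)
        previous = flip
    return longest

rng = random.Random(2007)  # arbitrary seed so the run can be repeated
flips = "".join(rng.choice("HT") for _ in range(100))

print(flips)
print("Longest streak:", longest_streak(flips))
```

Changing the seed gives a different sequence each time, and it is worth running a few to see how long the streaks in purely random data can get.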

A statistical test to determine whether A-Rod's sequences of games with and without at least one home run are more bunched into homogeneous segments than would be expected by chance is thus warranted.

One such approach is the runs test (here, here, and here). Where each trial can have two possible outcomes, such as each baseball game played by Rodriguez either including a homer by him (depicted in red in Armbrust's figures) or not including one (gray in the figures), a run is defined as any streak of consistently the same outcome (all reds in a row, or all grays). We are thus using the term "run" in a particular statistical context and not in regard to how many "runs" a team scores. Also, for present purposes, we are ignoring the distinction between games in which A-Rod has hit 1, 2, or 3 homers -- all are subsumed within the category "1 or more."

If, instead of colors, we use the code number 1 to represent a game with at least one A-Rod homer, and the number 0 to represent a game with no homers by him, we will have various sequences of 1's and 0's.

The key to the runs test is that streakiness is signified by few runs (such as 11110000, which contains two runs), whereas absence of streakiness is signified by many runs (such as 10100101, which contains seven runs).

For any given sequence, we can calculate how many runs would be expected by chance. Then, if the actual number of runs in a sequence turns out to be significantly smaller than expected, we can claim streakiness.

As a simple example, let's say we have a four-trial sequence consisting of two 1's and two 0's, in some order. There are six possible such sequences (those familiar with the n-choose-k principle can think of the problem as 4-choose-2, as we are choosing in which two of the four positions the 1's [or the 0's] would be located).

1100 (2 runs)
1010 (4 runs)
1001 (3 runs)
0011 (2 runs)
0101 (4 runs)
0110 (3 runs)

If we average the number of runs over all six possible sequences, we get 3 as the expected number of runs (18/6).

A simple formula for expected number of runs is 2 X (number of trials with a 1) X (number of trials with a 0), divided by the total number of trials, with 1 added to the previous answer. For the above example, expected runs = (2 X 2 X 2)/4 = 2, plus 1 = 3, matching the above answer. In my table below, I round the expected runs values to the nearest whole number or, if close to ending in .5, to the nearest half-number.
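The four-trial example and the formula can be checked against each other with a short Python sketch. The helper names are mine, chosen for illustration.

```python
from itertools import combinations

def count_runs(sequence):
    """Count runs: a new run starts wherever the outcome changes."""
    return sum(1 for i, x in enumerate(sequence) if i == 0 or x != sequence[i - 1])

def expected_runs(ones, zeros):
    """Expected number of runs by chance: 2 * n1 * n0 / N, plus 1."""
    return 2 * ones * zeros / (ones + zeros) + 1

def all_sequences(ones, zeros):
    """Every arrangement of the given numbers of 1's and 0's."""
    total = ones + zeros
    for positions in combinations(range(total), ones):
        yield "".join("1" if i in positions else "0" for i in range(total))

sequences = list(all_sequences(2, 2))
average_runs = sum(count_runs(s) for s in sequences) / len(sequences)

for s in sequences:
    print(s, count_runs(s))
print(average_runs, expected_runs(2, 2))
```

The brute-force average over all six sequences and the formula agree, and the same comparison can be run for other small combinations of 1's and 0's.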

Another resource we can use is an online runs-test calculator, into which we can type in 1's and 0's and, at the click of a mouse, find out if our sequence deviates significantly from expectation (in order for a result to be "statistically significant," by convention we say that there must be a .05 [1-in-20] or smaller probability of the obtained result being due to chance).
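I don't know exactly what the online calculator computes internally, but a common textbook version of the runs test uses a normal approximation, which can be sketched as follows. The function name is hypothetical, and the approximation is crude for sequences as short as the two eight-trial examples used here (taken from earlier in this post).

```python
import math

def runs_test(sequence):
    """Runs test via the normal approximation.

    Returns (actual runs, expected runs, z, one-sided p for too few runs).
    """
    n1 = sequence.count("1")
    n0 = sequence.count("0")
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one 1 and one 0")
    runs = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    expected = 2 * n1 * n0 / n + 1
    variance = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Lower-tail normal probability: small values suggest streakiness.
    p_few_runs = 0.5 * math.erfc(-z / math.sqrt(2))
    return runs, expected, z, p_few_runs

streaky = runs_test("11110000")
choppy = runs_test("10100101")
print(streaky)
print(choppy)
```

A negative z means fewer runs than expected (the direction of streakiness), so only the first sequence yields a small p value.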

Below are the results of my application of Rodriguez's data (from The Pastime, except for a couple of months in 2007, which I gleaned myself) to the runs test. Another point worth noting is that the online runs-test calculator is limited to 80 cases of data. Accordingly, I did hand calculations of A-Rod's actual (observed) and expected runs for both the first 80 games and all games of each season.

With the data from the first 80 games of a given season, I performed a formal runs test only if the actual number of runs was below the expected value (shown in bold), as I wasn't interested in testing if A-Rod was ever less streaky than expected. Then, if it appeared that his actual number of runs for a full season might be substantially lower than the expected value, I also performed a runs test for games 81 and beyond in that season (in cases where he played 161 or 162 games in a season, I used his last 80 games, leaving out the 1 or 2 in the middle of the season). Here are the results...

1996 (shown on The Pastime as 1995)
First 80 games: 35 actual runs, 31 expected runs
Full season (146 games): 57 actual runs, 52 expected runs

1997
First 80 games: 21 actual runs, 21 expected runs
Full season (141 games): 41 actual runs, 39 expected runs

1998
First 80 games: 37 actual runs, 34 expected runs
Full season (161 games): 61 actual runs, 60 expected runs

1999
First 80 games: 39 actual runs, 34 expected runs
Full season (129 games): 57 actual runs, 53 expected runs

2000
First 80 games: 27 actual runs, 30 expected runs (p = .18)
Full season (148 games): 48 actual runs, 56.5 expected runs (for final 68 games, p = .02)

2001
First 80 games: 31 actual runs, 30 expected runs
Full season (162 games): 75 actual runs, 68 expected runs

2002
First 80 games: 29 actual runs, 31 expected runs (p = .27)
Full season (162 games): 65 actual runs, 67 expected runs (difference of 2, though in direction of streakiness, still small)

2003
First 80 games: 32 actual runs, 29 expected runs
Full season (161 games): 62 actual runs, 65 expected runs (reversal of trend from first 80 games is noteworthy; for last 80 games, p = .08)

2004
First 80 games: 32 actual runs, 29 expected runs
Full season (155 games): 55 actual runs, 53 expected runs

2005
First 80 games: 25 actual runs, 25 expected runs
Full season (162 games): 63 actual runs, 64 expected runs (difference of 1, though in direction of streakiness, still small)

2006
First 80 games: 24 actual runs, 28 expected runs (p = .10)
Full season (154 games): 49 actual runs, 50.5 expected runs (difference of 1.5, though in direction of streakiness, still small)

2007 (through 80 games)
First 80 games: 30 actual runs, 35 expected runs (p = .08)

One finding that initially jumps out at me is that, in any given season, A-Rod has been at least as likely to exhibit more runs than expected by chance (the opposite of streakiness) as fewer runs. Overall, I would say there's some very modest evidence of Alex Rodriguez being a streaky home-run hitter, whose dingers tend to come in bunches. But to a large extent, the bunches we see in the visual depictions tend to be the result of randomness.

Once again, I would like to express my appreciation to Ryan Armbrust, whose diagrams of A-Rod's home-run sequences saved me a lot of work!

Tuesday, July 03, 2007

Red Sox shortstop Julio Lugo ended his cold stretch of 33 straight at-bats without a hit, singling in the second inning of tonight's game against Tampa Bay. Lugo then added another single in the seventh, for good measure. Although he was not all that close to Bill Bergen's record (for a non-pitcher) of 46 consecutive hitless at-bats, set nearly 100 years ago, Lugo was coming under increasing scrutiny from Boston fans and baseball statheads.

Tuesday, June 26, 2007

I just finished reading the book Stumbling on Happiness by Harvard social psychologist Daniel Gilbert.

The premise of the book is that we tend not to be very good at predicting how we would react emotionally, if a given event were to occur in the future. For example, if someone were asked how he or she would feel if his or her favorite sports team were to win a championship, the person's estimated happiness would likely exceed the happiness he or she actually reported when surveyed a while after the team really did win one. Gilbert and his University of Virginia colleague Tim Wilson refer to this area of research as "affective forecasting."

I did not start reading the book with hot hand research in mind. However, Gilbert's chapter on “presentism” really seemed to fit what may be going on with hot hand perceptions. The meaning of presentism can be grasped via a couple of Gilbert quotes:

...when brains plug holes in their conceptualizations of yesterday and tomorrow, they tend to use a material called today...(p. 125).

...if the present lightly colors our remembered pasts, it thoroughly infuses our imagined futures... (p. 127).

To use basketball as an illustration, presentism can be applied to hot hand perceptions as follows. An observer sees a player make several shots in a row (present) and naturally expects a high likelihood of the player making his or her next several shots (future). Such expectations presumably would be what give rise to the thinking that a team should always pass the ball to a hot shooter (it actually may be beneficial to pass to a hot shooter, but only because good overall shooters are the ones most likely to go on a streak).

Jay Koehler and Caryn Conley (2003) published a study that content-analyzed TV announcer comments during NBA three-point shooting contests, held annually in conjunction with the all-star game. The main finding was that players’ shooting percentages immediately following TV announcers’ hot hand exclamations (e.g., “Legler is on fire”) were no different than their overall baseline shooting percentages, thus showing once again that the present was not predictive of the future. This paper can be obtained via Koehler's website, which is itself accessible through the links to other researchers' pages on the right-hand side of the present page.

Saturday, June 23, 2007

Friday night, the Atlanta Braves suffered their third shutout in a row, falling to the Detroit Tigers, 5-0. According to this ESPN.com recap (in the box entitled, "A Closer Look"):

The Braves were shut out for the third game in a row, their scoreless streak stretching to 28 innings. It was the first time since 1988 that Atlanta has been the victim of three straight shutouts.

Atlanta finally did score a run this afternoon (in the fourth) in losing 2-1 to Detroit, but that won't affect my analyses of the probability of a three-game scoreless streak.

So, how rare is it for a team to record what might be called a "Paula Abdul" trio of games -- my reference to her 1989 song Straight Up, which includes the phrase, "Oh-Oh-Oh..."?

I present two analyses, one historical and the other statistical.

First, for the historical analysis, I inspected every team's game-by-game logs (via ESPN.com's schedule/results page for each team, which can be accessed through here) for what's been played so far in 2007 and for all of 2006.

I found no instances of a team being shut out three games in a row during this (roughly) season-and-a-half span. Several teams came close, as seen in the following examples:

The Royals had a sequence during the first half of the 2006 season that went like this: May 31 at Oakland, lose 7-0; June 2 at Seattle, lose 4-0; and June 3 at Seattle, lose 12-1.

The Twins, during an April 28-30, 2006 sequence at Detroit, lost three games by the scores of 9-0, 18-1, and 6-0.

Later in the same season (August 10-12, 2006), Minnesota had the following losing stretch while hosting Toronto: 5-0, 7-1, and 4-0.

From May 2-5, 2006, the Cubs lost four straight games while scoring a total of only one run. The opponents and scores of the Cub losses are as follows: vs. Pittsburgh, 8-0; at Arizona, 5-1 and 6-0; and at San Diego, 1-0.

Finally, the Astros, on the dates of June 27, 28, and 30, 2006, lost at Detroit by scores of 4-0 and 5-0, and then at Texas, 3-1.

For the statistical analysis, I looked at every Braves box score for the 20 games preceding the streak of three shutout losses. For each inning, 1 through 9, of every game, I recorded in a yes/no fashion whether Atlanta had scored one or more runs.

As David W. Smith shows in the recent SABR Baseball Research Journal (Volume 35) ("Effect of Batting Order..."), average numbers of runs scored tend to be higher in some innings than in others (e.g., more tend to be scored in the first inning). Therefore, I wanted to examine the Braves' probabilities of scoring (and not scoring) on an inning-by-inning basis, even though number of runs and scoring of at least one run are different quantities.

Listed below are the numbers of games in which the Braves scored one or more runs in each inning during the 20 games I coded. Each value was then subtracted from 20 to give the number of games in which the Braves were blanked in that inning, which was then converted to a proportion (number of games blanked in that inning, divided by 20).

Inning---# Games ATL Scored---Proportion of Games NOT Scoring

First---4 games---.80 rate of NOT scoring in this inning
Second---4 games---.80
Third---5 games---.75

Fourth---5 games---.75
Fifth---7 games---.65
Sixth---6 games---.70

Seventh---4 games---.80
Eighth---6 games---.70
Ninth---5 games---.75

Multiplying these nine component probabilities together yields .069 for the overall probability of the Braves being shut out in any one particular game. This calculation assumes independence of scoring probability from inning to inning, which may be questionable (batting orders would typically lead to the batters in one inning being better than those in the next inning). Still, the estimate seems reasonable, as the Braves were shut out only once in the 20-game sample (.05, which is close to the estimate of .069).

Because Atlanta was blanked in three straight games, we then raise .069 to the third power, yielding .0003 (or 3-in-10,000) for the probability of the Braves getting shut out every time in a three-game sequence (again assuming independence).
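The arithmetic above can be reproduced with a short Python sketch. The inning-by-inning counts are the ones in the table; the variable names are my own, and the calculation carries the same independence assumptions (across innings and across games) noted in the text.

```python
from math import prod

N_GAMES = 20  # games coded before the three-shutout streak

# Number of the 20 games in which Atlanta scored at least once, by inning.
games_scored = {
    "First": 4, "Second": 4, "Third": 5,
    "Fourth": 5, "Fifth": 7, "Sixth": 6,
    "Seventh": 4, "Eighth": 6, "Ninth": 5,
}

# Proportion of games in which Atlanta was blanked in each inning.
p_blank = {inning: (N_GAMES - scored) / N_GAMES
           for inning, scored in games_scored.items()}

# Shut out in one game = blanked in all nine innings (independence assumed).
p_shutout = prod(p_blank.values())

# Shut out in three specific consecutive games (independence assumed again).
p_three_straight = p_shutout ** 3

print(f"P(shutout in one game)      = {p_shutout:.3f}")         # 0.069
print(f"P(three straight shutouts)  = {p_three_straight:.4f}")  # 0.0003
```

Changing any one inning's count shows how sensitive the three-game figure is: because the single-game probability is cubed, small changes in it are magnified.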

Given enough opportunities to get shut out three straight games -- and the Braves certainly had a huge number of them, going all the way back to 1988 -- it can happen. But in the short term, even across all of Major League Baseball, it seems very rare.

Sunday, June 17, 2007

Brandon Watson of the Columbus Clippers yesterday tied the 95-year-old International League record for consecutive games with at least one hit. Watson's double at Ottawa brought his hitting streak to 42 games. The IL is part of AAA minor-league baseball, the level just below the majors. The major-league affiliate, or parent organization, for Columbus is the Washington Nationals (for many years, it had been the Yankees). Two questions remain: (1) how long will the streak continue, and (2) when will the Nats bring him up to the big club?

Update: Watson's hitting streak came to an end at 43 games, but at least he surpassed the previous league record by one game.

Monday, June 11, 2007

The June issue of the academic journal Personality and Social Psychology Bulletin included an article by Keith Markman and Corey Guenther of Ohio University, entitled "Psychological Momentum: Intuitive Physics and Naive Beliefs."

The article presents a package of four brief studies, each examining college student participants' beliefs and expectations regarding psychological momentum. The studies did not examine the relation between perceived momentum and athletes' actual later performance.

In one study, participants viewed a 10-minute videotape segment of a 1998 men's basketball game between Duke University and the University of North Carolina (which participants reported having no prior familiarity with). During the segment, Duke scored 15 straight points to cut a 19-point deficit to 4. At each of 10 pause points, participants reported their perceptions of which team had the momentum and who was going to win.

Another question asked participants to identify what they felt were turning points. Out of 11 possible plays, respondents disproportionately identified two of them, suggesting the perception of momentum can be a judgment of high consensus.

Other studies, using hypothetical scenarios, examined perceivers' impressions of the impact of a win over a traditional rival on a team's likelihood of winning its next game; the anticipated carryover of momentum on one task to performance on another task; and the anticipated effects of blocking a person's momentum on a task.

A key idea guiding the study was the authors' proposed analogy between psychological and physical momentum. As the authors discussed, the latter is calculated by multiplying an object's mass by the velocity with which it is traveling (see here for the Wikipedia page on physics momentum). At this stage, the delineation of psychological factors to correspond with mass and velocity seemed a little loose to me (e.g., a win over a longstanding rival was said to confer more mass than a win over a run-of-the-mill opponent).

Through these studies, Markman and Guenther have "gotten the ball rolling" on a potentially fruitful line of research. Whether this "ball" gathers momentum, we'll have to wait and see.

Sunday, June 03, 2007

I don’t write about college softball too often, but two pitchers currently going in the NCAA Women’s College World Series warrant attention from a hot-hand perspective. Tennessee’s Monica Abbott and Washington’s Danielle Lawrie have each pitched a no-hitter thus far in the World Series.

Abbott has not allowed a run in two outings and has struck out 32 batters in 14 innings pitched (regulation length for softball is seven innings).

Lawrie has allowed only one hit in 12 innings (one of her team’s wins was in five innings, a “run-rule” shortened 9-0 victory).

Another pitcher, Arizona’s Taryne Mowatt, has not allowed an earned run in three complete-game appearances.

There will be plenty of softball action today, along with Monday, Tuesday, and perhaps Wednesday nights, on ESPN and ESPN 2, so take a look if you have a chance. Further information on the Women’s College World Series is available at my College Softball Blog.

Saturday, June 02, 2007

Cleveland showed it was more than just a one-man team, with Daniel Gibson scoring 31 points as the Cavaliers closed out the Detroit Pistons, 98-82, to advance to the NBA finals. Gibson went a perfect 5-of-5 on three-point attempts (box score), including a trio of them in the first 2:18 of the fourth quarter, to break open a close game (fourth quarter play-by-play sheet). Having LeBron James on your team to draw the opponents' attention is obviously helpful, but Gibson had to make the shots himself.

Friday, June 01, 2007

LeBron James Takes Over

The sports world is abuzz over last night's performance by Cleveland's LeBron James in his team's double-overtime win at Detroit in the NBA's Eastern Conference finals. It wasn't merely that James scored 48 points in giving his team a 3-2 series lead.

James scored the Cavaliers' final 25 points of the game, 29 of their last 30. Quoting from this game article, "He was the only Cavs player to make a field goal in the last 17:48 and the only one to score in the final 12:49."

This particular type of accomplishment is obviously a function of both James's offensive prowess and his teammates' inability to score. If we look at the game's play-by-play sheet, focusing on the final 17:48 (the last 7:48 of the fourth quarter, excluding a basket by Zydrunas Ilgauskas at the 7:48 mark that starts the clock running on our analysis, and then the two overtimes), we can see the shooting percentages for both James and the non-James Cavaliers (in the aggregate).

During the final 17:48, by my count...

James was 11 of 14 from the field, his made field goals roughly an equal blend of long-distance shots (two made three-pointers and a bunch of long two-pointers) and layups/dunks. He was also 5 of 9 from the free-throw line.

In contrast, the non-James Cavaliers were 0 for 10 from the field and 1 of 2 from the stripe.