Monday, April 23, 2007

Last September, when the L.A. Dodgers hit four consecutive solo homers, it was the first time in 42 years the feat had been accomplished.

Well, it didn't take long for it to happen again. Last night, Boston's Manny Ramirez, J.D. Drew, Mike Lowell and Jason Varitek homered in four straight at-bats, spurring the Red Sox to a victory and series sweep over the New York Yankees.

The outburst by the Red Sox occurred in the third inning, not as crucial a situation as the Dodgers' scenario, which had them trailing by four in the bottom of the ninth against San Diego, with whom the Dodgers were battling for the NL West title. Still, the Red Sox-Yankees rivalry is as intense as it gets, and the four straight homers allowed Boston to overcome an early 3-0 New York lead. A couple of trivia items from the above-linked article on the Red Sox-Yankees game:

Drew, a member of the Dodgers last year, participated in both home-run barrages.

Even though four consecutive home runs have now been hit five times in MLB history, the Yankees' rookie pitcher Chase Wright was only the second pitcher to give up all four; in the other instances, a pitching change was made during the streak.

Friday, April 20, 2007

Probably the two biggest stories of this past week in Major League Baseball are New York Yankee Alex Rodriguez's home-run barrage and Chicago White Sox pitcher Mark Buehrle's no-hitter Wednesday night, the first recorded in MLB this season.

Earlier tonight, Rodriguez hit two homers in a loss to the Boston Red Sox. According to this game article:

Rodriguez went 3-for-4 and joined Mike Schmidt, who hit 12 homers in the first 15 games in 1976, as the fastest to reach a dozen in baseball history.

Also, after Thursday's action, Trent McCotter sent in an e-mail to the SABR-L discussion list, pointing out that -- to McCotter's knowledge -- ARod is also closing in on the record for most consecutive games with two or more total bases offensively:

Most consec. games with 2+ total bases in each
19g: Harry Heilmann, DET, Sept 5 through Sept 30, 1928
17g: Chipper Jones, ATL, Jun 26 through Jul 19, 2006
17g: Alex Rodriguez, NYY, Sep 27 [2006] through Apr 19, 2007 (on-going)

After tonight, of course, Rodriguez is now at 18 games.


As noted above, the other highlight of the week was Buehrle's no-hitter against the Texas Rangers.

In 2004's issue of the Baseball Research Journal (Number 33), Bob Kapla published an article entitled, "No-Hitter Probabilities: What Are the Odds?" This article shows how to take three simple quantities from a pitcher's career statistics -- games started, innings pitched (which gets converted into "outs achieved" by being multiplied by three), and hits allowed -- and determine the probability of his throwing a no-hitter, either in any given game or at least once in his career.

Buehrle's career statistics through the end of the 2006 season show that he had retired 4284 batters and given up 1473 hits (for no-hitters, as opposed to perfect games, we ignore walks, errors, etc.). Buehrle's probability of getting any given hitter out was thus, on average, .744 (4284 divided by the 5757 total of outs plus hits). A no-hitter requires retiring 27 straight batters, so Buehrle's likelihood of no-hitting the opposition in any single game is .744 raised to the 27th power, or about .0003 (.00034 before rounding).
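The per-game calculation can be reproduced in a few lines. This is a minimal Python sketch of the approach as described here, assuming only the two career totals quoted above and treating every batter as an independent trial with the same out probability; the variable names are my own, not Kapla's.

```python
# Kapla-style estimate of a pitcher's per-game no-hitter probability,
# using Buehrle's career totals through 2006 as quoted in the text.
outs = 4284   # batters retired (innings pitched multiplied by three)
hits = 1473   # hits allowed; walks, errors, etc. are ignored

# Probability of retiring any given batter, averaged over his career.
p_out = outs / (outs + hits)

# A no-hitter requires 27 outs with no hit, each batter treated as independent.
p_no_hitter = p_out ** 27

print(f"p(out)               = {p_out:.3f}")
print(f"p(no-hitter, 1 game) = {p_no_hitter:.5f}")
```

Run as written, it prints roughly .744 for the out probability and .00034 for the single-game no-hitter probability.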

We can also say that his probability of not throwing a no-hitter in a given start is 1 - .00034, or roughly .9997. Buehrle had started 204 games through the end of 2006, so the likelihood that he would fail to throw a no-hitter in all 204 games in a row is (1 - .00034) raised to the 204th power, yielding .933. In other words, in Buehrle's 204 career starts from 2000-2006 inclusive, there was roughly a 93% chance he would never throw a no-hitter. Conversely, there was roughly a 7% chance (.067) that he would throw at least one. And in his third start of 2007 (207th career start), Buehrle got one.
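Extending the same sketch to his whole career takes two more lines: the chance of no no-hitter in every one of the 204 starts, and its complement. As before, this assumes each start is independent and carries the same per-game probability, which is a simplification.

```python
# Career-level version: chance of at least one no-hitter in a run of starts,
# assuming every start carries the same independent per-game probability.
outs, hits = 4284, 1473   # Buehrle's career totals through 2006 (from the text)
starts = 204              # career starts through 2006

p_no_hitter = (outs / (outs + hits)) ** 27   # per-start probability
p_none = (1 - p_no_hitter) ** starts         # no no-hitter in any start
p_at_least_one = 1 - p_none                  # complement: one or more

print(f"p(no no-hitter in {starts} starts) = {p_none:.3f}")
print(f"p(at least one)                  = {p_at_least_one:.3f}")
```

Keeping the per-start probability unrounded matters here: raising the rounded .9997 to the 204th power gives about .941 rather than .933.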

[As an aside, given his career mix of outs recorded and hits allowed, as of the time of publication of Kapla's article, Roger Clemens had a 51.73% chance of throwing at least one career no-hitter, yet had never done so. Here's a list of all no-hitters.]

What lends even more of a "hot arm" aspect to Buehrle's no-hitter is how he closed out his previous outing on the mound. Pitching against Oakland, Buehrle gave up three runs in the first inning, but (including the final Athletic he faced in the first) retired 19 of the 21 batters he pitched to the rest of the way, before leaving after the seventh inning (play-by-play sheet).

Tuesday, April 17, 2007

Roland Beech of 82games, a website that specializes in statistical analyses of NBA data, recently invited me to create an NBA version of my new graphic hot-hand diagram (click here for some background).

When I think of potential hot-hand shooters in the NBA, the list begins and ends with one name, Kobe Bryant. I therefore decided to create a graph for Bryant's 81-point game against Toronto last season. Also, at Roland's suggestion, I found a way to incorporate free throws.

I was honored to find out today that my Bryant chart is the current "Top Story" on 82games.

Take a look, by clicking on

I'm sure I speak for the entire sports statistics community in extending our thoughts and condolences to everyone at Virginia Tech and their families. The school has created a special website for information related to the tragedy.

Friday, April 06, 2007

As virtually all fans of U.S. college sports are aware, Florida and Ohio State met in both the football and men's basketball championship games during the current (2006-07) academic year, with the Gators besting the Buckeyes both times. What are the odds of the same two schools appearing in both the football and men's basketball title tilts in the same year?

As with all (or nearly all) of the analyses I conduct on this site, the obtained probability estimate rests on whatever assumptions are made. To examine the question of the same two schools meeting in the championship games of the two major college sports in the same year, I will rely upon three concepts of probability.

The first is the "n choose k" principle (also known as the binomial coefficient). Given n total objects and the task of choosing a subset of k objects (where k is less than n), how many ways are there to draw the k objects?

To make things more concrete and to anticipate our sports analyses, let's think back to when the National Hockey League had only six teams, known fittingly as the Original Six (Boston, Chicago, Detroit, Montreal, New York [Rangers], and Toronto). Now, we ask, how many possible combinations of two teams are there for who could meet in the final round? In other words, what is the answer to the problem of "6 choose 2"?

Using this online calculator, which requires us to insert the expression in the form "ch(6,2)", we get the answer of 15. You can manually list out the possible match-ups if you want to verify there are 15 (Boston-Chicago, Boston-Detroit,... New York-Toronto) or you can just take the calculator's word for it.
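For readers who prefer code to an online calculator, Python's standard library offers the same "n choose k" computation, and it can also list the match-ups explicitly. This is a small sketch (`math.comb` needs Python 3.8 or later); the team list is just the Original Six named above.

```python
from itertools import combinations
from math import comb   # math.comb requires Python 3.8+

original_six = ["Boston", "Chicago", "Detroit", "Montreal", "New York", "Toronto"]

# "n choose k" = n! / (k! (n - k)!), the number of unordered subsets of size k.
n_matchups = comb(len(original_six), 2)

# List every unordered pair of teams to double-check the count by hand.
matchups = [f"{a}-{b}" for a, b in combinations(original_six, 2)]

print(n_matchups)                  # 15
print(matchups[0], matchups[-1])   # Boston-Chicago New York-Toronto
```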

Switching back to college football and basketball, we need to determine how many possible combinations there are in football for the two teams that will meet in the championship game (of which Florida-Ohio State is one) and how many combinations there are for the basketball final game (again, of which Florida-Ohio State is one).

Once we've determined these two quantities, then the second major probability concept comes into play, namely the "multiplication/and rule." Quoting from King and Minium's introductory stats book (p. 199), which I use in my teaching:

[T]he probability of several particular events occurring successively or jointly is the product of their separate probabilities (provided that the generating events are independent).

To summarize to this point, we need to estimate the likelihood of each part of the question (i.e., Florida-Ohio State is one of X possible match-ups in football and one of Y possible match-ups in basketball) and then multiply the two probabilities together.

The probability of a Florida-Ohio State match-up in football and in basketball almost certainly would not be the same. There are over 300 schools that compete in men's NCAA Division I basketball, whereas the comparable figure for football (known as Division I-A or Football Bowl Subdivision) is somewhat over 100. The difference is that some conferences of schools with relatively small athletic programs compete with the "big boys" of college basketball in the same championship tournament, but not in the upper echelon of college football.

I certainly don't think we should take "300 choose 2" as the number of possible match-ups for the basketball final, as only a fraction of the 300 schools realistically have a chance to make it to the championship game. "Cinderella" teams sometimes upset a powerhouse in the first round, adding to the drama and mystique of "March Madness," but they don't tend to make the final (in 2006, the underdog George Mason University made the Final Four, but not the title game).

I have adopted an arbitrary, yet seemingly reasonable, cut-off for how many Division I men's basketball teams should be in the pool (n) of teams that could possibly make the championship game. By my standard, a school must have won at least one game (i.e., advanced to the round of 32) in two or more of the six NCAA tournaments from 2001 through 2006 inclusive.

The number of schools meeting these criteria can be gleaned from another of my websites. I count 49 teams that qualify; let's say 50 to make it a round number. Under my system, Bucknell qualifies as a championship game contender, even though it is unlikely ever to advance to the final (see John Feinstein's book, The Last Amateurs, about Bucknell and its mates in the Patriot League, a review of which is available here). Still, I need to have objective criteria and, if an occasional surprise school gets in, so be it.

We then take "50 choose 2," which equals 1225, for the number of possible final-game match-ups among our 50 viable teams. The probability of a Florida-Ohio State men's basketball final, given equal likelihood among the 50 teams of making the final, is thus 1/1225.

My viability standard for football is that over the same six years (early January 2001 through early January 2006 inclusive), a team needed to play in at least one BCS bowl game (Rose, Orange, Sugar, or Fiesta; a fifth BCS game, known simply as the National Championship Game was added this past season).

By my count, there were 28 such teams; again, to make it a round number, let's say there are 30 teams in the pool. Taking "30 choose 2" gives us 435 possible match-ups for the football championship game among what I've defined as the title-viable teams. There would thus be a 1/435 probability of Florida and Ohio State meeting in the title game, assuming equal likelihood among the 30 teams.
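The two pool sizes translate into match-up counts and single-pairing probabilities as follows. This sketch simply restates the rounded pool sizes above (50 and 30) under the same equal-likelihood assumption; the variable names are illustrative.

```python
from math import comb

basketball_pool = 50   # rounded from the 49 schools meeting the tournament standard
football_pool = 30     # rounded from the 28 teams that played in a BCS bowl

bb_matchups = comb(basketball_pool, 2)   # possible basketball finals
fb_matchups = comb(football_pool, 2)     # possible football title games

# Under equal likelihood, any one pairing has probability 1 / (number of pairings).
p_bb = 1 / bb_matchups
p_fb = 1 / fb_matchups

print(bb_matchups, fb_matchups)   # 1225 435
```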

There's a complication affecting the football calculation that does not affect the one for basketball. Specifically, given that even a single loss during the season will often put the kibosh on a football team's chances of competing for the national title, it is highly unlikely that two teams from the same conference will meet in the national championship game (although this past season, an all-Big 10 match-up of Ohio State and Michigan came close to happening).

Basketball has no such impediment and championship games pitting teams from the same conference have occurred (namely, Indiana vs. Michigan in 1976, Villanova vs. Georgetown in 1985, and Kansas vs. Oklahoma in 1988).

The greatest conference representation within my set of viable football teams belonged to the Big 10 with six teams (Ohio State, Michigan, Penn State, Purdue, Iowa, and Illinois). As we know from the hockey example above, there are 15 possible two-way match-ups with six teams. The Big 12 and Pac 10 each had five teams (each yielding 10 possible intra-conference match-ups), whereas the Atlantic Coast Conference and Southeastern Conference each had four teams (each yielding six possible intra-conference match-ups). All other conferences had two or fewer teams.

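Overall, there would be around 50 possible intra-conference match-ups needing to be excluded (15 + 10 + 10 + 6 + 6 = 47 from the conferences above, plus a few more from conferences with two teams). Subtracting these 50 from the 435 total leaves 385, so we can adjust the estimated probability of a Florida-Ohio State football championship match-up to 1/385.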
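The exclusion arithmetic can be checked with a short sketch. The conference sizes are the ones listed above; the text doesn't say how many two-team conferences there were, so rather than guess, the sketch follows the same rounding of the exclusions up to 50.

```python
from math import comb

# Viable football teams per conference with three or more members (from the text).
conference_sizes = {"Big 10": 6, "Big 12": 5, "Pac 10": 5, "ACC": 4, "SEC": 4}

# Same-conference pairings that a title game is unlikely to feature.
intra = {conf: comb(size, 2) for conf, size in conference_sizes.items()}
total_intra = sum(intra.values())   # 15 + 10 + 10 + 6 + 6 = 47

all_pairs = comb(30, 2)             # 435 pairings in the 30-team pool

# Conferences with two viable teams add one pairing each, so the
# exclusions are rounded up to 50, as in the text.
allowed = all_pairs - 50            # 385
```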

Multiplying the probability of a Florida-Ohio State basketball championship match-up (1/1225) times the original estimated probability of these same two teams playing in the football final (1/435) yields roughly 1 in 530,000. Multiplying (1/1225) X (1/385) yields roughly 1 in 470,000.
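Using exact fractions makes the multiplication/and rule easy to verify. This is a sketch of the two products above, assuming (as the rule requires) that the football and basketball title games are independent events.

```python
from fractions import Fraction

p_basketball = Fraction(1, 1225)
p_football_all = Fraction(1, 435)   # all pairings allowed
p_football_adj = Fraction(1, 385)   # same-conference pairings excluded

# Multiplication/and rule: valid only if the two title games are independent.
p_both_all = p_basketball * p_football_all
p_both_adj = p_basketball * p_football_adj

print(f"1 in {p_both_all.denominator:,}")   # 1 in 532,875
print(f"1 in {p_both_adj.denominator:,}")   # 1 in 471,625
```

Both exact figures round to the "roughly 1 in 530,000" and "roughly 1 in 470,000" quoted above.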

Either way, there was about a 1 in 500,000 probability of seeing Florida and Ohio State playing in both the football and men's basketball championship games in the same year.

Are we done yet? Not quite.

As I noted above, the probability just calculated was for Florida and Ohio State, per se, to meet in both title games. No offense to Gator and Buckeye fans, but the noteworthy aspect of the football and basketball championships was that they featured the same two teams, not necessarily Florida and Ohio State. If UCLA and Michigan had met in both the football and basketball finals, or Arizona and Oklahoma, or any other particular pair, the underlying phenomenon would have been the same.

This situation is analogous to the distinction between a particular named individual winning the lottery twice and the possibility of someone, somewhere winning it twice, the latter being much more likely than the former (I discussed this in an earlier posting).

When I compared my viable-contenders list for football and men's basketball, I found that 11 schools were on both lists. This would create "11 choose 2" -- which equals 55 -- possible match-ups that could have occurred in both the football and basketball championship games. Perhaps we could have had Texas-West Virginia match-ups in the two title games, or Pittsburgh-Notre Dame, or any of 52 others, in addition to Florida-Ohio State.

We now need to bring in our third concept of probability, the "addition/or rule." Again, from King and Minium (p. 199):

[T]he probability of occurrence of any one of several particular events is the sum of their individual probabilities (provided that they are mutually exclusive).

Here's an analogy: If we roll two dice, a red one and a green one, the probability of rolling double-sixes is 1/36 (via the aforementioned multiplication/and rule). But, if we want to know the probability of rolling any pair of matching numbers (i.e., 1-1, 2-2, 3-3, 4-4, 5-5, or 6-6), then we have to add up the six individual probabilities of 1/36 to arrive at 6/36 or 1/6.
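The dice analogy can be checked by brute force: enumerate all 36 equally likely (red, green) outcomes and count. This sketch assumes two fair six-sided dice and confirms that both rules agree with direct counting.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, green) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, found by counting favorable outcomes."""
    return Fraction(sum(1 for roll in outcomes if event(roll)), len(outcomes))

p_double_six = prob(lambda roll: roll == (6, 6))
p_any_double = prob(lambda roll: roll[0] == roll[1])

# Multiplication rule: each die independently shows 6 with probability 1/6.
assert p_double_six == Fraction(1, 6) * Fraction(1, 6)
# Addition rule: the six doubles are mutually exclusive, so their 1/36s add up.
assert p_any_double == 6 * Fraction(1, 36) == Fraction(1, 6)
```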

Given that we said earlier that the probability was roughly 1 in 500,000 for any particular pair of schools (in this case Florida and Ohio State) to be in both finals, and that there were roughly 50 pairs of schools (55, to be exact) that could conceivably play in both finals in the same year, we arrive at 50/500,000, or 1/10,000, for the probability that the same two schools would meet in the football and men's basketball finals in the same year.
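Here is the addition/or step as a sketch, run with both the exact 55 pairs and the rounded 50. It assumes every eligible pair shares the same rounded 1-in-500,000 probability, and relies on the fact that only one pair can meet in both finals in a given year, which makes the events mutually exclusive.

```python
from fractions import Fraction
from math import comb

schools_on_both_lists = 11
pairs = comb(schools_on_both_lists, 2)   # 55 possible "same two schools" pairings

p_one_pair = Fraction(1, 500_000)        # rounded estimate for a single named pair

# Addition/or rule: only one pair can meet in both finals in a given year,
# so the events are mutually exclusive and their probabilities add.
p_exact = pairs * p_one_pair             # 55 / 500,000
p_rounded = 50 * p_one_pair              # 50 / 500,000, as in the text

print(f"about 1 in {round(1 / p_exact):,}")    # about 1 in 9,091
print(f"about 1 in {round(1 / p_rounded):,}")  # about 1 in 10,000
```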

I've made a lot of assumptions along the way and have perhaps stumbled somewhere. If you have any comments, corrections, clarifications, etc., please let me (and the sporting world) know by clicking on the "Comments" heading below and leaving a message. To prevent spam, I've imposed some "hoops" to get through, but nothing too prohibitive, I hope. You do not need to establish a Blogger account to comment; you can either type in a name for yourself or post as "Anonymous."

Wednesday, April 04, 2007

I plan to have a final wrap-up of the college basketball season in the next few days. Before getting there, however, here's an item from high school baseball...

Colt Molloy, from a town in west Texas, had his streak of five no-hitters stopped the other day. The five consecutive no-hitters were good enough to establish a new Texas state high school record, but he just missed tying the national record of six.

Here's a link to an Amarillo newspaper article; reading it requires completing a free registration with the newspaper.

Monday, April 02, 2007

I don't really have any mind-boggling hot-hand occurrences to report in this entry. Rather, I'm just using it as an excuse to display a couple of photos I took at Boston's TD Banknorth Garden, where I attended a Bruins hockey game against the Atlanta Thrashers last Saturday afternoon, while in town for an academic conference. I never got to the old Boston Garden, and when I tried to see a sporting event at the new Garden (once known as the Fleet Center), I found that the Celtics were out of town during my visit, so hockey was the only game in town.

To make this posting at least somewhat relevant to the purpose of this blog, I decided to keep an eye out at the game for what I considered the best streaky performance of the day. And the winner of this honor is...

Boston goalie Joey MacDonald. After Atlanta scored just 24 seconds into the game, MacDonald shut out the Thrashers for the rest of the first period. This may not seem all that impressive. However, Boston received four penalties in the first period, meaning that for eight minutes, Atlanta was attacking the Boston goal with an extra player. Having faced 16 Thrasher shots in the period (which seems like a lot, relative to the seven Boston fired off), MacDonald stopped the final 15.

In fact, MacDonald's performance Saturday earned him recognition as one of the game's Three Stars.

Enjoy the pictures!

Sunday, April 01, 2007

The Golden State Warriors' Jason Richardson has displayed a couple of pronounced scoring spurts lately. In today's game against Memphis, Richardson scored 11 points in a span of roughly two-and-a-half minutes midway through the fourth quarter to help overcome a Grizzlies lead and spark the Warriors to victory.

And there was last Thursday night's Golden State game against Phoenix. According to this game article:

The Warriors scored 18 points in the first 3 minutes with four 3-pointers. Golden State had 30 points in the first 5 1/2 minutes, including 16 by Richardson on four 3-pointers and 6-for-6 shooting.

To be truly a streaky shooter, a player must go through cold stretches as well as hot ones. In statistical jargon, a player's p(hit|hit) [i.e., probability of hitting a shot, given an immediately previous hit] must exceed his or her p(hit|miss) [i.e., probability of hitting a shot, given an immediately previous miss]. Stated differently, a made shot would have to elevate a player's shooting percentage on the next shot, relative to a missed previous shot.
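To make the two conditional probabilities concrete, here is a minimal sketch that computes them from a sequence of makes and misses. The function name and the shot sequence are hypothetical (not Richardson's actual shots), and each shot is classified only by the one immediately before it. A gap between the two numbers in a sequence this short is descriptive, not statistical evidence of streakiness.

```python
def conditional_hit_rates(shots):
    """Return (p(hit|hit), p(hit|miss)) for a sequence of shots in game order.

    shots: iterable of booleans, True for a make and False for a miss.
    Every shot after the first is classified by the outcome of the shot before it.
    """
    shots = list(shots)
    after_hit = [cur for prev, cur in zip(shots, shots[1:]) if prev]
    after_miss = [cur for prev, cur in zip(shots, shots[1:]) if not prev]
    p_hh = sum(after_hit) / len(after_hit) if after_hit else float("nan")
    p_hm = sum(after_miss) / len(after_miss) if after_miss else float("nan")
    return p_hh, p_hm

# Hypothetical sequence (H = make, M = miss), not real game data.
sequence = [c == "H" for c in "HHHMMHHMMM"]
p_hh, p_hm = conditional_hit_rates(sequence)
print(f"p(hit|hit) = {p_hh:.2f}, p(hit|miss) = {p_hm:.2f}")  # 0.60, 0.25
```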

Interestingly, quoting from the above-linked article on the Phoenix game:

Richardson was awful in the Warriors' previous two games, going scoreless in 29 minutes during Golden State's home loss to San Antonio on Monday. Before the game, Warriors coach Don Nelson said Richardson hasn't been fully healthy all season after undergoing knee surgery shortly before training camp.

Further analysis will need to be done, of course, but perhaps Jason Richardson can be found to be a member of that rare species: the statistically established streak shooters.