Friday, April 06, 2007

As virtually all fans of U.S. college sports are aware, Florida and Ohio State met in both the football and men's basketball championship games during the current (2006-07) academic year, with the Gators besting the Buckeyes both times. What are the odds of the same two schools appearing in both the football and men's basketball title tilts in the same year?

As with all (or nearly all) of the analyses I conduct on this site, the obtained probability estimate rests on whatever assumptions are made. To examine the question of the same two schools meeting in the championship games of the two major college sports in the same year, I will rely upon three concepts of probability.

The first is the "n choose k" principle (also known as the binomial coefficient). Given n total objects and the task of choosing a subset of k objects (where k is no greater than n), how many ways are there to draw the k objects if the order in which they are drawn doesn't matter?

To make things more concrete and to anticipate our sports analyses, let's think back to when the National Hockey League had only six teams, known fittingly as the Original Six (Boston, Chicago, Detroit, Montreal, New York [Rangers], and Toronto). Now, we ask, how many possible combinations of two teams are there for who could meet in the final round? In other words, what is the answer to the problem of "6 choose 2"?

Using this online calculator, which requires us to insert the expression in the form "ch(6,2)", we get the answer of 15. You can manually list out the possible match-ups if you want to verify there are 15 (Boston-Chicago, Boston-Detroit,... New York-Toronto) or you can just take the calculator's word for it.
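For readers who would rather check the count themselves than trust a calculator, here is a minimal Python sketch. It is only an illustration and not the calculator linked above. The variable names are mine, and it uses the standard library's math.comb for the formula and itertools.combinations for the brute-force listing.

```python
from itertools import combinations
from math import comb

# The Original Six, in the order listed in the post.
original_six = ["Boston", "Chicago", "Detroit", "Montreal", "New York", "Toronto"]

# Every unordered pair of teams is one possible final-round match-up.
matchups = list(combinations(original_six, 2))

print(comb(6, 2))     # count from the "n choose k" formula
print(len(matchups))  # count from listing every pair
for team_a, team_b in matchups:
    print(f"{team_a}-{team_b}")
```

Both counts come out to 15, and the printed list runs from Boston-Chicago through New York-Toronto.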

Switching back to college football and basketball, we need to determine how many possible combinations there are in football for the two teams that will meet in the championship game (of which Florida-Ohio State is one) and how many combinations there are for the basketball final game (again, of which Florida-Ohio State is one).

Once we've determined these two quantities, then the second major probability concept comes into play, namely the "multiplication/and rule." Quoting from King and Minium's introductory stats book (p. 199), which I use in my teaching:

[T]he probability of several particular events occurring successively or jointly is the product of their separate probabilities (provided that the generating events are independent).

To summarize to this point, we need to estimate the likelihood of each part of the question (i.e., Florida-Ohio State is one of X possible match-ups in football and one of Y possible match-ups in basketball) and then multiply the two probabilities together.

The probability of a Florida-Ohio State match-up in football and in basketball almost certainly would not be the same. There are over 300 schools that compete in men's NCAA Division I basketball, whereas the comparable figure for football (known as Division I-A or Football Bowl Subdivision) is somewhat over 100. The difference is that some conferences of schools with relatively small athletic programs compete with the "big boys" of college basketball in the same championship tournament, but not in the upper echelon of college football.

I certainly don't think we should take "300 choose 2" as the number of possible match-ups for the basketball final, as only a fraction of the 300 schools realistically have a chance to make it to the championship game. "Cinderella" teams sometimes upset a powerhouse in the first round, adding to the drama and mystique of "March Madness," but they don't tend to make the final (in 2006, the underdog George Mason University made the Final Four, but not the title game).

I have adopted an arbitrary, yet seemingly reasonable, cut-off for how many Division I men's basketball teams should be in the pool (n) of teams that could possibly make the championship game. By my standard, a school would have to have won at least one game (i.e., advanced to the round of 32) in two or more of the six NCAA tournaments from 2001 through 2006.

The number of schools meeting these criteria can be gleaned from another of my websites. I count 49 teams that qualify; let's say 50 to make it a round number. Under my system, Bucknell qualifies as a championship game contender, even though it is unlikely ever to advance to the final (see John Feinstein's book, The Last Amateurs, about Bucknell and its mates in the Patriot League, a review of which is available here). Still, I need to have objective criteria and, if an occasional surprise school gets in, so be it.

We then take "50 choose 2," which equals 1225, for the number of possible final-game match-ups among our 50 viable teams. The probability of a Florida-Ohio State men's basketball final, given equal likelihood among the 50 teams of making the final, is thus 1/1225.

My viability standard for football is that over the same six years (early January 2001 through early January 2006 inclusive), a team needed to play in at least one BCS bowl game (Rose, Orange, Sugar, or Fiesta; a fifth BCS game, known simply as the National Championship Game, was added this past season).

By my count, there were 28 such teams; again, to make it a round number, let's say there are 30 teams in the pool. Taking "30 choose 2" gives us 435 possible match-ups for the football championship game among what I've defined as the title-viable teams. There would thus be a 1/435 probability of Florida and Ohio State meeting in the title game, assuming equal likelihood among the 30 teams.
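The two pool calculations can be reproduced with a short Python sketch. The function name single_pair_probability is hypothetical (my own label), and the sketch bakes in the same equal-likelihood assumption used in the text, with the rounded pool sizes of 50 and 30.

```python
from fractions import Fraction
from math import comb


def single_pair_probability(pool_size: int) -> Fraction:
    """Chance that one named pair meets in the final, assuming every
    pair drawn from the pool is equally likely (the post's assumption)."""
    return Fraction(1, comb(pool_size, 2))


basketball = single_pair_probability(50)  # rounded-up pool of viable basketball teams
football = single_pair_probability(30)    # rounded-up pool of viable football teams

print(basketball)  # 1/1225
print(football)    # 1/435
```

Using Fraction rather than floating point keeps the results in the same "1 in N" form as the prose.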

There's a complication affecting the football calculation that does not affect the one for basketball. Specifically, given that even a single loss during the season will often put the kibosh on a football team's chances of competing for the national title, it is highly unlikely that two teams from the same conference will meet in the national championship game (although this past season, an all-Big 10 match-up of Ohio State and Michigan came close to happening).

Basketball has no such impediment and championship games pitting teams from the same conference have occurred (namely, Indiana vs. Michigan in 1976, Villanova vs. Georgetown in 1985, and Kansas vs. Oklahoma in 1988).

The greatest conference representation within my set of viable football teams belonged to the Big 10 with six teams (Ohio State, Michigan, Penn State, Purdue, Iowa, and Illinois). As we know from the hockey example above, there are 15 possible two-way match-ups with six teams. The Big 12 and Pac 10 each had five teams (each yielding 10 possible intra-conference match-ups), whereas the Atlantic Coast Conference and Southeastern Conference each had four teams (each yielding six possible intra-conference match-ups). All other conferences had two or fewer teams.

Overall, there would be around 50 possible intra-conference match-ups needing to be excluded. Subtracting these from the 435 total match-ups leaves 385, so we can adjust the estimated probability of a Florida-Ohio State football championship match-up to 1/385.
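As a rough check on the "around 50" figure, the sketch below adds up the intra-conference pairings for the five conferences named above. The dictionary is built only from the team counts given in the text. The conferences with exactly two viable teams are not itemized in the post, so they are left out here, and each would add one more pairing to the total.

```python
from math import comb

# Title-viable football teams per conference, as counted in the post.
viable_by_conference = {"Big 10": 6, "Big 12": 5, "Pac 10": 5, "ACC": 4, "SEC": 4}

# Intra-conference match-ups within each conference: "n choose 2".
intra_matchups = {name: comb(n, 2) for name, n in viable_by_conference.items()}
listed_total = sum(intra_matchups.values())

all_matchups = comb(30, 2)  # 435 pairings among the 30-team pool
rounded_exclusion = 50      # the post's rounded figure, not the exact sum
adjusted_matchups = all_matchups - rounded_exclusion

print(intra_matchups)
print(listed_total)       # 47 from the five named conferences
print(adjusted_matchups)  # 385
```

The five named conferences account for 47 pairings, so rounding up to 50 leaves room for a few two-team conferences.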

Multiplying the probability of a Florida-Ohio State basketball championship match-up (1/1225) times the original estimated probability of these same two teams playing in the football final (1/435) yields roughly 1 in 530,000. Multiplying (1/1225) X (1/385) yields roughly 1 in 470,000.

Either way, there was about a 1 in 500,000 probability of seeing Florida and Ohio State playing in both the football and men's basketball championship games in the same year.
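The multiplication/and rule step is just two products, sketched below in Python with exact fractions. The variable names are mine, and the calculation assumes, as the text does, that the football and basketball outcomes are independent.

```python
from fractions import Fraction

p_basketball = Fraction(1, 1225)
p_football_unadjusted = Fraction(1, 435)  # all 435 pairings allowed
p_football_adjusted = Fraction(1, 385)    # intra-conference pairings removed

# Multiplication/and rule: both finals feature Florida-Ohio State.
both_unadjusted = p_basketball * p_football_unadjusted
both_adjusted = p_basketball * p_football_adjusted

print(both_unadjusted)  # 1/532875
print(both_adjusted)    # 1/471625
```

Those are the unrounded versions of the "roughly 1 in 530,000" and "roughly 1 in 470,000" figures.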

Are we done yet? Not quite.

As I noted above, the probability just calculated was for Florida and Ohio State, per se, to meet in both title games. No offense to Gator and Buckeye fans, but the noteworthy aspect of the football and basketball championships was that they featured the same two teams, not necessarily Florida and Ohio State. If UCLA and Michigan had met in both the football and basketball finals, or Arizona and Oklahoma, or any other particular pair, the underlying phenomenon would have been the same.

This situation is analogous to the distinction between a particular named individual winning the lottery twice and the possibility of someone, somewhere winning it twice, the latter being much more likely than the former (I discussed this in an earlier posting).

When I compared my viable-contenders list for football and men's basketball, I found that 11 schools were on both lists. This would create "11 choose 2" -- which equals 55 -- possible match-ups that could have occurred in both the football and basketball championship games. Perhaps we could have had Texas-West Virginia match-ups in the two title games, or Pittsburgh-Notre Dame, or any of 52 others, in addition to Florida-Ohio State.

We now need to bring in our third concept of probability, the "addition/or rule." Again, from King and Minium (p. 199):

[T]he probability of occurrence of any one of several particular events is the sum of their individual probabilities (provided that they are mutually exclusive).

Here's an analogy: If we roll two dice, a red one and a green one, the probability of rolling double-sixes is 1/36 (via the aforementioned multiplication/and rule). But, if we want to know the probability of rolling any pair of matching numbers (i.e., 1-1, 2-2, 3-3, 4-4, 5-5, or 6-6), then we have to add up the six individual probabilities of 1/36 to arrive at 6/36 or 1/6.
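The dice analogy is small enough to verify by brute force. This Python sketch (variable names are mine) lists all 36 equally likely red/green outcomes and adds up the probabilities of the ones that qualify, which is the addition/or rule applied to mutually exclusive outcomes.

```python
from fractions import Fraction
from itertools import product

# All (red, green) outcomes for two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(rolls))  # each of the 36 outcomes is equally likely

# One specific double versus any double.
p_double_six = sum(p_each for red, green in rolls if red == green == 6)
p_any_double = sum(p_each for red, green in rolls if red == green)

print(p_double_six)  # 1/36
print(p_any_double)  # 1/6
```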

Given that we said earlier that the probability was roughly 1 in 500,000 for any particular pair of schools (in this case Florida and Ohio State) to be in both finals, and that there were roughly 50 pairs of schools who could conceivably play in both finals in the same year, we arrive at 50/500,000 or 1/10,000 for the probability that the same two schools could meet in the football and men's basketball finals in the same year.
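To see how much the rounding matters, here is a Python sketch that redoes the last step with the unrounded figures. It keeps the same assumptions as the text: every pair is equally likely, and the pairs are mutually exclusive. It also uses the adjusted football figure for all 55 shared pairs without checking whether any of them are from the same conference, which is my simplification.

```python
from fractions import Fraction
from math import comb

shared_schools = 11
shared_pairs = comb(shared_schools, 2)  # 55 pairs on both viable lists

# The post's rounded version: about 50 pairs at about 1 in 500,000 each.
rounded_estimate = Fraction(50, 500_000)

# Unrounded version: 55 pairs, each at (1/1225) * (1/385).
p_one_pair = Fraction(1, 1225) * Fraction(1, 385)
unrounded_estimate = shared_pairs * p_one_pair

print(rounded_estimate)    # 1/10000
print(unrounded_estimate)  # 1/8575
```

The unrounded figure of about 1 in 8,600 is in the same ballpark as 1 in 10,000, so the rounding does not change the overall picture.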

I've made a lot of assumptions along the way and have perhaps stumbled somewhere. If you have any comments, corrections, clarifications, etc., please let me (and the sporting world) know by clicking on the "Comments" heading below and leaving a message. To prevent spam, I've imposed some "hoops" to get through, but nothing too prohibitive, I hope. You do not need to establish a Blogger account to comment; you can either type in a name for yourself or post as "Anonymous."


Anonymous said...

All very interesting. Let me preface my remarks by stating I am not a statistician, but that--on its face--I believe the assumptions result in an underestimate of the "true" probability (which, of course, we never really know). In particular, I would question the robustness of the assumption that all events are (approximately) equally probable. That is, the probability of each possible pair of teams meeting in the championships of two sports is clearly not the same, even among the relatively "elite" subset of schools that the analysis centers on. At least on the surface, it would seem that some schools historically do well in one sport, some in another--and some in both. Florida State and Ohio State both have well-funded sports programs and historically are more likely to have highly ranked teams in both sports than randomly selected pairs, even among the elite subset.

I believe assuming equal probabilities among all possible pairings will result in an underestimate of the true probability that, in any given year, two schools will meet in the championship game in two sports. I believe that relaxing this assumption would result in a greater probability that some pair of teams would meet in two independent sports' championships.

I believe the assumption of equal probabilities, in effect, provides a "lower bound" estimate of the "true" probability (ceteris paribus). I would not, however, even consider what the magnitude of the underestimate is likely to be. I'll leave that to the statisticians.

An interesting problem....

David said...

Nice work! You might consider the probability that two schools would meet in two major sports (not just basketball and football). An excellent example of defining the domain of winning events can be found at this site.