So with schedules to be released in 2 days, the inevitable question is “who has the toughest schedule?” To answer this question, an estimate of all team strengths is required. However, no games have been played, so what should be done?
1. Use last year’s team strengths
2. Use a different year’s team strengths
3. Use a combination of previous years’ team strengths
4. Use a combination of previous years’ team strengths, with a correction for returning starters
I really like #4, but there is no complete database of returning starters. For now I would like to announce some results regarding choice #1.
year strengths, year data, correct / total = ratio, notes
1999 strengths, 2000 data, 1188 / 1695 = 0.701, 7 teams set to 0.00
2000 strengths, 2001 data, 1220 / 1699 = 0.718
2001 strengths, 2002 data, 1182 / 1702 = 0.694, 2 new teams set to 0.00
2002 strengths, 2003 data, 1228 / 1716 = 0.716, 2 new teams set to 0.00
Details for line 1:
1999 strengths: these are the unbiased estimates of team strengths obtained from the 1999 analysis
2000 data: the data set of all games played in 2000
correct: number of games picked correctly by using 1999 strengths
total: total number of games played in 2000
notes: in 2000 there were 7 new teams that did not exist in the 1999 data set, so their strengths were set to 0.00
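The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual analysis code: the function name pick_accuracy, the toy team names, and the toy strength values are all made up, the pick is assumed to be simply the team with the higher prior-year strength (no home-field term), and a tie in strength is assumed to count as a miss. Teams missing from the prior year get 0.00, as in the notes column.

```python
def pick_accuracy(strengths, games, default=0.0):
    """Count games picked correctly using last year's strengths.

    strengths: dict mapping team name -> prior-year strength estimate
    games:     list of (winner, loser) pairs from the current year
    default:   strength assigned to teams absent from the prior year
    """
    correct = 0
    for winner, loser in games:
        w = strengths.get(winner, default)
        l = strengths.get(loser, default)
        # Assumption: the pick is the team with the strictly higher
        # strength; equal strengths are counted as a miss.
        if w > l:
            correct += 1
    return correct, len(games)


# Made-up toy data, purely for illustration.
toy_strengths = {"A": 1.2, "B": 0.4, "C": -0.8}
toy_games = [
    ("A", "B"),    # favorite wins
    ("B", "C"),    # favorite wins
    ("C", "A"),    # upset
    ("A", "New"),  # new team defaults to 0.00, favorite wins
    ("New", "B"),  # new team (0.00) beats B (0.4), upset
]

correct, total = pick_accuracy(toy_strengths, toy_games)
print(f"{correct} / {total} = {correct / total:.3f}")
```

With real data, toy_strengths would be replaced by the 1999 estimates and toy_games by the 2000 results to reproduce the first row of the table.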
So in general, a person can predict about 70% of games correctly by using last year’s analysis.
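As a quick check on the "about 70%" figure, the four rows of the table can be recomputed and pooled. The snippet below uses only the correct and total counts reported above; the variable names are arbitrary.

```python
# (strength year, data year, correct, total) as reported in the table
results = [
    (1999, 2000, 1188, 1695),
    (2000, 2001, 1220, 1699),
    (2001, 2002, 1182, 1702),
    (2002, 2003, 1228, 1716),
]

# Per-season ratios, rounded to three places as in the table
ratios = [round(c / t, 3) for _, _, c, t in results]
for (s_year, d_year, c, t), r in zip(results, ratios):
    print(f"{s_year} strengths, {d_year} data, {c} / {t} = {r:.3f}")

# Pooled over all four seasons: 4818 / 6812, about 0.707
pooled_correct = sum(c for _, _, c, _ in results)
pooled_total = sum(t for _, _, _, t in results)
pooled_ratio = pooled_correct / pooled_total
print(f"pooled: {pooled_correct} / {pooled_total} = {pooled_ratio:.3f}")
```

The pooled figure sits between the best season (0.718) and the worst (0.694), so the 70% summary holds for each year, not just on average.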
This is quite a large number in my opinion. Flipping a coin gives you 50% correct. Using last year’s data gets you to 70%. Doing a full regression analysis gets you to 80%. Tradition, in the sense of 1 year tradition, adds value, quite a bit of value, in fact.
I will continue to update weekly reports for 2004-2009. I will then do some research on options #2 and #3. And if I have time, I will dream longingly about #4.
2 days until fall schedules are released, July 15, 2010
27 days until the first day of practice, August 9, 2010