Preseason rankings construction


How do you construct a preseason ranking?

Last year I investigated the question of preseason rankings. Given that the computer knows nothing about the 2011 edition of any team, how can an intelligent preseason ranking be constructed? Some initial notes are given here: http://bcmoorerankings.com/2010/07/tradition-one-year-tradition/.

Possibilities:

1. Use last year’s team strengths
2. Use a different year’s team strengths
3. Use a combination of previous years’ team strengths
4. Use a combination of previous years’ team strengths, with a correction for returning starters

I really like #4, but there is no complete database of returning starters.
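As a rough illustration of what possibility 3 could look like, here is a minimal Python sketch that blends several prior seasons into one preseason rating per team. The function name `combine_strengths`, the team names, the ratings, and especially the weights are all made up for this example; nothing here has been tested against real data, and the right weights (if any exist) are unknown.

```python
def combine_strengths(strengths_by_year, weights):
    """Weighted average of each team's strengths over several prior seasons.

    strengths_by_year: dict mapping year -> {team: strength}
    weights: dict mapping year -> non-negative weight (hypothetical values)
    Seasons in which a team has no rating are skipped and the remaining
    weights are renormalized for that team.
    """
    teams = set()
    for year in weights:
        teams.update(strengths_by_year.get(year, {}))

    combined = {}
    for team in teams:
        weighted_sum = 0.0
        weight_total = 0.0
        for year, weight in weights.items():
            season = strengths_by_year.get(year, {})
            if team in season:
                weighted_sum += weight * season[team]
                weight_total += weight
        # A team with no usable rating falls back to 0.00.
        combined[team] = weighted_sum / weight_total if weight_total else 0.0
    return combined


# Made-up ratings and weights, for illustration only.
history = {
    2009: {"A": 10.0, "B": -4.0},
    2008: {"A": 6.0, "B": 0.0, "C": 2.0},
}
preseason_2010 = combine_strengths(history, {2009: 0.75, 2008: 0.25})
print(preseason_2010["A"], preseason_2010["B"], preseason_2010["C"])  # 9.0 -3.0 2.0
```

Renormalizing the weights for teams with missing seasons is one design choice among several; setting missing seasons to 0.00 would be another.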

=======================

After more work since last year’s post, here is a more complete set of data for testing possibility 1:

data year, strengths year, correct/total = ratio, notes
2000, 1999, 1188/1695 = 0.701, 7 new teams set to 0.00
2001, 2000, 1220/1699 = 0.718,
2002, 2001, 1182/1702 = 0.694, 2 new teams set to 0.00
2003, 2002, 1228/1716 = 0.716, 2 new teams set to 0.00
2004, 2003, 1192/1728 = 0.690, 10 new teams set to 0.00
2005, 2004, 1237/1729 = 0.715, 3 new teams set to 0.00
2006, 2005, 1246/1735 = 0.718, 2 new teams and 1 renamed team set to 0.00
2007, 2006, 1226/1732 = 0.708, 3 new teams set to 0.00
2008, 2007, 1195/1800 = 0.664, 12 new teams set to 0.00
2009, 2008, 1292/1800 = 0.718, 2 new teams set to 0.00
Total      12206/17336 = 0.704

Details for line 1:

1999 strengths: these are the unbiased estimates of team strengths obtained from the 1999 analysis
2000 data: the data set of all games played in 2000
correct: number of games picked correctly by using 1999 strengths
total: total number of games played in 2000
ratio: correct/total
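
The definitions above can be sketched in a few lines of Python. This is only an illustration, not the actual analysis code: the function name `prediction_accuracy`, the team names, and the ratings are invented, and the rule for equal ratings (counted as a miss) is an assumption, since the tie-handling is not described here.

```python
def prediction_accuracy(strengths, games, default=0.0):
    """Return (correct, total, ratio) for picking each game's higher-rated team.

    strengths: dict mapping team name -> prior-season strength
    games: iterable of (winner, loser) pairs from the season being predicted
    default: rating for teams missing from strengths (new teams get 0.00)
    """
    correct = 0
    total = 0
    for winner, loser in games:
        total += 1
        # Assumption: a pick is correct only when the actual winner had the
        # strictly higher prior-season strength; equal ratings count as a miss.
        if strengths.get(winner, default) > strengths.get(loser, default):
            correct += 1
    ratio = correct / total if total else 0.0
    return correct, total, ratio


# Made-up strengths and results, for illustration only.
strengths_1999 = {"A": 12.5, "B": 3.0, "C": -7.5}
games_2000 = [("A", "B"), ("C", "B"), ("A", "New"), ("New", "C")]
print(prediction_accuracy(strengths_1999, games_2000))  # (3, 4, 0.75)
```

Here "New" has no 1999 rating, so it is treated as 0.00, the same rule used for new teams in the table.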

Note:

If a new team is formed, its rating is set to zero. If a strong new team gets set to 0.00, the prediction ratio will go down. One example is IKM-Manning, formed in fall 2008. Both predecessor teams were quite strong, finishing 1-2 after the 2006 season. In a case like this, IKM-Manning could instead be rated as the average of the two schools’ previous ratings, which would undoubtedly improve the predictions for the new school.
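
A minimal sketch of that averaging idea in Python follows. The helper name `seed_merged_team`, the predecessor names "IKM" and "Manning", and all rating values are assumptions for illustration; the real ratings are not given here.

```python
def seed_merged_team(strengths, new_name, predecessors):
    """Return a copy of strengths with new_name rated as its predecessors' average."""
    known = [strengths[p] for p in predecessors if p in strengths]
    seeded = dict(strengths)
    # Fall back to 0.00 (the existing rule for new teams) if no predecessor is rated.
    seeded[new_name] = sum(known) / len(known) if known else 0.0
    return seeded


# Hypothetical ratings, for illustration only.
ratings_2007 = {"IKM": 20.0, "Manning": 16.0, "Other": -3.0}
ratings_for_2008 = seed_merged_team(ratings_2007, "IKM-Manning", ["IKM", "Manning"])
print(ratings_for_2008["IKM-Manning"])  # 18.0
```

The original dictionary is left untouched, so the same prior-year ratings can still be scored with and without the merged-team correction.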

So in general, a person can predict about 70% of games correctly by using last year’s analysis.

This is quite a large number, in my opinion. Flipping a coin gives you 50% correct. Using last year’s data gets you to 70%. Doing a full regression analysis gets you to 80%. Tradition, in the sense of one-year tradition, adds quite a bit of value.

=======================

Possibility 2 was investigated very thoroughly last year:

http://bcmoorerankings.com/2010/07/predicting-a-decade-in-the-future/

data year, strengths year, correct/total, ratio, notes
2009, 2008, 1292/1800, 0.718, 2 teams set to 0.00
2009, 2007, 1157/1800, 0.643, 14 teams set to 0.00
2009, 2006, 1125/1800, 0.625, 17 teams set to 0.00
2009, 2005, 1094/1800, 0.608, 20 teams set to 0.00
2009, 2004, 1075/1800, 0.597, 23 teams set to 0.00
2009, 2003, 1023/1800, 0.568, 32 teams set to 0.00
2009, 2002, 1044/1800, 0.580, 34 teams set to 0.00
2009, 2001, 1047/1800, 0.582, 33 teams set to 0.00
2009, 2000, 1075/1800, 0.597, 33 teams set to 0.00
2009, 1999,  992/1800, 0.551, 38 teams set to 0.00

Explanation of last entry:
Use 1999 analysis to predict 2009 games
992 games picked correctly out of 1800, 55.1% correct
38 teams that existed in 2009 have no strength data from 1999.
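
The count in the notes column (teams set to 0.00) can be obtained by comparing the teams that appear in the games with the teams that have a rating. A small Python sketch, with an invented helper name `unrated_teams` and made-up data:

```python
def unrated_teams(strengths, games):
    """Return the sorted names of teams that play in games but have no rating."""
    playing = {team for game in games for team in game}
    return sorted(playing - set(strengths))


# Made-up ratings and games, for illustration only.
strengths_1999 = {"A": 5.0, "B": -2.0}
games_2009 = [("A", "B"), ("C", "A"), ("B", "D"), ("C", "D")]
missing = unrated_teams(strengths_1999, games_2009)
print(len(missing), missing)  # 2 ['C', 'D']
```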

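Thus, in general, predictions get worse as older data is used, though the decline is not strictly monotonic: the 2000, 2001, and 2002 strengths all do slightly better than the 2003 strengths.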

=======================

Possibilities #3 and #4 remain elusive. There is only so much time …