Analysis of CFC ratings data 1995-2010







Game and Result Statistics:

The data for the graphs below is a sample of 111,499 games from 2005-2010 played in Swiss tournaments (Round Robins and matches were excluded).

1) Probability of playing someone rated x points higher or lower than you, as shown in graph 1 below.

Why be interested in this? Apart from being intrinsically interesting, it matters if you want to know over what range the expected-result function in the rating formula needs to be valid, or how long it might take for inflation in a rating subgroup to affect other ratings.
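The distribution in graph 1 is straightforward to compute from the game records. A minimal sketch, assuming the games are available as hypothetical (player_rating, opponent_rating) pairs rather than the actual database format:

```python
# Sketch: empirical distribution of opponent rating differences,
# binned in 100-point buckets. The pairing data here is hypothetical.
from collections import Counter

def diff_distribution(pairings, bin_width=100):
    """Return the fraction of games in each rating-difference bucket.

    pairings: iterable of (own_rating, opponent_rating) tuples.
    Bucket b covers differences in [b, b + bin_width).
    """
    counts = Counter()
    for own, opp in pairings:
        bucket = ((opp - own) // bin_width) * bin_width
        counts[bucket] += 1
    total = sum(counts.values())
    return {b: n / total for b, n in sorted(counts.items())}

# Example with four hypothetical games:
pairings = [(1500, 1620), (1500, 1430), (1800, 1550), (1200, 1290)]
print(diff_distribution(pairings))
```

Counting a game once from each player's perspective (as above, if both orientations are present in the data) makes the distribution symmetric about zero by construction; counting each game once from the lower-rated side would give the one-sided version.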

2) Expected Score vs Actual Score.

Shown in graphs 2 & 3 below are curves for expected score and actual score (for all players, and broken down by rating) versus the rating difference of the opponent. The good news is that these curves appear not to be a function of rating level, only of rating difference. So, whether you are rated 600, 1200, 1800 or anything else, playing someone 200 points above you gives the same expected result. The bad news is that the actual score and the expected score used by the rating formula do not agree: the higher rated player is more likely to lose points than the rating formula predicts. So there is a problem here in that the expected-score formula is wrong, and fixing it would require some fundamental rethinking of the rating formula. Incidentally, Jeff Sonas has noticed this kind of effect before in the FIDE rating system; you can refer to his article on ChessBase for details.
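The comparison behind graphs 2 & 3 can be sketched as follows. The expected-score curve here is the standard Elo logistic; the CFC's exact formula may differ in detail, and the game tuples are a hypothetical format, not the actual database layout:

```python
# Sketch: formula expected score vs. actual score, binned by rating
# difference. Uses the standard Elo logistic curve as the "formula"
# expected score (an assumption; the CFC formula may differ).
from collections import defaultdict

def elo_expected(diff):
    """Expected score when the opponent is `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

def actual_by_diff(games, bin_width=100):
    """Average actual score per rating-difference bucket.

    games: iterable of (own_rating, opp_rating, score),
    score in {0, 0.5, 1}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for own, opp, score in games:
        bucket = ((opp - own) // bin_width) * bin_width
        totals[bucket][0] += score
        totals[bucket][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}

# A player facing someone 200 points higher:
print(elo_expected(200))  # ~0.24 under the logistic formula
```

Plotting `actual_by_diff` against `elo_expected` over the same range of differences is exactly the comparison that reveals the mismatch described above.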

The effect of this formula differential is to compress the rating curve at high and low ratings. So, if you are the highest rated player in your area and only play lower rated players, you probably have some justification for claiming to be underrated compared to the boys in the big city.

Note the wiggles in the curves at very high rating differentials. There are not many games out there, so statistical significance is questionable, but mostly they involve juniors rated 300-600 beating players rated 1200-1500.

3) % Draws as a function of rating. Graph 4 shows the fraction of games drawn between equally rated opponents. The spike for high rated players is probably biased by the fact that these players tend to play in the last round of the event. Also interesting is the minimum draw rate at a rating of around 700. Weighted results are shown in graph 5 - i.e. what the average player of a given rating achieves in terms of wins, draws and losses. If the loss curve is above the win curve, players of that rating tend, on average, to play stronger opposition (and vice versa).

4) Win, loss, draw percentage as a function of rating difference. The expected score gives an average result - but how does it break down into wins, losses and draws? Graph 6 answers that question. The plotted data assumes that rating level is not an important variable (as suggested by graph 3).
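The breakdown in graph 6 is the same binning exercise with the three outcomes tallied separately. A sketch, again assuming hypothetical (own_rating, opp_rating, score) tuples:

```python
# Sketch: win/draw/loss fractions per rating-difference bucket,
# ignoring rating level (as graph 3 suggests is reasonable).
from collections import defaultdict

def wdl_by_diff(games, bin_width=100):
    """Return bucket -> (win_frac, draw_frac, loss_frac).

    games: iterable of (own_rating, opp_rating, score),
    score in {0.0, 0.5, 1.0}.
    """
    tallies = defaultdict(lambda: [0, 0, 0])  # [wins, draws, losses]
    for own, opp, score in games:
        bucket = ((opp - own) // bin_width) * bin_width
        idx = {1.0: 0, 0.5: 1, 0.0: 2}[score]
        tallies[bucket][idx] += 1
    return {b: tuple(c / sum(t) for c in t)
            for b, t in sorted(tallies.items())}
```

Note that the win, draw and loss fractions in each bucket recombine to the expected score of graph 2 via win + 0.5 * draw.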

5) Standard deviation of results. Suppose you were interested in evaluating whether a player's performance in a tournament was statistically stronger or weaker than his rating. Part of that process requires some knowledge of the standard deviation of results. A simple initial try might use the binomial result (i.e. as if there were no draws, only wins and losses), in which case sigma = sqrt(e(1-e)), where e is the expected score. There are two problems with this: 1) chess has a third result (the draw), which tends to reduce the standard deviation, and 2) the expected result given by the rating formula does not match actual results (graph 2). Graphs 7 and 8 show the ratio of the true standard deviation to a binomial calculation using the rating formula's expected score (labeled 'formula binomial') and using the actual expected score (labeled simply 'binomial'). It is clear that the biggest issue is that the actual score differs from the expected score used in the rating formula. The binomial result is likely acceptable provided that the true expected score is used in the calculation. [Some extreme points have been omitted for clarity.]
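The draw's effect on the standard deviation can be shown directly. With win/draw/loss probabilities, the per-game variance follows from the definition, and any nonzero draw probability pulls sigma below the two-outcome binomial value sqrt(e(1-e)). A sketch with illustrative probabilities:

```python
# Sketch: per-game standard deviation with a draw outcome, compared
# with the two-outcome binomial sqrt(e(1-e)). Probabilities here are
# illustrative, not fitted to the CFC data.
import math

def sigma_with_draws(p_win, p_draw):
    """Std. dev. of a single game scored 1 / 0.5 / 0."""
    p_loss = 1.0 - p_win - p_draw
    e = p_win + 0.5 * p_draw  # expected score
    var = (p_win * (1.0 - e) ** 2
           + p_draw * (0.5 - e) ** 2
           + p_loss * (0.0 - e) ** 2)
    return math.sqrt(var)

def sigma_binomial(e):
    """Two-outcome (no draws) standard deviation."""
    return math.sqrt(e * (1.0 - e))

# Equal opponents with a 30% draw rate: sigma drops below the
# binomial value of 0.5 for e = 0.5.
print(sigma_with_draws(0.35, 0.30))
print(sigma_binomial(0.5))
```

With p_draw = 0 the two agree exactly, which is why the binomial form is a reasonable approximation once the true expected score is plugged in and the draw rate is modest.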