Mirror, Mirror on the Wall - How good is CricketX, after all?

Time for introspection

Arindom Mookerjee

26-Feb-2001

(An assessment of how the CricketX model has fared in picking winners...)

Time for introspection. There are several features unique to CricketX. We are the only web-site to post forecasts of cricket matches, both static (pre-match) and dynamic (over-by-over). But are our forecasts reliable? Here are the bare facts. You, the thinking fan, sit in judgement.

The model

The proprietary model of CricketX was first developed by Dr. Surjit S. Bhalla in his book, "Between the wickets: The who and why of the best in cricket" (1987). This basic model has since undergone many revisions that have transformed it from just an authoritative rating system into a more holistic forecasting model. The core of the CricketX model is based on crimetrics (from: cricket econometrics). Crimetrics allows the use of sophisticated statistical techniques to adjust player performances according to the nature of the pitch and the quality of the opposition. For teams, it leads to the construction of batting and bowling indices that are vital inputs in the forecasting model.

The data

The CricketX database covers every single international match ever played - now 1508 Tests and 1624 ODIs. However, over-by-over scores are a comparatively new development. The run rate comparison statistic has featured more prominently since the World Cup in England in 1999. Our database tracks this for all matches beginning with the 1999 World Cup - about 150 matches. The analysis of our predictions is based on this sample and does not take abandoned matches into account. We do not rate or predict matches involving non-Test-playing nations like Kenya, Holland or Scotland.

The results

There are three sets of results: first, for our static forecasts; second, for predicting the score at the end of the 50 overs in the first innings; and third, for the dynamic winning probabilities in the second innings. In addition, there are forecasts of individual performances.

With the static forecasts, we got a creditable 69% right. This means that out of every 100 matches, we have correctly predicted the outcome of 69 before a ball is bowled. For the World Cup, this figure was around 73%. Initially, we display two sets of probabilities on our home page, one for each side batting first. The record also shows that our hits and misses are spread evenly between teams batting first and teams batting second. This suggests that the model is fairly robust.
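For readers who want to check a hit rate like this against their own records, here is a minimal sketch in Python. The match records, team codes and field names (`predicted`, `actual`, `predicted_side_batted`) are hypothetical and for illustration only; they are not CricketX data and not the CricketX model.

```python
from collections import defaultdict


def hit_rate(matches):
    """Fraction of matches whose predicted winner equals the actual winner."""
    if not matches:
        raise ValueError("need at least one match")
    hits = sum(m["predicted"] == m["actual"] for m in matches)
    return hits / len(matches)


def hit_rate_by(matches, key):
    """Hit rate computed separately for each value of `key`."""
    groups = defaultdict(list)
    for m in matches:
        groups[m[key]].append(m)
    return {value: hit_rate(group) for value, group in groups.items()}


# Hypothetical sample: four pre-match calls (assumption, not real results).
sample = [
    {"predicted": "IND", "actual": "IND", "predicted_side_batted": "first"},
    {"predicted": "AUS", "actual": "SA", "predicted_side_batted": "first"},
    {"predicted": "PAK", "actual": "PAK", "predicted_side_batted": "second"},
    {"predicted": "SL", "actual": "SL", "predicted_side_batted": "second"},
]

overall = hit_rate(sample)
by_innings = hit_rate_by(sample, "predicted_side_batted")
```

Comparing the two entries of `by_innings` is one simple way to see whether misses cluster on the side batting first or second.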

Predicting the end-score of the first innings

After       Mean error (runs)   Error range (25th to 75th percentile)
10 overs    10                  -22 to 36
15 overs    14                  -18 to 31
25 overs    13                  -13 to 31
35 overs    8                   -11 to 22
45 overs    0.7                 -9 to 11

For the first-innings Dynamic Score Predictor (DSP), the mean error (the figure shown in the table) at the end of 10 overs is 10 runs. Until about the 25th over, the error stays around 13 to 14 runs. By the 35th over, the difference between the predicted score and the actual score at the end of 50 overs falls to 8 runs on average. Five overs from the close, the mean error is less than one run. The range of the errors narrows progressively as well. At the end of 15 overs, when the mean error is 14 runs, the middle half of the errors lies between -18 and 31 runs. By the 35th over, that range narrows to -11 to 22 runs.
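The two columns of the table can be computed from a list of predicted and actual final scores. The sketch below shows one way to do it in Python, assuming the error is defined as predicted minus actual (the article does not state the sign convention). The scores are made-up numbers, not CricketX data.

```python
from statistics import mean, quantiles


def error_summary(predicted, actual):
    """Return (mean error, (25th percentile, 75th percentile)) in runs.

    Error is taken as predicted minus actual, which is an assumption.
    """
    errors = [p - a for p, a in zip(predicted, actual)]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(errors, n=4)
    return mean(errors), (q1, q3)


# Hypothetical 50-over totals predicted at one checkpoint, and the real totals.
predicted_totals = [230, 240, 250, 260, 270, 280]
actual_totals = [250, 250, 250, 250, 250, 250]

avg_error, (low, high) = error_summary(predicted_totals, actual_totals)
```

Running the same function on the predictions made after 10, 15, 25, 35 and 45 overs would give one row of the table per checkpoint.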

Predicting the winner in the second innings

Overs left   Winner correctly identified (% of matches)
50           78
40           81
35           82
25           84
15           89
5            88

Once the target is known and before the second innings gets underway, we have called it right 78% of the time. At the half-way mark, our record improves to 84%. This means that half-way through the second innings, we were able to say correctly whether the chasing team would win (a winning probability of 51% or more) or not in 84% of matches. This percentage rises steadily to 89 by the 35th over.
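The scoring rule described above can be sketched as follows: the chasing side is called the winner when its probability of winning is above a threshold, and the call is then compared with the result. The 0.5 threshold and the sample snapshots are assumptions made for illustration; they are not taken from the CricketX database.

```python
def call_is_correct(chase_win_prob, chasing_team_won, threshold=0.5):
    """True if calling the chase a win (prob > threshold) matches the result."""
    return (chase_win_prob > threshold) == chasing_team_won


def percent_correct(snapshots, threshold=0.5):
    """Percentage of matches called correctly at one checkpoint."""
    if not snapshots:
        raise ValueError("need at least one match")
    correct = sum(
        call_is_correct(prob, won, threshold) for prob, won in snapshots
    )
    return 100 * correct / len(snapshots)


# Hypothetical snapshots at one checkpoint:
# (chasing side's win probability, did the chasing side win?)
with_25_overs_left = [
    (0.62, True),   # called a win, chase succeeded: correct
    (0.35, False),  # called a loss, chase failed: correct
    (0.55, False),  # called a win, chase failed: wrong
    (0.48, True),   # called a loss, chase succeeded: wrong
]

score = percent_correct(with_25_overs_left)
```

Repeating this for each "overs left" checkpoint yields a column like the one in the table.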

Besides match forecasting, we also make predictions for individual players. Our predictions for the last Australia-South Africa series are still featured on our site. This followed our hugely successful maiden venture for the Asia Cup. You can read about it in "Lest we forget..." How did these predictions stack up?

The forecasts for the Asia Cup were spot on, with most leading batsmen averaging within two runs of the forecasts.

For the last Australia-South Africa series, Michael Bevan was predicted to score 130 runs; he scored 142. For Mark Waugh, the forecasts said that his form would not turn around very significantly. After 25 runs in his last 5 innings before this series, he scored 66 in 3. CricketX had him at 90.

Lance Klusener was correctly identified as the highest scorer in the Proteas' ranks. He was predicted to score 120 runs in 3 innings at an average of 40. His average for the series was 43, although he scored only 86 runs.

Among the bowlers, McGrath was predicted to be the leading wicket-taker with 8. He bagged 4. Other players, such as Harvey, Gillespie and Shane Lee, for whom the site had made no forecasts, picked up the bulk of the wickets.

Among the South African bowlers, Nicky Boje took all 4 of the wickets forecast for him, Klusener 3 of 5 and Pollock 3 of 6.

Forecasting requires a sound crimetric (cri for cricket, metric for econometric) model. It's also a risky business. An inventory of expertise and risk appetite probably explains why CricketX are the only ones sticking their necks out!
