
Results (July 2019)

TADPOLE's first phase is complete and we have evaluated all the prize-eligible submissions.

Watch the live announcement on YouTube: https://www.youtube.com/watch?v=BFS9Sr0lhuM

Check below for a description of the evaluation dataset and the overall rankings.

General observations

  • There was no clear "one-size-fits-all" winner.

  • Data-driven approaches for both feature selection and prediction of target variables generally performed well.

  • Many teams combined different types of algorithms to produce forecasts:

    1. Most used statistical regression;
    2. Some used generic machine learning techniques that are robust and can work well for other problems; and
    3. Some used disease progression models that are specifically tailored for the current problem of disease prediction.
  • Forecasts were very good for clinical diagnosis and ventricle volume. Predicting ADAS, on the other hand, turned out to be very difficult: no team was able to generate forecasts that were significantly better than random guessing.

  • Meta-analysis results: the most important features for improving predictions were DTI and CSF for clinical diagnosis, and "augmented features" for ventricle volume prediction.

  • Throughout this page we will refer to ADAS-Cog 13 as simply ADAS.

TADPOLE Prize winners

Category Team Members Institution Country Prize
Overall best Frog Keli Liu, Paul Manser, Christina Rabe Genentech USA £5000
Clinical status Frog Keli Liu, Paul Manser, Christina Rabe Genentech USA £5000
Ventricle volume EMC1 Vikram Venkatraghavan, Esther Bron, Stefan Klein Erasmus MC Netherlands £5000
Best university team Apocalypse Manon Ansart ICM, INRIA France £5000
High-School (best) Chen-MCW Gang Chen Medical College Wisconsin USA £5000
High-School (runner up) CyberBrains Ionut Buciuman, Alex Kelner, Raluca Pop, Denisa Rimocea, Kruk Zsolt Vasile Lucaciu College Romania £2500
Overall best D3 prediction GlassFrog Steven Hill, Brian Tom, Anais Rouanst, Zhiyue Huang, James Howlett, Steven Kiddle, Simon R. White, Sach Mukherjee, Bernd Taschler Cambridge University UK £2500

Overall Results

Legend:

  • MAUC – Multiclass Area Under the Curve
  • BCA – Balanced Classification Accuracy
  • MAE – Mean Absolute Error
  • WES – Weighted Error Score
  • CPA – Coverage Probability Accuracy for 50% Confidence Interval
  • ADAS – Alzheimer's Disease Assessment Scale Cognitive (13)
  • VENTS – Ventricle Volume
  • RANK (overall) – We first compute the sum of ranks from MAUC, ADAS MAE and VENTS MAE, then derive the final ranking from these sums of ranks. For example, the top entry has the lowest sum of ranks from these three categories.
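
The three regression metrics in the legend can be illustrated with a minimal sketch. It assumes that WES weights each absolute error by the inverse width of the forecast's 50% confidence interval, and that CPA is the absolute difference between the observed coverage of those intervals and the nominal 0.5. The function names and the toy numbers are hypothetical and are not the official evaluation code.

```python
def mae(actual, predicted):
    """Mean absolute error between measurements and point forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def wes(actual, predicted, ci_lower, ci_upper):
    """Weighted error score: absolute errors weighted by inverse CI width (assumed)."""
    weights = [1.0 / (hi - lo) for lo, hi in zip(ci_lower, ci_upper)]
    weighted = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return weighted / sum(weights)


def cpa(actual, ci_lower, ci_upper, nominal=0.5):
    """Coverage probability accuracy: |observed coverage - nominal coverage| (assumed)."""
    covered = sum(lo <= a <= hi for a, lo, hi in zip(actual, ci_lower, ci_upper))
    return abs(covered / len(actual) - nominal)


# Toy ADAS values for four hypothetical subjects (not challenge data).
actual = [20.0, 31.0, 12.0, 25.0]
predicted = [22.0, 28.0, 12.0, 26.0]
ci_lower = [21.0, 26.0, 11.0, 25.5]
ci_upper = [23.0, 30.0, 13.0, 26.5]

print(mae(actual, predicted))                      # 1.5
print(wes(actual, predicted, ci_lower, ci_upper))  # about 1.22
print(cpa(actual, ci_lower, ci_upper))             # 0.25
```

Under this definition, a submission that uses the same interval width for every subject gets a WES equal to its MAE, and a lower CPA is better.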

Note (14 June 2019): The rankings in each prize category can be found by ordering according to Diagnosis MAUC, and ADAS and Ventricle MAE. The overall rankings below require valid submissions for every target variable. 
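
The sum-of-ranks procedure can be sketched as follows. The fractional ranks in the tables (for example 1.5 or 23.5) suggest that tied entries share the average of their positions, so the sketch assumes that convention. The function names are hypothetical. The scores are taken from the D2 table, but because only five entries are ranked here, the resulting ranks differ from those in the full table.

```python
def average_ranks(values, higher_is_better=False):
    """Rank values from 1 (best); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def overall_ranks(scores):
    """scores maps entry -> (MAUC, ADAS MAE, VENTS MAE); None marks a missing target."""
    names = list(scores)
    rank_sums = {name: 0.0 for name in names}
    # MAUC is better when higher; both MAE columns are better when lower.
    for col, higher in enumerate((True, False, False)):
        have = [n for n in names if scores[n][col] is not None]
        ranks = average_ranks([scores[n][col] for n in have], higher_is_better=higher)
        for n, r in zip(have, ranks):
            rank_sums[n] += r
    # Only entries with all three targets receive an overall rank.
    complete = [n for n in names if None not in scores[n]]
    result = {n: None for n in names}
    result.update(zip(complete, average_ranks([rank_sums[n] for n in complete])))
    return result


scores = {
    "Frog": (0.931, 4.85, 0.45),
    "EMC1-Std": (0.898, 6.05, 0.41),
    "Apocalypse": (0.902, 5.57, 0.52),
    "BenchmarkMixedEffectsAPOE": (0.822, 4.75, 0.57),
    "Threedays": (0.921, None, None),  # diagnosis only, so no overall rank
}
print(overall_ranks(scores))
```

In this subset Frog ranks first, EMC1-Std and Apocalypse tie with a rank of 2.5, and Threedays has no overall rank, which corresponds to the "-" entries in the tables.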

Overall scores — longitudinal dataset D2

RANK FILE NAME MAUC RANK MAUC BCA ADAS RANK ADAS MAE ADAS WES ADAS CPA VENTS RANK VENTS MAE VENTS WES VENTS CPA
1.0 Frog 1.0 0.931 0.849 4.0 4.85 4.74 0.44 10.0 0.45 0.33 0.47
2.0 EMC1-Std 8.0 0.898 0.811 23.5 6.05 5.40 0.45 1.5 0.41 0.29 0.43
3.0 VikingAI-Sigmoid 16.0 0.875 0.760 7.0 5.20 5.11 0.02 11.5 0.45 0.35 0.20
4.0 EMC1-Custom 11.0 0.892 0.798 23.5 6.05 5.40 0.45 1.5 0.41 0.29 0.43
5.0 CBIL 9.0 0.897 0.803 15.0 5.66 5.65 0.37 13.0 0.46 0.46 0.09
6.0 Apocalypse 7.0 0.902 0.827 14.0 5.57 5.57 0.50 20.0 0.52 0.52 0.50
7.0 GlassFrog-Average 5.0 0.902 0.825 8.0 5.26 5.27 0.26 29.0 0.68 0.60 0.33
8.0 GlassFrog-SM 5.0 0.902 0.825 17.0 5.77 5.92 0.20 21.0 0.52 0.33 0.20
9.0 BORREGOTECMTY 19.0 0.866 0.808 20.0 5.90 5.82 0.39 5.0 0.43 0.37 0.40
10.0 EMC-EB 3.0 0.907 0.805 39.0 6.75 6.66 0.50 9.0 0.45 0.40 0.48
11.5 lmaUCL-Covariates 22.0 0.852 0.760 27.0 6.28 6.29 0.28 3.0 0.42 0.41 0.11
11.5 CN2L-Average 27.0 0.843 0.792 9.0 5.31 5.31 0.35 16.0 0.49 0.49 0.33
13.0 VikingAI-Logistic 20.0 0.865 0.754 21.0 6.02 5.91 0.26 11.5 0.45 0.35 0.20
14.0 lmaUCL-Std 21.0 0.859 0.781 28.0 6.30 6.33 0.26 4.0 0.42 0.41 0.09
15.5 CN2L-RandomForest 10.0 0.896 0.792 16.0 5.73 5.73 0.42 31.0 0.71 0.71 0.41
15.5 FortuneTellerFish-SuStaIn 40.0 0.806 0.685 3.0 4.81 4.81 0.21 14.0 0.49 0.49 0.18
17.0 CN2L-NeuralNetwork 41.0 0.783 0.717 10.0 5.36 5.36 0.34 7.0 0.44 0.44 0.27
18.0 BenchmarkMixedEffectsAPOE 35.0 0.822 0.749 2.0 4.75 4.75 0.36 23.0 0.57 0.57 0.40
19.0 Tohka-Ciszek-RandomForestLin 17.0 0.875 0.796 22.0 6.03 6.03 0.15 22.0 0.56 0.56 0.37
20.0 BGU-LSTM 12.0 0.883 0.779 25.0 6.09 6.12 0.39 25.0 0.60 0.60 0.23
21.0 DIKU-GeneralisedLog-Custom 13.0 0.878 0.790 11.5 5.40 5.40 0.26 38.5 1.05 1.05 0.05
22.0 DIKU-GeneralisedLog-Std 14.0 0.877 0.790 11.5 5.40 5.40 0.26 38.5 1.05 1.05 0.05
23.0 CyberBrains 34.0 0.823 0.747 6.0 5.16 5.16 0.24 26.0 0.62 0.62 0.12
24.0 AlgosForGood 24.0 0.847 0.810 13.0 5.46 5.11 0.13 30.0 0.69 3.31 0.19
25.0 lmaUCL-halfD1 26.0 0.845 0.753 38.0 6.53 6.51 0.31 6.0 0.44 0.42 0.13
26.0 BGU-RF 28.0 0.838 0.673 29.5 6.33 6.10 0.35 17.5 0.50 0.38 0.26
27.0 Mayo-BAI-ASU 52.0 0.691 0.624 5.0 4.98 4.98 0.32 19.0 0.52 0.52 0.40
28.0 BGU-RFFIX 32.0 0.831 0.673 29.5 6.33 6.10 0.35 17.5 0.50 0.38 0.26
29.0 FortuneTellerFish-Control 31.0 0.834 0.692 1.0 4.70 4.70 0.22 50.0 1.38 1.38 0.50
30.0 GlassFrog-LCMEM-HDR 5.0 0.902 0.825 31.0 6.34 6.21 0.47 51.0 1.66 1.59 0.41
31.0 SBIA 43.0 0.776 0.721 43.0 7.10 7.38 0.40 8.0 0.44 0.31 0.13
32.0 Chen-MCW-Stratify 23.0 0.848 0.783 36.5 6.48 6.24 0.23 36.5 1.01 1.00 0.11
33.0 Rocket 54.0 0.680 0.519 18.0 5.81 5.71 0.34 28.0 0.64 0.64 0.29
34.5 Chen-MCW-Std 29.0 0.836 0.778 36.5 6.48 6.24 0.23 36.5 1.01 1.00 0.11
34.5 BenchmarkSVM 30.0 0.836 0.764 40.0 6.82 6.82 0.42 32.0 0.86 0.84 0.50
36.0 DIKU-ModifiedMri-Custom 36.5 0.807 0.670 33.5 6.44 6.44 0.27 34.5 0.92 0.92 0.01
37.0 DIKU-ModifiedMri-Std 38.5 0.806 0.670 33.5 6.44 6.44 0.27 34.5 0.92 0.92 0.01
38.0 DIVE 51.0 0.708 0.568 42.0 7.10 7.10 0.34 15.0 0.49 0.49 0.13
39.0 ITESMCEM 53.0 0.680 0.657 26.0 6.26 6.26 0.35 33.0 0.92 0.92 0.43
40.0 BenchmarkLastVisit 44.5 0.774 0.792 41.0 7.05 7.05 0.45 27.0 0.63 0.61 0.47
41.0 Sunshine-Conservative 25.0 0.845 0.816 44.5 7.90 7.90 0.50 43.5 1.12 1.12 0.50
42.0 BravoLab 46.0 0.771 0.682 47.0 8.22 8.22 0.49 24.0 0.58 0.58 0.41
43.0 DIKU-ModifiedLog-Custom 36.5 0.807 0.670 33.5 6.44 6.44 0.27 47.5 1.17 1.17 0.06
44.0 DIKU-ModifiedLog-Std 38.5 0.806 0.670 33.5 6.44 6.44 0.27 47.5 1.17 1.17 0.06
45.0 Sunshine-Std 33.0 0.825 0.771 44.5 7.90 7.90 0.50 43.5 1.12 1.12 0.50
46.0 Billabong-UniAV45 49.0 0.720 0.616 48.5 9.22 8.82 0.29 41.5 1.09 0.99 0.45
47.0 Billabong-Uni 50.0 0.718 0.622 48.5 9.22 8.82 0.29 41.5 1.09 0.99 0.45
48.0 ATRI-Biostat-JMM 42.0 0.779 0.710 51.0 12.88 69.62 0.35 54.0 1.95 5.12 0.33
49.0 Billabong-Multi 56.0 0.541 0.556 55.0 27.01 19.90 0.46 40.0 1.07 1.07 0.45
50.0 ATRI-Biostat-MA 47.0 0.741 0.671 52.0 12.88 11.32 0.19 53.0 1.84 5.27 0.23
51.0 BIGS2 58.0 0.455 0.488 50.0 11.62 14.65 0.50 49.0 1.20 1.12 0.07
52.0 Billabong-MultiAV45 57.0 0.527 0.530 56.0 28.45 21.22 0.47 45.0 1.13 1.07 0.47
53.0 ATRI-Biostat-LTJMM 55.0 0.636 0.563 54.0 16.07 74.65 0.33 52.0 1.80 5.01 0.26
- Threedays 2.0 0.921 0.823 - - - - - - - -
- ARAMIS-Pascal 15.0 0.876 0.850 - - - - - - - -
- IBM-OZ-Res 18.0 0.868 0.766 - - - - 46.0 1.15 1.15 0.50
- Orange 44.5 0.774 0.792 - - - - - - - -
- SMALLHEADS-NeuralNet 48.0 0.737 0.605 53.0 13.87 13.87 0.41 - - - -
- SMALLHEADS-LinMixedEffects - - - 46.0 8.09 7.94 0.04 - - - -
- Tohka-Ciszek-SMNSR - - - 19.0 5.87 5.87 0.14 - - - -

The results on the D2 dataset suggest that there is no clear winner across all categories. While Frog had the best overall submission with the lowest sum of ranks, each individual performance metric had a different winner: Frog (clinical diagnosis MAUC of 0.931), ARAMIS-Pascal (clinical diagnosis BCA of 0.850), FortuneTellerFish-Control (ADAS MAE and WES of 4.70), VikingAI-Sigmoid (ADAS CPA of 0.02), EMC1-Std/EMC1-Custom (ventricle MAE of 0.41 and ventricle WES of 0.29), and DIKU-ModifiedMri-Std/DIKU-ModifiedMri-Custom (ventricle CPA of 0.01).

Overall scores — cross-sectional dataset D3

RANK FILE NAME MAUC RANK MAUC BCA ADAS RANK ADAS MAE ADAS WES ADAS CPA VENTS RANK VENTS MAE VENTS WES VENTS CPA
1.0 GlassFrog-Average 3.0 0.897 0.826 5.0 5.86 5.57 0.25 3.0 0.68 0.55 0.24
2.0 GlassFrog-LCMEM-HDR 3.0 0.897 0.826 9.0 6.57 6.56 0.34 1.0 0.48 0.38 0.24
3.0 GlassFrog-SM 3.0 0.897 0.826 4.0 5.77 5.77 0.19 9.0 0.82 0.55 0.07
4.0 Tohka-Ciszek-RandomForestLin 11.0 0.865 0.786 2.0 4.92 4.92 0.10 10.0 0.83 0.83 0.35
7.0 VikingAI-Logistic 8.0 0.876 0.768 6.0 5.94 5.91 0.22 22.0 1.04 1.01 0.18
7.0 Rocket 10.0 0.865 0.771 3.0 5.27 5.14 0.39 23.0 1.06 1.06 0.27
7.0 lmaUCL-Std 13.0 0.854 0.698 17.0 6.95 6.93 0.05 6.0 0.81 0.81 0.22
7.0 lmaUCL-Covariates 13.0 0.854 0.698 17.0 6.95 6.93 0.05 6.0 0.81 0.81 0.22
7.0 lmaUCL-halfD1 13.0 0.854 0.698 17.0 6.95 6.93 0.05 6.0 0.81 0.81 0.22
10.0 EMC1-Std 30.0 0.705 0.567 7.0 6.29 6.19 0.47 4.0 0.80 0.62 0.48
11.0 SBIA 28.0 0.779 0.782 10.0 6.63 6.43 0.40 8.0 0.82 0.75 0.18
13.0 BGU-LSTM 6.0 0.877 0.776 14.0 6.75 6.17 0.39 27.0 1.11 0.79 0.17
13.0 BGU-RFFIX 6.0 0.877 0.776 14.0 6.75 6.17 0.39 27.0 1.11 0.79 0.17
13.0 BGU-RF 6.0 0.877 0.776 14.0 6.75 6.17 0.39 27.0 1.11 0.79 0.17
15.0 BravoLab 18.0 0.813 0.730 28.0 8.02 8.02 0.47 2.0 0.64 0.64 0.42
16.5 BORREGOTECMTY 15.0 0.852 0.748 8.0 6.44 5.86 0.46 30.0 1.14 1.02 0.49
16.5 CyberBrains 17.0 0.830 0.755 1.0 4.72 4.72 0.21 35.0 1.54 1.54 0.50
18.0 ATRI-Biostat-MA 19.0 0.799 0.772 26.0 7.39 6.63 0.04 11.0 0.93 0.97 0.10
19.5 EMC-EB 9.0 0.869 0.765 27.0 7.71 7.91 0.50 21.0 1.03 1.07 0.49
19.5 DIKU-GeneralisedLog-Std 20.0 0.798 0.684 20.5 6.99 6.99 0.17 16.5 0.95 0.95 0.05
21.0 DIKU-GeneralisedLog-Custom 21.0 0.798 0.681 20.5 6.99 6.99 0.17 16.5 0.95 0.95 0.05
22.5 DIKU-ModifiedLog-Std 22.5 0.798 0.688 23.5 7.10 7.10 0.17 13.5 0.95 0.95 0.05
22.5 DIKU-ModifiedMri-Std 22.5 0.798 0.688 23.5 7.10 7.10 0.17 13.5 0.95 0.95 0.05
24.5 DIKU-ModifiedLog-Custom 24.5 0.798 0.691 23.5 7.10 7.10 0.17 13.5 0.95 0.95 0.05
24.5 DIKU-ModifiedMri-Custom 24.5 0.798 0.691 23.5 7.10 7.10 0.17 13.5 0.95 0.95 0.05
26.0 Billabong-Uni 31.0 0.704 0.626 11.5 6.69 6.69 0.38 19.5 0.98 0.98 0.48
27.0 Billabong-UniAV45 32.0 0.703 0.620 11.5 6.69 6.69 0.38 19.5 0.98 0.98 0.48
28.0 ATRI-Biostat-JMM 26.0 0.794 0.781 29.0 8.45 8.12 0.34 18.0 0.97 1.45 0.37
29.0 CBIL 16.0 0.847 0.780 33.0 10.99 11.65 0.49 29.0 1.12 1.12 0.39
30.0 BenchmarkLastVisit 27.0 0.785 0.771 19.0 6.97 7.07 0.42 33.0 1.17 0.64 0.11
31.0 Billabong-MultiAV45 33.0 0.682 0.603 30.5 9.30 9.30 0.43 24.5 1.09 1.09 0.49
32.0 Billabong-Multi 34.0 0.681 0.605 30.5 9.30 9.30 0.43 24.5 1.09 1.09 0.49
33.0 ATRI-Biostat-LTJMM 29.0 0.732 0.675 34.0 12.74 63.98 0.37 32.0 1.17 1.07 0.40
34.0 BenchmarkSVM 36.0 0.494 0.490 32.0 10.01 10.01 0.42 31.0 1.15 1.18 0.50
35.0 DIVE 35.0 0.512 0.498 35.0 16.66 16.74 0.41 34.0 1.42 1.42 0.34
- IBM-OZ-Res 1.0 0.905 0.830 - - - - 36.0 1.77 1.77 0.50

Here, most submissions perform worse than the equivalent predictions on the D2 longitudinal dataset, due to the lack of longitudinal, multimodal data. GlassFrog-Average had the best overall rank and obtained a diagnosis MAUC of 0.897, an ADAS MAE of 5.86 and a Ventricle MAE of 0.68 (% ICV). For diagnosis prediction, IBM-OZ-Res obtained the highest clinical diagnosis scores: MAUC of 0.905 and BCA of 0.830. For ADAS predictions, CyberBrains had the best MAE and WES of 4.72. ATRI-Biostat-MA obtained the best ADAS CPA of 0.04. For Ventricle prediction, GlassFrog-LCMEM-HDR had the best MAE of 0.48 (% ICV) and the best WES of 0.38, while the 6 DIKU submissions obtained the best CPA of 0.05.

Additional entries

In addition to the standard predictions and the benchmarks, we also included two consensus predictions, obtained by taking the mean (ConsensusMean) and median (ConsensusMedian) over all predictions from all participants. For D2 predictions, the ConsensusMedian submission obtained the best overall rank, with a diagnosis MAUC of 0.925 (second-best), an ADAS-Cog 13 MAE of 5.12 (ninth-best) and a Ventricle MAE of 0.38, the best result in this category for D2. ConsensusMean, on the other hand, ranked 3rd overall on D2, with a diagnosis MAUC of 0.920 (fourth-best), an ADAS-Cog 13 MAE of 3.75, the best prediction in this category, and a Ventricle MAE of 0.48 (rank 16). For ADAS-Cog 13 and Ventricle volume prediction, the best consensus methods reduced the error by 11% and 8% respectively compared to the best prediction from participants or benchmarks.
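
As a minimal sketch of how such consensus forecasts can be formed, the example below combines aligned point forecasts from several teams element by element. The `consensus` helper, the alignment of forecasts by index, and the toy values are assumptions for illustration; the actual consensus entries may have handled confidence intervals and diagnosis probabilities differently.

```python
from statistics import mean, median


def consensus(forecasts, combine):
    """Combine per-team forecasts; index i must refer to the same subject and month."""
    return [combine(values) for values in zip(*forecasts)]


# Toy ventricle forecasts (% ICV) from three hypothetical teams for three subjects.
team_forecasts = [
    [0.40, 0.55, 0.62],
    [0.44, 0.50, 0.70],
    [0.90, 0.52, 0.66],
]

consensus_mean = consensus(team_forecasts, mean)
consensus_median = consensus(team_forecasts, median)

print(consensus_mean)
print(consensus_median)  # [0.44, 0.52, 0.66]
```

For the first subject the median (0.44) ignores the outlying forecast of 0.90, while the mean (about 0.58) is pulled towards it, which is one reason the two consensus entries can rank differently.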

To test whether the best results could have been obtained by chance due to randomness in the test set, we evaluated n=62 randomly perturbed predictions (as many as the number of entries) from the simplest benchmark, BenchmarkLastVisit, and computed the best results obtained by any of these predictions. These are shown as RandomisedBest, and obtain high scores especially for ADAS-Cog 13, ranking 3rd with a final MAE of 4.52. High performance scores are also obtained for Ventricles, ranking 14th with an MAE of 0.47, a 14% increase in error from the best forecast, while for diagnosis prediction a lower MAUC score of 0.797 is obtained, ranking 43rd. This suggests that entries with a higher MAE than RandomisedBest should be interpreted with care, as their scores and ranks could be high due to randomness in the test set. This is particularly relevant for ADAS-Cog 13 predictions, where only BenchmarkMixedEffects and ConsensusMean obtained better results, suggesting that no other method was able to predict ADAS-Cog 13 any better than random guessing based on the last available measurement.
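
The idea behind RandomisedBest can be sketched as below. The perturbation scheme is not described on this page, so the sketch assumes independent Gaussian noise added to each baseline prediction; `noise_sd`, the seed, the function names and the toy data are all hypothetical.

```python
import random


def mae(actual, predicted):
    """Mean absolute error between measurements and point forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def randomised_best(actual, baseline, n_trials=62, noise_sd=1.0, seed=0):
    """Return the lowest MAE among n_trials noisy copies of a baseline forecast."""
    rng = random.Random(seed)
    maes = []
    for _ in range(n_trials):
        # Assumed perturbation: independent Gaussian noise on every prediction.
        perturbed = [p + rng.gauss(0.0, noise_sd) for p in baseline]
        maes.append(mae(actual, perturbed))
    return min(maes), maes


# Toy ADAS scores: the baseline carries the last visit forward (hypothetical data).
actual = [22.0, 30.0, 14.0, 27.0, 19.0]
last_visit = [20.0, 29.0, 15.0, 24.0, 18.0]

best, all_maes = randomised_best(actual, last_visit)
print(f"unperturbed MAE: {mae(actual, last_visit):.2f}")
print(f"best of {len(all_maes)} perturbed MAEs: {best:.2f}")
```

Taking the minimum over many noisy copies gives a reference for how good a score can look purely by chance on a finite test set.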

It is worth mentioning that, while drafting the manuscript, we discovered that dropping APOE as a covariate in the BenchmarkMixedEffectsAPOE model considerably decreases the error in ADAS prediction, so we included it as an additional entry for scientific interest.  

Additional entries for D2
RANK FILE NAME MAUC RANK MAUC BCA ADAS RANK ADAS MAE ADAS WES ADAS CPA VENTS RANK VENTS MAE VENTS WES VENTS CPA
1.5 ConsensusMedian 1.0 0.925 0.857 4.0 5.12 5.01 0.28 1.0 0.38 0.33 0.09
1.5 ConsensusMean 2.0 0.920 0.835 1.0 3.75 3.54 0.00 3.0 0.48 0.45 0.13
3.5 BenchmarkMixedEffects 3.0 0.846 0.706 2.0 4.19 4.19 0.31 4.0 0.56 0.56 0.50
3.5 RandomisedBest 4.0 0.797 0.803 3.0 4.52 4.52 0.27 2.0 0.47 0.45 0.33
Additional entries for D3
RANK FILE NAME MAUC RANK MAUC BCA ADAS RANK ADAS MAE ADAS WES ADAS CPA VENTS RANK VENTS MAE VENTS WES VENTS CPA
1.0 ConsensusMean 1.0 0.917 0.821 2.0 4.58 4.34 0.12 2.0 0.73 0.72 0.09
2.0 ConsensusMedian 2.0 0.905 0.817 3.0 5.44 5.37 0.19 1.0 0.71 0.65 0.10
3.0 BenchmarkMixedEffects 3.0 0.839 0.728 1.0 4.23 4.23 0.34 3.0 1.13 1.13 0.50

Confidence Intervals

Below are confidence intervals (CIs) computed for every submission, based on 50 bootstraps of the test set D4. The first figure (Fig. 1) shows CIs based on forecasts from D2, while the second (Fig. 2) shows CIs for forecasts on D3.
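
A bootstrap interval of this kind can be sketched as follows for the MAE of a single submission. The sketch resamples per-subject absolute errors with replacement 50 times and reads off a percentile interval. The 95% level, the nearest-rank percentile method, the function name and the toy errors are assumptions, since the page does not specify them.

```python
import random


def bootstrap_ci(errors, n_boot=50, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-subject absolute errors."""
    rng = random.Random(seed)
    n = len(errors)
    # Each bootstrap replicate resamples the test set with replacement.
    stats = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int(round((1 + level) / 2 * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]


# Toy absolute ADAS errors for ten hypothetical test subjects.
abs_errors = [2.0, 5.5, 1.0, 7.0, 3.5, 4.0, 0.5, 6.0, 2.5, 3.0]

lo, hi = bootstrap_ci(abs_errors)
print(f"MAE = {sum(abs_errors) / len(abs_errors):.2f}, interval = [{lo:.2f}, {hi:.2f}]")
```

With only 50 replicates the interval end points are themselves fairly noisy, so overlapping intervals between two submissions are best read as "not clearly different".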

Fig 1. Confidence intervals for forecasts based on the longitudinal D2 prediction set.

Fig 2. Confidence intervals for forecasts based on the cross-sectional D3 prediction set.

Meta-analysis

To understand which types of features and algorithms yielded higher performance, we show here associations between predictive performance and feature selection methods, different types of features, methods for data imputation, and methods for forecasting of target variables (diagnosis, ADAS and ventricles). For each type of feature/method and each target variable (clinical diagnosis, ADAS and Ventricles), we show the distribution of estimated coefficients from a general linear model, derived from the approximate inverse Hessian matrix at the maximum likelihood estimate. From this analysis we removed outliers, defined as submissions with ADAS MAE higher than 10 and Ventricle MAE higher than 0.15 (%ICV). For all plots, distributions to the right of the gray dashed vertical line are associated with better performance.
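
As a rough illustration of where such coefficient distributions come from, consider the simplest case of a linear model with Gaussian residuals. Here the rows of X are submissions, the columns are indicators for the features and methods each submission used, and y is the performance metric. This notation and the Gaussian assumption are ours for illustration; the exact model used for the plots may differ.

```latex
% Maximum likelihood estimate of the coefficients (ordinary least squares)
\hat{\beta} = (X^\top X)^{-1} X^\top y

% Hessian of the negative log-likelihood with respect to \beta
H = \frac{1}{\hat{\sigma}^2} X^\top X

% Approximate distribution of each coefficient around its estimate
\beta_j \approx \mathcal{N}\!\left(\hat{\beta}_j,\; \left[H^{-1}\right]_{jj}\right),
\qquad H^{-1} = \hat{\sigma}^2 (X^\top X)^{-1}
```

A coefficient whose distribution lies mostly on one side of zero indicates that the corresponding feature or method is associated with a consistent change in performance.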

The results in Fig. 3 below show trends that indicate what aspects of the methods could be associated with better performance. For feature selection, methods that perform manual selection of features are associated with better predictive performance in ADAS13 and Ventricles. In terms of feature types, including features from many modalities was generally associated with an increase in overall performance, except for FDG (for all target variables). Moreover, augmented features correlate with overall performance improvements, especially for ventricle prediction. In terms of data imputation methods, while some differences can be observed, no clear conclusions can be drawn currently. In terms of prediction models, we notice that neural networks are more significantly associated with increased performance in ventricle prediction, while disease progression models are associated with decreased performance in prediction of clinical diagnosis and ventricles. However, given the small number of methods tested (<50) and the large number of degrees of freedom (n=21), these results should be interpreted with care.

Fig 3. Associations between the prediction of clinical diagnosis, ADAS and Ventricle volume and different strategies of (top) feature selection, (upper-middle) types of features, (lower-middle) data imputation strategies and (bottom) prediction methods for the target variables. For each type of feature/method (rows) and each target variable (columns), we show the distribution of estimated coefficients from a general linear model. Positive coefficients, where distributions lie to the right of the dashed vertical line, indicate better performance than baseline (vertical dashed line). For ADAS and Ventricle prediction, we flipped the sign of the coefficients, to consistently show better performance to the right of the vertical line.

Demographics of D1-D4 datasets

Summary of TADPOLE datasets D1-D4. Each subject has been allocated to either Control, MCI or AD group based on diagnosis at the first available visit within each dataset. The bottom table contains the number of visits with data available, by modality. For example, in D4 there were a total of 150 visits where an MRI scan was undertaken, which represented a total of 64% of all visits analysed across all subjects in D4. 


Measure D1 D2 D3 D4
Subjects 1667 896 896 219

Cognitively Normal
Number (%) 508 (30.5%) 369 (41.2%) 299 (33.4%) 94 (42.9%)
Visits per subject 8.3 (4.5) 8.5 (4.9) 1.0 (0.0) 1.0 (0.2)
Age 74.3 (5.8) 73.6 (5.7) 72.3 (6.2) 78.4 (7.0)
Gender (% male) 48.6% 47.2% 43.5% 47.9%
MMSE 29.1 (1.1) 29.0 (1.2) 28.9 (1.4) 29.1 (1.1)
Converters 18 (3.5%) 9 (2.4%) - -

Mild Cognitive Impairment
Number (%) 841 (50.4%) 458 (51.1%) 269 (30.0%) 90 (41.1%)
Visits per subject 8.2 (3.7) 9.1 (3.6) 1.0 (0.0) 1.1 (0.3)
Age 73.0 (7.5) 71.6 (7.2) 71.9 (7.1) 79.4 (7.0)
Gender (% male) 59.3% 56.3% 58.0% 64.4%
MMSE 27.6 (1.8) 28.0 (1.7) 27.6 (2.2) 28.1 (2.1)
Converters 117 (13.9%) 37 (8.1%) - 9 (10.0%)

Alzheimer’s Disease
Number (%) 318 (19.1%) 69 (7.7%) 136 (15.2%) 29 (13.2%)
Visits per subject 4.9 (1.6) 5.2 (2.6) 1.0 (0.0) 1.1 (0.3)
Age 74.8 (7.7) 75.1 (8.4) 72.8 (7.1) 82.2 (7.6)
Gender (% male) 55.3% 68.1% 55.9% 51.7%
MMSE 23.3 (2.0) 23.1 (2.0) 20.5 (5.9) 19.4 (7.2)
Converters - - - 9 (31.0%)

Number of visits with available data (as % of total visits)
Cognitive 8862 (69.9%) 5218 (68.1%) 753 (84.0%) 223 (95.3%)
MRI 7884 (62.2%) 4497 (58.7%) 224 (25.0%) 150 (64.1%)
FDG 2119 (16.7%) 1544 (20.2%) 0 (0.0%) 0 (0.0%)
AV45 2098 (16.6%) 1758 (23.0%) 0 (0.0%) 0 (0.0%)
AV1451 89 (0.7%) 89 (1.2%) 0 (0.0%) 0 (0.0%)
DTI 779 (6.1%) 636 (8.3%) 0 (0.0%) 0 (0.0%)
CSF 2347 (18.5%) 1458 (19.0%) 0 (0.0%) 0 (0.0%)

Description of Algorithms

Summary

We had a total of 33 participating teams, who submitted a total of 58 forecasts from D2, 34 forecasts from D3, and 6 forecasts from custom prediction sets. A total of 8 D2/D3 submissions from 6 teams did not have predictions for all three target variables, so we only computed the performance metrics for the available target variables. Another 3 submissions lacked confidence intervals for either ADAS or ventricle volume, which we imputed using default low-width confidence ranges of 2 for ADAS and 0.002 for Ventricles/ICV. 

Table 1 below summarizes the methods used in the submissions in terms of feature selection, handling of missing data, predictive models for clinical diagnosis and ADAS/Ventricles biomarkers, as well as training and prediction times. Condensed descriptions of each submitted method can be found here, while even more detailed descriptions are here (original files submitted by participants). 

Submission  Feature selection Number of features Missing data imputation Diagnosis prediction ADAS/Vent. Prediction Training time Prediction time (one subject)
AlgosForGood Manual 16+5* forward-filling Aalen model linear regression 1 minute 1 second
Apocalypse Manual 16 population average SVM linear regression 40 minutes 3 minutes
ARAMIS-Pascal Manual 20 population average Aalen model - 16 seconds 0.02 seconds
ATRI-Biostat-JMM automatic 15 random forest random forest linear mixed effects model 2 days 1 second
ATRI-Biostat-LTJMM automatic 15 random forest random forest DPM 2 days 1 second
ATRI-Biostat-MA automatic 15 random forest random forest DPM + linear mixed effects model 2 days 1 second
BGU-LSTM automatic 67 none feed-forward NN LSTM 1 day milliseconds
BGU-RF/ BGU-RFFIX automatic ~67+1340* none semi-temporal RF semi-temporal RF a few minutes milliseconds
BIGS2 automatic all Iterative Soft-Thresholded SVD RF linear regression 2.2 seconds 0.001 seconds
Billabong (all) Manual 15-16 linear regression linear scale non-parametric SM 7 hours 0.13 seconds
BORREGOTECMTY automatic ~100 + 400* nearest-neighbour regression ensemble ensemble of regression + hazard models 18 hours 0.001 seconds
BravoLab automatic 25 hot deck LSTM LSTM 1 hour a few seconds
CBIL Manual 21 linear interpolation LSTM LSTM 1 hour one minute
Chen-MCW Manual 9 none linear regression DPM 4 hours < 1 hour
CN2L-NeuralNetwork automatic all forward-filling RNN RNN 24 hours a few seconds
CN2L-RandomForest Manual >200 forward-filling RF RF 15 minutes < 1 minute
CN2L-Average automatic all forward-filling RNN/RF RNN/RF 24 hours < 1 minute
CyberBrains Manual 5 population average linear regression linear regression 20 seconds 20 seconds
DIKU (all) semi-automatic 18 none Bayesian classifier/LDA + DPM DPM 290 seconds 0.025 seconds
DIVE Manual 13 none KDE+DPM DPM 20 minutes 0.06 seconds
EMC1 automatic 250 nearest neighbour DPM + 2D spline + SVM DPM + 2D spline 80 minutes a few seconds
EMC-EB automatic 200-338 nearest-neighbour SVM classifier SVM regressor 20 seconds a few seconds
FortuneTellerFish-Control Manual 19 nearest neighbour multiclass ECOC SVM linear mixed effects model 1 minute < 1 second
FortuneTellerFish-SuStaIn Manual 19 nearest neighbour multiclass ECOC SVM + DPM linear mixed effects model + DPM 5 hours < 1 second
Frog automatic ~70+420* none gradient boosting gradient boosting 1 hour -
GlassFrog-LCMEM-HDR semi-automatic all forward-fill multi-state model DPM + regression 15 minutes 2 minutes
GlassFrog-SM Manual 7 linear model multi-state model parametric SM 93 seconds 0.1 seconds
GlassFrog-Average semi-automatic all forward-fill/linear multi-state model DPM + SM + regression 15 minutes 2 minutes
IBM-OZ-Res Manual 10-15 filled with zero stochastic gradient boosting stochastic gradient boosting 20 minutes 0.1 seconds
ITESMCEM Manual 48 mean of previous values RF LASSO + Bayesian ridge regression 20 minutes 0.3 seconds
lmaUCL (all) Manual 5 regression multi-task learning multi-task learning 2 hours milliseconds
Mayo-BAI-ASU Manual 15 population average linear mixed effects model linear mixed effects model 20 minutes 1.3 seconds
Orange Manual 17 none clinician’s decision tree clinician’s decision tree none 0.2 seconds
Rocket manual 6 median of diagnostic group linear mixed effects model DPM 5 minutes 0.3 seconds
SBIA Manual 30-70 dropped visits with missing data SVM + density estimator linear mixed effects model 1 minute a few seconds
SPMC-Plymouth (all) Automatic 20 none ? - 1 minute
SmallHeads-NeuralNetwork automatic 376 nearest neighbour deep fully-connected NN deep fully-connected NN 40 minutes 0.06 seconds
SmallHeads-LinMixedEffects automatic ? nearest neighbour - linear mixed effects model 25 minutes 0.13 seconds
Sunshine (all) semi-automatic 6 population average SVM linear model 30 minutes < 1 minute
Threedays Manual 16 none RF - 1 minute 3 seconds
Tohka-Ciszek-SMNSR Manual ~32 nearest neighbour - SMNSR several hours a few seconds
Tohka-Ciszek-RandomForestLin Manual ~32 mean patient value RF linear model a few minutes a few seconds
VikingAI (all) Manual 10 none DPM + ordered logit model DPM 10 hours 8 seconds
BenchmarkLastVisit None 3 none constant model constant model 7 seconds milliseconds
BenchmarkMixedEffects None 3 none Gaussian model linear mixed effects model 30 seconds 0.003 seconds
BenchmarkMixedEffectsAPOE None 4 none Gaussian model linear mixed effects model 30 seconds 0.003 seconds
BenchmarkSVM Manual 6 mean of previous values SVM support vector regressor (SVR) 20 seconds 0.001 seconds

Table 1. Summary of methods used in the TADPOLE submissions. Keywords: SVM – Support Vector Machine, RF – random forest, LSTM – long short-term memory network, NN – neural network, RNN – recurrent neural network, SMNSR – Sparse Multimodal Neighbourhood Search Regression, DPM – disease progression model, KDE – kernel density estimation, LDA – linear discriminant analysis, SM – slope model, ECOC – error-correcting output codes, SVD – singular value decomposition. (*) Augmented features.

Participant statistics

Locations of participating teams

Team categories

Prediction methods
