What is the leaderboard?

The leaderboard is a system through which participants can submit preliminary results and compare the accuracy of their predictions with that of other teams, using existing ADNI data. More precisely, we generate a leaderboard table (bottom of the page) showing the performance scores of every team submission; the table is updated live. Participants make predictions using datasets created from existing ADNI data:

  • LB1 – longitudinal data from ADNI1 (the equivalent of D1). As with D1, this is typically the dataset on which the model is trained.
  • LB2 – longitudinal data from ADNI1 rollovers into ADNIGO and ADNI2 (equivalent of D2). This dataset is normally used after training, as input for the forecasts.

The forecasts made for LB2 subjects will be evaluated against post-ADNI1 follow-up data from the same individuals:

  • LB4 – leaderboard test set (equivalent of D4). As this is the prediction dataset, it should NOT be used at all by the model providing the forecast.
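To make the intended usage concrete, here is a minimal Python sketch of the workflow, assuming the datasets have been generated as CSV files named LB1.csv and LB2.csv (the file names and the pandas-based loading are illustrative assumptions, not part of the official pipeline):

```python
import pandas as pd

# Minimal sketch of the intended workflow. The file names below are
# assumptions: the actual LB1/LB2 files are produced by the dataset-generation
# scripts described in the next paragraph.
lb1 = pd.read_csv('LB1.csv')  # ADNI1 longitudinal data: fit your model on this
lb2 = pd.read_csv('LB2.csv')  # ADNI1 rollovers into ADNIGO/ADNI2: forecast these subjects

# Fit the model on LB1 only, e.g. model.fit(lb1), then produce forecasts for
# every subject listed in LB2. LB4 must never be read by the forecasting code:
# it is used only by the evaluation script.
subjects_to_forecast = lb2['RID'].unique()  # RID is the ADNI subject identifier
print(len(subjects_to_forecast), 'subjects require forecasts')
```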

We provide scripts for generating the LB1, LB2 and LB4 datasets (this requires the TADPOLE datasets spreadsheet, downloadable from ADNI). The scripts live in the "evaluation" folder of the official TADPOLE GitHub repository and are documented in its README and Makefile. For transparency, the repository also contains the scripts that compute the performance metrics and build the leaderboard table itself. The Makefile shows how the scripts should be run and can also run the full pipeline via 'make leaderboard' (for a leaderboard submission) or 'make eval' (for a full, non-leaderboard TADPOLE submission). Before running the Makefile, remember to update the MATLAB path at the top of the file. If you need further assistance with generating the datasets, do not hesitate to contact us on the Google Group.

The leaderboard is currently LIVE and is updated every 5 minutes. We encourage TADPOLE participants to try the leaderboard first, before making a proper submission.

Rules

  1. Participants are NOT allowed to cheat by fitting their models on the prediction dataset (LB4) or by using any information from the LB4 entries. There are no prizes associated with the leaderboard; it is simply a space for participants to compare model performance while the competition is running.
  2. Participants can make as many leaderboard submissions as they like; each submission should include an index at the end of its file name.
  3. Participants should name their submission 'TADPOLE_Submission_Leaderboard_TeamName.csv', so that our script can pick up the submission file automatically. Example names: TADPOLE_Submission_Leaderboard_PowerRangers1.csv or TADPOLE_Submission_Leaderboard_OxfordSuperAccurateLinearModelWithAgeRegression5.csv. Team names must not contain underscores (see the snippet after this list).
  4. Participants don't necessarily have to make a leaderboard submission in order to see their results. The script that computes the performance metrics for the leaderboard dataset (evalOneSubmission.py) is provided in the GitHub repository. Users can run the script on their local machine, make changes to their model, and submit a leaderboard entry when ready.
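As a concrete illustration of rules 2 and 3, the short snippet below builds a valid submission file name; the team name and index are placeholders:

```python
# Illustration of the naming convention in rules 2 and 3. The team name and
# index below are placeholders; substitute your own values.
team = 'PowerRangers'   # no underscores allowed in team names
index = 1               # increment for each new leaderboard submission

assert '_' not in team, 'team names must not contain underscores'
filename = f'TADPOLE_Submission_Leaderboard_{team}{index}.csv'
print(filename)  # TADPOLE_Submission_Leaderboard_PowerRangers1.csv
```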

Useful Scripts

The following scripts need to be run in this order:

  1. makeLeaderboardDataset.py - creates the leaderboard datasets LB1 (training), LB2 (subjects for which forecasts are required) and LB4 (biomarker values for LB2 subjects at later visits). It also creates the leaderboard submission skeleton, TADPOLE_Submission_Leaderboard_TeamName.csv.
  2. TADPOLE_SimpleForecastExampleLeaderboard.m - generates forecasts for every subject in LB2 using a simple method. Modify this file to implement your own forecasting method (a Python sketch of this step appears after the list).
  3. evalOneSubmission.py - evaluates the previously generated user forecasts against LB4
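For those who prefer Python over the MATLAB example in step 2, the sketch below illustrates the same idea: filling every row of the submission skeleton with a naive forecast. The column names are illustrative assumptions; always use the headers of the skeleton generated in step 1.

```python
import pandas as pd

# The column names below are illustrative only; use the headers of the
# skeleton file generated in step 1.
skeleton = pd.read_csv('TADPOLE_Submission_Leaderboard_TeamName.csv')

# A deliberately naive forecast: equal diagnosis probabilities and a fixed
# guess, with a wide 50% confidence interval, for ADAS13 and ventricle volume.
skeleton['CN relative probability'] = 1.0 / 3
skeleton['MCI relative probability'] = 1.0 / 3
skeleton['AD relative probability'] = 1.0 / 3
skeleton['ADAS13'] = 12.0
skeleton['ADAS13 50% CI lower'] = 8.0
skeleton['ADAS13 50% CI upper'] = 16.0
skeleton['Ventricles_ICV'] = 0.02
skeleton['Ventricles_ICV 50% CI lower'] = 0.015
skeleton['Ventricles_ICV 50% CI upper'] = 0.025

skeleton.to_csv('TADPOLE_Submission_Leaderboard_TeamName1.csv', index=False)
```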

If everything runs without errors and step 3 prints out the performance measures successfully, your leaderboard submission spreadsheet is ready to be uploaded via the TADPOLE website. You must be registered on the website, and logged in, in order to upload via the Submit page.
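A quick local sanity check before uploading can catch obvious problems; this is only a sketch, and evalOneSubmission.py remains the authoritative check:

```python
import pandas as pd

# Quick pre-upload check (a sketch only): the completed submission should
# parse as CSV and contain no empty forecast cells.
sub = pd.read_csv('TADPOLE_Submission_Leaderboard_TeamName1.csv')
assert not sub.isnull().any().any(), 'submission contains empty cells'
print(len(sub), 'forecast rows look complete')
```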

See the Makefile (leaderboard section) for the exact commands required to run these scripts. If you need further help on how to run the Python/MATLAB scripts, see this thread on the Google Group.

Leaderboard Table

This is the leaderboard table, which is updated live (every 20 minutes). Some test entries, tagged with 'UCLTest', might appear along the way; we use these to test further modifications to the leaderboard system and they should be disregarded.

Legend:

  • MAUC - Multiclass Area Under the Curve
  • BCA - Balanced Classification Accuracy
  • MAE - Mean Absolute Error
  • WES - Weighted Error Score
  • CPA - Coverage Probability Accuracy for 50% Confidence Interval
  • ADAS - Alzheimer's Disease Assessment Scale - Cognitive subscale, 13-item version (ADAS-Cog13)
  • VENTS - Ventricle Volume
  • RANK - Reflects the same criteria used for deciding the overall winner: we first compute each entry's sum of ranks from MAUC, ADAS MAE and VENTS MAE, then derive the final ranking from these sums of ranks. For example, the top leaderboard entry has the lowest sum of ranks across these three categories. A short sketch of this computation follows the legend.
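The RANK column can be reproduced from the table itself. The toy sketch below (with made-up scores) shows the sum-of-ranks computation described above using pandas:

```python
import pandas as pd

# Toy reproduction of the RANK computation described above: rank entries on
# MAUC (higher is better) and on ADAS/VENTS MAE (lower is better), sum the
# three ranks, then rank the sums. Ties share an average rank (hence values
# such as 7.5 in the table). The scores below are made up for illustration.
df = pd.DataFrame({
    'team':      ['A', 'B', 'C'],
    'MAUC':      [0.93, 0.90, 0.85],
    'ADAS_MAE':  [4.9, 5.2, 4.2],
    'VENTS_MAE': [0.45, 0.41, 0.56],
})
rank_sum = (df['MAUC'].rank(ascending=False)
            + df['ADAS_MAE'].rank()
            + df['VENTS_MAE'].rank())
df['RANK'] = rank_sum.rank(method='average')
print(df.sort_values('RANK'))
```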

| RANK | FILE NAME | MAUC RANK | MAUC | BCA | ADAS RANK | ADAS MAE | ADAS WES | ADAS CPA | VENTS RANK | VENTS MAE | VENTS WES | VENTS CPA |
|------|-----------|-----------|------|-----|-----------|----------|----------|----------|------------|-----------|-----------|-----------|
| 1.0 | Frog | 1.0 | 0.931 | 0.849 | 5.0 | 4.85 | 4.74 | 0.44 | 10.0 | 0.45 | 0.33 | 0.47 |
| 2.0 | EMC1-Std | 8.0 | 0.898 | 0.811 | 24.5 | 6.05 | 5.40 | 0.45 | 1.5 | 0.41 | 0.29 | 0.43 |
| 3.0 | VikingAI-Sigmoid | 16.0 | 0.875 | 0.760 | 8.0 | 5.20 | 5.11 | 0.02 | 11.5 | 0.45 | 0.35 | 0.20 |
| 4.0 | EMC1-Custom | 11.0 | 0.892 | 0.798 | 24.5 | 6.05 | 5.40 | 0.45 | 1.5 | 0.41 | 0.29 | 0.43 |
| 5.0 | CBIL | 9.0 | 0.897 | 0.803 | 16.0 | 5.66 | 5.65 | 0.37 | 13.0 | 0.46 | 0.46 | 0.09 |
| 6.0 | Apocalypse | 7.0 | 0.902 | 0.827 | 15.0 | 5.57 | 5.57 | 0.50 | 20.0 | 0.52 | 0.52 | 0.50 |
| 7.5 | GlassFrog-SM | 5.0 | 0.902 | 0.825 | 18.0 | 5.77 | 5.92 | 0.20 | 21.0 | 0.52 | 0.33 | 0.20 |
| 7.5 | GlassFrog-Average | 5.0 | 0.902 | 0.825 | 9.0 | 5.26 | 5.27 | 0.26 | 30.0 | 0.68 | 0.60 | 0.33 |
| 9.0 | BORREGOTECMTY | 19.0 | 0.866 | 0.808 | 21.0 | 5.90 | 5.82 | 0.39 | 5.0 | 0.43 | 0.37 | 0.40 |
| 10.0 | BenchmarkMixedEffects | 25.0 | 0.846 | 0.706 | 1.0 | 4.19 | 4.19 | 0.31 | 23.0 | 0.56 | 0.56 | 0.50 |
| 11.0 | EMC-EB | 3.0 | 0.907 | 0.805 | 40.0 | 6.75 | 6.66 | 0.50 | 9.0 | 0.45 | 0.40 | 0.48 |
| 12.0 | lmaUCL-Covariates | 22.0 | 0.852 | 0.760 | 28.0 | 6.28 | 6.29 | 0.28 | 3.0 | 0.42 | 0.41 | 0.11 |
| 13.0 | VikingAI-Logistic | 20.0 | 0.865 | 0.754 | 22.0 | 6.02 | 5.91 | 0.26 | 11.5 | 0.45 | 0.35 | 0.20 |
| 14.5 | lmaUCL-Std | 21.0 | 0.859 | 0.781 | 29.0 | 6.30 | 6.33 | 0.26 | 4.0 | 0.42 | 0.41 | 0.09 |
| 14.5 | CN2L-Average | 28.0 | 0.843 | 0.792 | 10.0 | 5.31 | 5.31 | 0.35 | 16.0 | 0.49 | 0.49 | 0.33 |
| 16.5 | CN2L-RandomForest | 10.0 | 0.896 | 0.792 | 17.0 | 5.73 | 5.73 | 0.42 | 32.0 | 0.71 | 0.71 | 0.41 |
| 16.5 | FortuneTellerFish-SuStaIn | 41.0 | 0.806 | 0.685 | 4.0 | 4.81 | 4.81 | 0.21 | 14.0 | 0.49 | 0.49 | 0.18 |
| 18.0 | CN2L-NeuralNetwork | 42.0 | 0.783 | 0.717 | 11.0 | 5.36 | 5.36 | 0.34 | 7.0 | 0.44 | 0.44 | 0.27 |
| 19.0 | Tohka-Ciszek-RandomForestLin | 17.0 | 0.875 | 0.796 | 23.0 | 6.03 | 6.03 | 0.15 | 22.0 | 0.56 | 0.56 | 0.37 |
| 20.0 | BenchmarkMixedEffectsAPOE | 36.0 | 0.822 | 0.749 | 3.0 | 4.75 | 4.75 | 0.36 | 24.0 | 0.57 | 0.57 | 0.40 |
| 21.0 | BGU-LSTM | 12.0 | 0.883 | 0.779 | 26.0 | 6.09 | 6.12 | 0.39 | 26.0 | 0.60 | 0.60 | 0.23 |
| 22.0 | DIKU-GeneralisedLog-Custom | 13.0 | 0.878 | 0.790 | 12.5 | 5.40 | 5.40 | 0.26 | 39.5 | 1.05 | 1.05 | 0.05 |
| 23.0 | DIKU-GeneralisedLog-Std | 14.0 | 0.877 | 0.790 | 12.5 | 5.40 | 5.40 | 0.26 | 39.5 | 1.05 | 1.05 | 0.05 |
| 24.5 | AlgosForGood | 24.0 | 0.847 | 0.810 | 14.0 | 5.46 | 5.11 | 0.13 | 31.0 | 0.69 | 3.31 | 0.19 |
| 24.5 | CyberBrains | 35.0 | 0.823 | 0.747 | 7.0 | 5.16 | 5.16 | 0.24 | 27.0 | 0.62 | 0.62 | 0.12 |
| 26.0 | lmaUCL-halfD1 | 27.0 | 0.845 | 0.753 | 39.0 | 6.53 | 6.51 | 0.31 | 6.0 | 0.44 | 0.42 | 0.13 |
| 27.0 | BGU-RF | 29.0 | 0.838 | 0.673 | 30.5 | 6.33 | 6.10 | 0.35 | 17.5 | 0.50 | 0.38 | 0.26 |
| 28.0 | Mayo-BAI-ASU | 53.0 | 0.691 | 0.624 | 6.0 | 4.98 | 4.98 | 0.32 | 19.0 | 0.52 | 0.52 | 0.40 |
| 29.0 | BGU-RFFIX | 33.0 | 0.831 | 0.673 | 30.5 | 6.33 | 6.10 | 0.35 | 17.5 | 0.50 | 0.38 | 0.26 |
| 30.0 | FortuneTellerFish-Control | 32.0 | 0.834 | 0.692 | 2.0 | 4.70 | 4.70 | 0.22 | 51.0 | 1.38 | 1.38 | 0.50 |
| 31.0 | GlassFrog-LCMEM-HDR | 5.0 | 0.902 | 0.825 | 32.0 | 6.34 | 6.21 | 0.47 | 52.0 | 1.66 | 1.59 | 0.41 |
| 32.0 | SBIA | 44.0 | 0.776 | 0.721 | 44.0 | 7.10 | 7.38 | 0.40 | 8.0 | 0.44 | 0.31 | 0.13 |
| 33.0 | Chen-MCW-Stratify | 23.0 | 0.848 | 0.783 | 37.5 | 6.48 | 6.24 | 0.23 | 37.5 | 1.01 | 1.00 | 0.11 |
| 34.0 | Rocket | 55.0 | 0.680 | 0.519 | 19.0 | 5.81 | 5.71 | 0.34 | 29.0 | 0.64 | 0.64 | 0.29 |
| 35.5 | Chen-MCW-Std | 30.0 | 0.836 | 0.778 | 37.5 | 6.48 | 6.24 | 0.23 | 37.5 | 1.01 | 1.00 | 0.11 |
| 35.5 | BenchmarkSVM | 31.0 | 0.836 | 0.764 | 41.0 | 6.82 | 6.82 | 0.42 | 33.0 | 0.86 | 0.84 | 0.50 |
| 37.0 | DIKU-ModifiedMri-Custom | 37.5 | 0.807 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 35.5 | 0.92 | 0.92 | 0.01 |
| 38.0 | DIKU-ModifiedMri-Std | 39.5 | 0.806 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 35.5 | 0.92 | 0.92 | 0.01 |
| 39.0 | DIVE | 52.0 | 0.708 | 0.568 | 43.0 | 7.10 | 7.10 | 0.34 | 15.0 | 0.49 | 0.49 | 0.13 |
| 40.0 | ITESMCEM | 54.0 | 0.680 | 0.657 | 27.0 | 6.26 | 6.26 | 0.35 | 34.0 | 0.92 | 0.92 | 0.43 |
| 41.0 | BenchmarkLastVisit | 45.5 | 0.774 | 0.792 | 42.0 | 7.05 | 7.05 | 0.45 | 28.0 | 0.63 | 0.61 | 0.47 |
| 42.0 | Sunshine-Conservative | 26.0 | 0.845 | 0.816 | 45.5 | 7.90 | 7.90 | 0.50 | 44.5 | 1.12 | 1.12 | 0.50 |
| 43.0 | BravoLab | 47.0 | 0.771 | 0.682 | 48.0 | 8.22 | 8.22 | 0.49 | 25.0 | 0.58 | 0.58 | 0.41 |
| 44.0 | DIKU-ModifiedLog-Custom | 37.5 | 0.807 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 48.5 | 1.17 | 1.17 | 0.06 |
| 45.0 | DIKU-ModifiedLog-Std | 39.5 | 0.806 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 48.5 | 1.17 | 1.17 | 0.06 |
| 46.0 | Sunshine-Std | 34.0 | 0.825 | 0.771 | 45.5 | 7.90 | 7.90 | 0.50 | 44.5 | 1.12 | 1.12 | 0.50 |
| 47.0 | Billabong-UniAV45 | 50.0 | 0.720 | 0.616 | 49.5 | 9.22 | 8.82 | 0.29 | 42.5 | 1.09 | 0.99 | 0.45 |
| 48.0 | Billabong-Uni | 51.0 | 0.718 | 0.622 | 49.5 | 9.22 | 8.82 | 0.29 | 42.5 | 1.09 | 0.99 | 0.45 |
| 49.0 | ATRI-Biostat-JMM | 43.0 | 0.779 | 0.710 | 52.0 | 12.88 | 69.62 | 0.35 | 55.0 | 1.95 | 5.12 | 0.33 |
| 50.0 | Billabong-Multi | 57.0 | 0.541 | 0.556 | 56.0 | 27.01 | 19.90 | 0.46 | 41.0 | 1.07 | 1.07 | 0.45 |
| 51.0 | ATRI-Biostat-MA | 48.0 | 0.741 | 0.671 | 53.0 | 12.88 | 11.32 | 0.19 | 54.0 | 1.84 | 5.27 | 0.23 |
| 52.0 | BIGS2 | 59.0 | 0.455 | 0.488 | 51.0 | 11.62 | 14.65 | 0.50 | 50.0 | 1.20 | 1.12 | 0.07 |
| 53.0 | Billabong-MultiAV45 | 58.0 | 0.527 | 0.530 | 57.0 | 28.45 | 21.22 | 0.47 | 46.0 | 1.13 | 1.07 | 0.47 |
| 54.0 | ATRI-Biostat-LTJMM | 56.0 | 0.636 | 0.563 | 55.0 | 16.07 | 74.65 | 0.33 | 53.0 | 1.80 | 5.01 | 0.26 |
| - | Threedays | 2.0 | 0.921 | 0.823 | - | - | - | - | - | - | - | - |
| - | ARAMIS-Pascal | 15.0 | 0.876 | 0.850 | - | - | - | - | - | - | - | - |
| - | IBM-OZ-Res | 18.0 | 0.868 | 0.766 | - | - | - | - | 47.0 | 1.15 | 1.15 | 0.50 |
| - | Orange | 45.5 | 0.774 | 0.792 | - | - | - | - | - | - | - | - |
| - | SMALLHEADS-NeuralNet | 49.0 | 0.737 | 0.605 | 54.0 | 13.87 | 13.87 | 0.41 | - | - | - | - |
| - | SMALLHEADS-LinMixedEffects | - | - | - | 47.0 | 8.09 | 7.94 | 0.04 | - | - | - | - |
| - | Tohka-Ciszek-SMNSR | - | - | - | 20.0 | 5.87 | 5.87 | 0.14 | - | - | - | - |
