What is the leaderboard?

The leaderboard is a system through which participants can submit preliminary results and compare the accuracy of their predictions with that of other teams, using existing ADNI data. More precisely, we generate a leaderboard table (bottom of the page) showing the performance scores of every team submission; the table is updated live. Participants make predictions using datasets created from existing ADNI data:

  • LB1 – longitudinal data from ADNI1 (the equivalent of D1). As with D1, this is typically the dataset on which the model is trained.
  • LB2 – longitudinal data from ADNI1 rollovers into ADNIGO and ADNI2 (equivalent of D2). This dataset is normally used after training, as input for the forecasts.

The forecasts made for LB2 subjects will be evaluated against post-ADNI1 follow-up data from the same individuals:

  • LB4 – leaderboard test set (equivalent of D4). As this is the prediction dataset, it should NOT be used at all by the model providing the forecast.
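To make the intended usage concrete, here is a minimal Python sketch of the workflow, assuming the datasets have been generated as CSV files named LB1.csv and LB2.csv (the file names and the pandas-based loading are illustrative assumptions, not part of the official pipeline):

```python
import pandas as pd

# Minimal sketch of the intended workflow. The file names below are
# assumptions: the actual LB1/LB2 files are produced by the dataset-generation
# scripts described in the next paragraph.
lb1 = pd.read_csv('LB1.csv')  # ADNI1 longitudinal data: fit your model on this
lb2 = pd.read_csv('LB2.csv')  # ADNI1 rollovers into ADNIGO/ADNI2: forecast these subjects

# Fit the model on LB1 only, e.g. model.fit(lb1), then produce forecasts for
# every subject listed in LB2. LB4 must never be read by the forecasting code:
# it is used only by the evaluation script.
subjects_to_forecast = lb2['RID'].unique()  # RID is the ADNI subject identifier
print(len(subjects_to_forecast), 'subjects require forecasts')
```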

We provide scripts for generating the LB1, LB2 and LB4 datasets (this requires the TADPOLE datasets spreadsheet, downloadable from ADNI). The scripts live in the "evaluation" folder of the official TADPOLE GitHub repository and are documented in its README and Makefile. For transparency, the repository also contains the scripts that compute the performance metrics and build the leaderboard table itself. The Makefile shows how the scripts should be run and can also run the full pipeline via 'make leaderboard' (for a leaderboard submission) or 'make eval' (for a full, non-leaderboard TADPOLE submission). Before running the Makefile, remember to update the MATLAB path at the top of the file. If you need further assistance with generating the datasets, do not hesitate to contact us on the Google Group.

The leaderboard is currently LIVE and is updated every 5 minutes. We encourage TADPOLE participants to try the leaderboard first, before making a proper submission.

Rules

  1. Participants are NOT allowed to cheat by fitting their models on the prediction dataset (LB4) or by using any information from the LB4 entries. There are no prizes associated with the leaderboard; it is simply a space for participants to compare model performance while the competition is running.
  2. Participants can make as many leaderboard submissions as they like; each submission should include an index at the end of its file name.
  3. Participants should name their submission 'TADPOLE_Submission_Leaderboard_TeamName.csv', so that our script can pick up the submission file automatically. Example names: TADPOLE_Submission_Leaderboard_PowerRangers1.csv or TADPOLE_Submission_Leaderboard_OxfordSuperAccurateLinearModelWithAgeRegression5.csv. Team names must not contain underscores (see the snippet after this list).
  4. Participants don't necessarily have to make a leaderboard submission in order to see their results. The script that computes the performance metrics for the leaderboard dataset (evalOneSubmission.py) is provided in the GitHub repository. Users can run the script on their local machine, make changes to their model, and submit a leaderboard entry when ready.
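As a concrete illustration of rules 2 and 3, the short snippet below builds a valid submission file name; the team name and index are placeholders:

```python
# Illustration of the naming convention in rules 2 and 3. The team name and
# index below are placeholders; substitute your own values.
team = 'PowerRangers'   # no underscores allowed in team names
index = 1               # increment for each new leaderboard submission

assert '_' not in team, 'team names must not contain underscores'
filename = f'TADPOLE_Submission_Leaderboard_{team}{index}.csv'
print(filename)  # TADPOLE_Submission_Leaderboard_PowerRangers1.csv
```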

Useful Scripts

The following scripts need to be run in this order:

  1. makeLeaderboardDataset.py - creates the leaderboard datasets LB1 (training), LB2 (subjects for which forecasts are required) and LB4 (biomarker values for LB2 subjects at later visits). It also creates the leaderboard submission skeleton, TADPOLE_Submission_Leaderboard_TeamName.csv.
  2. TADPOLE_SimpleForecastExampleLeaderboard.m - generates forecasts for every subject in LB2 using a simple method. Modify this file to implement your own forecasting method (a Python sketch of this step appears after the list).
  3. evalOneSubmission.py - evaluates the previously generated user forecasts against LB4
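For those who prefer Python over the MATLAB example in step 2, the sketch below illustrates the same idea: filling every row of the submission skeleton with a naive forecast. The column names are illustrative assumptions; always use the headers of the skeleton generated in step 1.

```python
import pandas as pd

# The column names below are illustrative only; use the headers of the
# skeleton file generated in step 1.
skeleton = pd.read_csv('TADPOLE_Submission_Leaderboard_TeamName.csv')

# A deliberately naive forecast: equal diagnosis probabilities and a fixed
# guess, with a wide 50% confidence interval, for ADAS13 and ventricle volume.
skeleton['CN relative probability'] = 1.0 / 3
skeleton['MCI relative probability'] = 1.0 / 3
skeleton['AD relative probability'] = 1.0 / 3
skeleton['ADAS13'] = 12.0
skeleton['ADAS13 50% CI lower'] = 8.0
skeleton['ADAS13 50% CI upper'] = 16.0
skeleton['Ventricles_ICV'] = 0.02
skeleton['Ventricles_ICV 50% CI lower'] = 0.015
skeleton['Ventricles_ICV 50% CI upper'] = 0.025

skeleton.to_csv('TADPOLE_Submission_Leaderboard_TeamName1.csv', index=False)
```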

If everything runs without errors and step 3 prints out the performance measures successfully, your leaderboard submission spreadsheet is ready to be uploaded via the TADPOLE website. You must be registered on the website, and logged in, in order to upload via the Submit page.
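A quick local sanity check before uploading can catch obvious problems; this is only a sketch, and evalOneSubmission.py remains the authoritative check:

```python
import pandas as pd

# Quick pre-upload check (a sketch only): the completed submission should
# parse as CSV and contain no empty forecast cells.
sub = pd.read_csv('TADPOLE_Submission_Leaderboard_TeamName1.csv')
assert not sub.isnull().any().any(), 'submission contains empty cells'
print(len(sub), 'forecast rows look complete')
```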

See the Makefile (leaderboard section) for the exact commands required to run these scripts. If you need further help on how to run the Python/MATLAB scripts, see this thread on the Google Group.

Leaderboard Table

This is the leaderboard table, which is updated live (every 20 minutes). Some test entries, tagged with 'UCLTest', might appear along the way; we use these to test further modifications to the leaderboard system and they should be disregarded.

Legend:

  • MAUC - Multiclass Area Under the Curve
  • BCA - Balanced Classification Accuracy
  • MAE - Mean Absolute Error
  • WES - Weighted Error Score
  • CPA - Coverage Probability Accuracy for 50% Confidence Interval
  • ADAS - Alzheimer's Disease Assessment Scale - Cognitive subscale, 13-item version (ADAS-Cog13)
  • VENTS - Ventricle Volume
  • RANK - Reflects the same criteria used for deciding the overall winner: we first compute each entry's sum of ranks from MAUC, ADAS MAE and VENTS MAE, then derive the final ranking from these sums of ranks. For example, the top leaderboard entry has the lowest sum of ranks across these three categories. A short sketch of this computation follows the legend.
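The RANK column can be reproduced from the table itself. The toy sketch below (with made-up scores) shows the sum-of-ranks computation described above using pandas:

```python
import pandas as pd

# Toy reproduction of the RANK computation described above: rank entries on
# MAUC (higher is better) and on ADAS/VENTS MAE (lower is better), sum the
# three ranks, then rank the sums. Ties share an average rank (hence values
# such as 7.5 in the table). The scores below are made up for illustration.
df = pd.DataFrame({
    'team':      ['A', 'B', 'C'],
    'MAUC':      [0.93, 0.90, 0.85],
    'ADAS_MAE':  [4.9, 5.2, 4.2],
    'VENTS_MAE': [0.45, 0.41, 0.56],
})
rank_sum = (df['MAUC'].rank(ascending=False)
            + df['ADAS_MAE'].rank()
            + df['VENTS_MAE'].rank())
df['RANK'] = rank_sum.rank(method='average')
print(df.sort_values('RANK'))
```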

| RANK | FILE NAME | MAUC RANK | MAUC | BCA | ADAS RANK | ADAS MAE | ADAS WES | ADAS CPA | VENTS RANK | VENTS MAE | VENTS WES | VENTS CPA |
|------|-----------|-----------|------|-----|-----------|----------|----------|----------|------------|-----------|-----------|-----------|
| 1.0 | Frog | 1.0 | 0.931 | 0.849 | 5.0 | 4.85 | 4.74 | 0.44 | 10.0 | 0.45 | 0.33 | 0.47 |
| 2.0 | EMC1-Std | 8.0 | 0.898 | 0.811 | 24.5 | 6.05 | 5.40 | 0.45 | 1.5 | 0.41 | 0.29 | 0.43 |
| 3.0 | VikingAI-Sigmoid | 16.0 | 0.875 | 0.760 | 8.0 | 5.20 | 5.11 | 0.02 | 11.5 | 0.45 | 0.35 | 0.20 |
| 4.0 | EMC1-Custom | 11.0 | 0.892 | 0.798 | 24.5 | 6.05 | 5.40 | 0.45 | 1.5 | 0.41 | 0.29 | 0.43 |
| 5.0 | CBIL | 9.0 | 0.897 | 0.803 | 16.0 | 5.66 | 5.65 | 0.37 | 13.0 | 0.46 | 0.46 | 0.09 |
| 6.0 | Apocalypse | 7.0 | 0.902 | 0.827 | 15.0 | 5.57 | 5.57 | 0.50 | 20.0 | 0.52 | 0.52 | 0.50 |
| 7.5 | GlassFrog-SM | 5.0 | 0.902 | 0.825 | 18.0 | 5.77 | 5.92 | 0.20 | 21.0 | 0.52 | 0.33 | 0.20 |
| 7.5 | GlassFrog-Average | 5.0 | 0.902 | 0.825 | 9.0 | 5.26 | 5.27 | 0.26 | 30.0 | 0.68 | 0.60 | 0.33 |
| 9.0 | BORREGOTECMTY | 19.0 | 0.866 | 0.808 | 21.0 | 5.90 | 5.82 | 0.39 | 5.0 | 0.43 | 0.37 | 0.40 |
| 10.0 | BenchmarkMixedEffects | 25.0 | 0.846 | 0.706 | 1.0 | 4.19 | 4.19 | 0.31 | 23.0 | 0.56 | 0.56 | 0.50 |
| 11.0 | EMC-EB | 3.0 | 0.907 | 0.805 | 40.0 | 6.75 | 6.66 | 0.50 | 9.0 | 0.45 | 0.40 | 0.48 |
| 12.0 | lmaUCL-Covariates | 22.0 | 0.852 | 0.760 | 28.0 | 6.28 | 6.29 | 0.28 | 3.0 | 0.42 | 0.41 | 0.11 |
| 13.0 | VikingAI-Logistic | 20.0 | 0.865 | 0.754 | 22.0 | 6.02 | 5.91 | 0.26 | 11.5 | 0.45 | 0.35 | 0.20 |
| 14.5 | lmaUCL-Std | 21.0 | 0.859 | 0.781 | 29.0 | 6.30 | 6.33 | 0.26 | 4.0 | 0.42 | 0.41 | 0.09 |
| 14.5 | CN2L-Average | 28.0 | 0.843 | 0.792 | 10.0 | 5.31 | 5.31 | 0.35 | 16.0 | 0.49 | 0.49 | 0.33 |
| 16.5 | CN2L-RandomForest | 10.0 | 0.896 | 0.792 | 17.0 | 5.73 | 5.73 | 0.42 | 32.0 | 0.71 | 0.71 | 0.41 |
| 16.5 | FortuneTellerFish-SuStaIn | 41.0 | 0.806 | 0.685 | 4.0 | 4.81 | 4.81 | 0.21 | 14.0 | 0.49 | 0.49 | 0.18 |
| 18.0 | CN2L-NeuralNetwork | 42.0 | 0.783 | 0.717 | 11.0 | 5.36 | 5.36 | 0.34 | 7.0 | 0.44 | 0.44 | 0.27 |
| 19.0 | Tohka-Ciszek-RandomForestLin | 17.0 | 0.875 | 0.796 | 23.0 | 6.03 | 6.03 | 0.15 | 22.0 | 0.56 | 0.56 | 0.37 |
| 20.0 | BenchmarkMixedEffectsAPOE | 36.0 | 0.822 | 0.749 | 3.0 | 4.75 | 4.75 | 0.36 | 24.0 | 0.57 | 0.57 | 0.40 |
| 21.0 | BGU-LSTM | 12.0 | 0.883 | 0.779 | 26.0 | 6.09 | 6.12 | 0.39 | 26.0 | 0.60 | 0.60 | 0.23 |
| 22.0 | DIKU-GeneralisedLog-Custom | 13.0 | 0.878 | 0.790 | 12.5 | 5.40 | 5.40 | 0.26 | 39.5 | 1.05 | 1.05 | 0.05 |
| 23.0 | DIKU-GeneralisedLog-Std | 14.0 | 0.877 | 0.790 | 12.5 | 5.40 | 5.40 | 0.26 | 39.5 | 1.05 | 1.05 | 0.05 |
| 24.5 | AlgosForGood | 24.0 | 0.847 | 0.810 | 14.0 | 5.46 | 5.11 | 0.13 | 31.0 | 0.69 | 3.31 | 0.19 |
| 24.5 | CyberBrains | 35.0 | 0.823 | 0.747 | 7.0 | 5.16 | 5.16 | 0.24 | 27.0 | 0.62 | 0.62 | 0.12 |
| 26.0 | lmaUCL-halfD1 | 27.0 | 0.845 | 0.753 | 39.0 | 6.53 | 6.51 | 0.31 | 6.0 | 0.44 | 0.42 | 0.13 |
| 27.0 | BGU-RF | 29.0 | 0.838 | 0.673 | 30.5 | 6.33 | 6.10 | 0.35 | 17.5 | 0.50 | 0.38 | 0.26 |
| 28.0 | Mayo-BAI-ASU | 53.0 | 0.691 | 0.624 | 6.0 | 4.98 | 4.98 | 0.32 | 19.0 | 0.52 | 0.52 | 0.40 |
| 29.0 | BGU-RFFIX | 33.0 | 0.831 | 0.673 | 30.5 | 6.33 | 6.10 | 0.35 | 17.5 | 0.50 | 0.38 | 0.26 |
| 30.0 | FortuneTellerFish-Control | 32.0 | 0.834 | 0.692 | 2.0 | 4.70 | 4.70 | 0.22 | 51.0 | 1.38 | 1.38 | 0.50 |
| 31.0 | GlassFrog-LCMEM-HDR | 5.0 | 0.902 | 0.825 | 32.0 | 6.34 | 6.21 | 0.47 | 52.0 | 1.66 | 1.59 | 0.41 |
| 32.0 | SBIA | 44.0 | 0.776 | 0.721 | 44.0 | 7.10 | 7.38 | 0.40 | 8.0 | 0.44 | 0.31 | 0.13 |
| 33.0 | Chen-MCW-Stratify | 23.0 | 0.848 | 0.783 | 37.5 | 6.48 | 6.24 | 0.23 | 37.5 | 1.01 | 1.00 | 0.11 |
| 34.0 | Rocket | 55.0 | 0.680 | 0.519 | 19.0 | 5.81 | 5.71 | 0.34 | 29.0 | 0.64 | 0.64 | 0.29 |
| 35.5 | Chen-MCW-Std | 30.0 | 0.836 | 0.778 | 37.5 | 6.48 | 6.24 | 0.23 | 37.5 | 1.01 | 1.00 | 0.11 |
| 35.5 | BenchmarkSVM | 31.0 | 0.836 | 0.764 | 41.0 | 6.82 | 6.82 | 0.42 | 33.0 | 0.86 | 0.84 | 0.50 |
| 37.0 | DIKU-ModifiedMri-Custom | 37.5 | 0.807 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 35.5 | 0.92 | 0.92 | 0.01 |
| 38.0 | DIKU-ModifiedMri-Std | 39.5 | 0.806 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 35.5 | 0.92 | 0.92 | 0.01 |
| 39.0 | DIVE | 52.0 | 0.708 | 0.568 | 43.0 | 7.10 | 7.10 | 0.34 | 15.0 | 0.49 | 0.49 | 0.13 |
| 40.0 | ITESMCEM | 54.0 | 0.680 | 0.657 | 27.0 | 6.26 | 6.26 | 0.35 | 34.0 | 0.92 | 0.92 | 0.43 |
| 41.0 | BenchmarkLastVisit | 45.5 | 0.774 | 0.792 | 42.0 | 7.05 | 7.05 | 0.45 | 28.0 | 0.63 | 0.61 | 0.47 |
| 42.0 | Sunshine-Conservative | 26.0 | 0.845 | 0.816 | 45.5 | 7.90 | 7.90 | 0.50 | 44.5 | 1.12 | 1.12 | 0.50 |
| 43.0 | BravoLab | 47.0 | 0.771 | 0.682 | 48.0 | 8.22 | 8.22 | 0.49 | 25.0 | 0.58 | 0.58 | 0.41 |
| 44.0 | DIKU-ModifiedLog-Custom | 37.5 | 0.807 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 48.5 | 1.17 | 1.17 | 0.06 |
| 45.0 | DIKU-ModifiedLog-Std | 39.5 | 0.806 | 0.670 | 34.5 | 6.44 | 6.44 | 0.27 | 48.5 | 1.17 | 1.17 | 0.06 |
| 46.0 | Sunshine-Std | 34.0 | 0.825 | 0.771 | 45.5 | 7.90 | 7.90 | 0.50 | 44.5 | 1.12 | 1.12 | 0.50 |
| 47.0 | Billabong-UniAV45 | 50.0 | 0.720 | 0.616 | 49.5 | 9.22 | 8.82 | 0.29 | 42.5 | 1.09 | 0.99 | 0.45 |
| 48.0 | Billabong-Uni | 51.0 | 0.718 | 0.622 | 49.5 | 9.22 | 8.82 | 0.29 | 42.5 | 1.09 | 0.99 | 0.45 |
| 49.0 | ATRI-Biostat-JMM | 43.0 | 0.779 | 0.710 | 52.0 | 12.88 | 69.62 | 0.35 | 55.0 | 1.95 | 5.12 | 0.33 |
| 50.0 | Billabong-Multi | 57.0 | 0.541 | 0.556 | 56.0 | 27.01 | 19.90 | 0.46 | 41.0 | 1.07 | 1.07 | 0.45 |
| 51.0 | ATRI-Biostat-MA | 48.0 | 0.741 | 0.671 | 53.0 | 12.88 | 11.32 | 0.19 | 54.0 | 1.84 | 5.27 | 0.23 |
| 52.0 | BIGS2 | 59.0 | 0.455 | 0.488 | 51.0 | 11.62 | 14.65 | 0.50 | 50.0 | 1.20 | 1.12 | 0.07 |
| 53.0 | Billabong-MultiAV45 | 58.0 | 0.527 | 0.530 | 57.0 | 28.45 | 21.22 | 0.47 | 46.0 | 1.13 | 1.07 | 0.47 |
| 54.0 | ATRI-Biostat-LTJMM | 56.0 | 0.636 | 0.563 | 55.0 | 16.07 | 74.65 | 0.33 | 53.0 | 1.80 | 5.01 | 0.26 |
| - | Threedays | 2.0 | 0.921 | 0.823 | - | - | - | - | - | - | - | - |
| - | ARAMIS-Pascal | 15.0 | 0.876 | 0.850 | - | - | - | - | - | - | - | - |
| - | IBM-OZ-Res | 18.0 | 0.868 | 0.766 | - | - | - | - | 47.0 | 1.15 | 1.15 | 0.50 |
| - | Orange | 45.5 | 0.774 | 0.792 | - | - | - | - | - | - | - | - |
| - | SMALLHEADS-NeuralNet | 49.0 | 0.737 | 0.605 | 54.0 | 13.87 | 13.87 | 0.41 | - | - | - | - |
| - | SMALLHEADS-LinMixedEffects | - | - | - | 47.0 | 8.09 | 7.94 | 0.04 | - | - | - | - |
| - | Tohka-Ciszek-SMNSR | - | - | - | 20.0 | 5.87 | 5.87 | 0.14 | - | - | - | - |
