This leaderboard ranks **embedding models** for **3D brain structural MRIs**, focusing on both **image reconstruction** and **downstream task** performance. It provides quantitative benchmarks for brain structure embedding models based on both image compression and biological relevance.

### Evaluations
- **Reconstruction Error**:
  - **[L1Loss](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html)** (lower is better)
  - **[PerceptualLoss](https://docs.monai.io/en/stable/losses.html#perceptualloss)** (lower is better; see (1))
  - **[SSIM](https://en.wikipedia.org/wiki/Structural_similarity_index_measure)** (higher is better)
  - **[PSNR](https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio)** (higher is better)
- **Downstream Models**: use embeddings/image-derived phenotypes to predict
  - **Age** (mean absolute error, MAE)
  - **Sex** (classification accuracy %)
  - **Clinical Diagnosis** (classification accuracy %)

### Model info
Models can be evaluated if they meet the following criteria:
- can accept 113x137x113 1.5mm^3 structural MRIs as input (see [radiata-ai/brain-structure](https://huggingface.co/datasets/radiata-ai/brain-structure))
- can be used in inference mode to produce embedding vectors and image reconstructions in a forward pass

Three types of models are considered: 1) autoencoders, 2) linear dimensionality reduction models like PCA, and 3) image-derived phenotypes (IDPs). Linear PCA models are a useful baseline for autoencoder models, which perform deep non-linear dimensionality reduction (2). IDP models consist of a set of features extracted from each scan, such as brain region gray matter volumes. They are evaluated only on downstream tasks, where they can be compared to embeddings in their ability to predict age, sex, and disease diagnosis.
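
To make the "embedding plus reconstruction in a forward pass" requirement concrete, here is a minimal sketch of the linear PCA model type, scored with two of the reconstruction metrics listed above. It is an illustration only, not the brain2vec_PCA implementation or the leaderboard's evaluation code: the function names (`fit_pca`, `embed`, `reconstruct`, `l1_error`, `psnr`) are hypothetical, the data are small random arrays standing in for flattened scans, and PSNR assumes intensities scaled to a known `data_range`.

```python
import numpy as np

def fit_pca(train: np.ndarray, n_components: int):
    """Fit PCA on flattened scans of shape (n_scans, n_voxels)."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def embed(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project scans onto the principal axes to get embedding vectors."""
    return (x - mean) @ components.T

def reconstruct(z: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map embedding vectors back to voxel space."""
    return z @ components + mean

def l1_error(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean absolute voxel-wise error (lower is better)."""
    return float(np.mean(np.abs(x - x_hat)))

def psnr(x: np.ndarray, x_hat: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = float(np.mean((x - x_hat) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy stand-in data (assumption): 20 "scans" of 64 voxels each.
# A real scan would flatten to 113 * 137 * 113 voxels.
rng = np.random.default_rng(0)
scans = rng.random((20, 64))
train, held_out = scans[:16], scans[16:]

mean, components = fit_pca(train, n_components=4)
z = embed(held_out, mean, components)        # embedding vectors
x_hat = reconstruct(z, mean, components)     # image reconstructions

print(z.shape, l1_error(held_out, x_hat), psnr(held_out, x_hat))
```

An autoencoder would replace `embed` and `reconstruct` with its encoder and decoder, while the metric functions would stay the same, which is what makes the linear and non-linear models directly comparable.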

Example models include:
- [brain2vec](https://huggingface.co/radiata-ai/brain2vec) (based on (3))
- [brain2vec_PCA](https://huggingface.co/radiata-ai/brain2vec_PCA)

### Evaluation datasets
- [radiata-ai/brain-structure](https://huggingface.co/datasets/radiata-ai/brain-structure), which has **train**, **validation**, and **test** splits (80%/10%/10%). This dataset includes 3794 anonymized 3D structural MRI brain scans (T1-weighted MPRAGE NIfTI files) from 2607 individuals included in five publicly available datasets: DLBS, IXI, NKI-RS, OASIS-1, and OASIS-2. Subjects have a mean age of 45 ± 24 years (range 6-98). 3529 scans come from cognitively normal individuals and 265 scans from individuals with an Alzheimer's disease clinical diagnosis. Scan image dimensions are 113x137x113, 1.5mm^3 resolution, aligned to MNI152 space. Splits are balanced for age, sex, clinical diagnosis, and study.
- A private dataset is forthcoming.
- Note: there is currently no guarantee that embedding models have not been trained on the validation/test datasets, hence the need for private datasets.

### Downstream models
Downstream models are fit using feature vectors (embeddings or IDPs) for all scans from the training set, then applied to the validation and test sets to measure out-of-sample performance. For age, linear regression is used. For sex (genetic F/M) and clinical diagnosis (Alzheimer's disease (AD) vs. cognitively normal (CN)), linear discriminant analysis classification is used. Age and sex models are only fit and evaluated on scans from subjects with a clinical diagnosis of cognitively normal.

### Rank computation
Each metric is ranked within its category for the test dataset results. Overall rank is computed by combining **reconstruction** and **downstream** ranks.
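
The ranking described above could be sketched as follows. The text does not specify how the two categories are combined, so this sketch assumes the simplest option: average each model's per-metric ranks within a category, average the two category scores, and rank the result. The function names, metric keys, model names, and all numeric values are made up for illustration and are not leaderboard results.

```python
from statistics import mean

# Direction of each metric, taken from the Evaluations list.
HIGHER_IS_BETTER = {
    "L1Loss": False, "PerceptualLoss": False, "SSIM": True, "PSNR": True,
    "age_mae": False, "sex_acc": True, "dx_acc": True,
}
CATEGORIES = {
    "reconstruction": ["L1Loss", "PerceptualLoss", "SSIM", "PSNR"],
    "downstream": ["age_mae", "sex_acc", "dx_acc"],
}

def rank_metric(scores: dict[str, float], higher_is_better: bool) -> dict[str, int]:
    """Rank models on one metric; 1 is best, ties share the smaller rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {model: ordered.index(value) + 1 for model, value in scores.items()}

def overall_ranks(results: dict[str, dict[str, float]]) -> dict[str, int]:
    """Assumed combination rule: mean rank per category, then mean of categories."""
    category_scores = {model: [] for model in results}
    for metrics in CATEGORIES.values():
        per_metric = [
            rank_metric({m: results[m][name] for m in results}, HIGHER_IS_BETTER[name])
            for name in metrics
        ]
        for model in results:
            category_scores[model].append(mean(r[model] for r in per_metric))
    combined = {model: mean(scores) for model, scores in category_scores.items()}
    # A lower combined score is better.
    return rank_metric(combined, higher_is_better=False)

# Hypothetical test-set results for three placeholder models.
results = {
    "model_a": {"L1Loss": 0.02, "PerceptualLoss": 0.10, "SSIM": 0.95, "PSNR": 30.0,
                "age_mae": 5.0, "sex_acc": 90.0, "dx_acc": 85.0},
    "model_b": {"L1Loss": 0.03, "PerceptualLoss": 0.12, "SSIM": 0.93, "PSNR": 28.0,
                "age_mae": 6.0, "sex_acc": 88.0, "dx_acc": 80.0},
    "model_c": {"L1Loss": 0.05, "PerceptualLoss": 0.20, "SSIM": 0.90, "PSNR": 25.0,
                "age_mae": 4.0, "sex_acc": 92.0, "dx_acc": 86.0},
}

print(overall_ranks(results))  # {'model_a': 1, 'model_b': 3, 'model_c': 2}
```

In this toy case `model_c` wins every downstream metric but places last on reconstruction, so under the assumed averaging rule it ends up second overall. A model with no reconstruction metrics, such as an IDP model, would need separate handling that this sketch does not attempt.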

### Repository
The evaluation code can be found in the Radiata [leaderboard GitHub repo](https://github.com/radiata-ai/leaderboard).

### Citation
```
@misc{brain2vec-leaderboard,
  author = {Jesse Brown and Clayton Young},
  title = {Brain2vec Leaderboard},
  year = {2025},
  url = {https://huggingface.co/spaces/radiata-ai/brain2vec_leaderboard},
  publisher = {Hugging Face},
}
```

### Contact
For any questions or to submit a model please contact jesse.brown@radiata.ai.

### References
1. Guo P, Zhao C, Yang D, Xu Z, Nath V, Tang Y, et al. MAISI: Medical AI for Synthetic Imaging [Internet]. arXiv; 2024. Available from: http://arxiv.org/abs/2409.11169
2. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006 Jul 28;313(5786):504–7.
3. Puglisi L, Alexander DC, Ravì D. Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge [Internet]. arXiv; 2024. Available from: http://arxiv.org/abs/2405.03328

### Roadmap
- Private evaluation dataset
- Allow model submissions
- Expanded image-derived phenotype set