TY - JOUR
T1 - MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
T2 - 41st International Conference on Machine Learning, ICML 2024
AU - Scotti, Paul S.
AU - Tripathy, Mihir
AU - Villanueva, Cesar Kadir Torrico
AU - Kneeland, Reese
AU - Chen, Tong
AU - Narang, Ashutosh
AU - Santhirasegaran, Charan
AU - Xu, Jonathan
AU - Naselaris, Thomas
AU - Norman, Kenneth A.
AU - Abraham, Tanishq Mathew
N1 - Publisher Copyright:
Copyright 2024 by the author(s)
PY - 2024/7
Y1 - 2024/7
AB - Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently per subject, and each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on GitHub.
UR - http://www.scopus.com/inward/record.url?scp=85203816714&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203816714&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85203816714
SN - 2640-3498
VL - 235
SP - 44038
EP - 44059
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 21 July 2024 through 27 July 2024
ER -