Compared to other data types, neurodata are not abundant on the internet. Some datasets exist, but they are poorly organized and often difficult to find. Hopefully this page can serve as an effective jumping-off point for those looking for open-source data.

🚧 Please pardon the dust. I’ve mostly been dumping brief descriptions here without any particular organization and plan to update later. 🚧
⚠️ If you know of any databases or datasets that are not included here, please let me know (just click here). I’ll add them. ⚠️

Selected Datasets


A cluster of recent(ish) academic papers that include data. Most datasets were recorded in humans (3 are from NHP) with a healthy mix of behavior: speech, reaching, and memory recall. Most data were recorded using ECoG but there are a few microelectrode array datasets, and one sEEG dataset.

  1. Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature. 2019
  2. Livezey JA, Bouchard KE, Chang EF. Deep learning as a tool for neural data analysis: Speech classification and cross-frequency coupling in human sensorimotor cortex. PLOS Computational Biology. 2019 (Dataset on DANDI)
  3. Dichter BK, Breshears JD, Leonard MK, Chang EF. The Control of Vocal Pitch in Human Laryngeal Motor Cortex. Cell. 2018 (Code available “upon request”)
  4. Peterson SM, Singh SH, Wang NX, Rao RP, Brunton BW. Behavioral and neural variability of naturalistic arm movements. BioRxiv. 2020 (This dataset is publicly available at Figshare & contains synchronized neural and behavioral data that can be used to generate Figures 2b–c and Figures 3–8. The data analysis code is available on Github.)
  5. Chao ZC, Nagasaka Y, Fujii N. "Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys." Frontiers in Neuroengineering. 2010 (Dataset here.)
  6. Chandravadia N, Liang D, Schjetnan AGP, et al. A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task. Scientific Data. 2020 (Code on Github. NWB data on DANDI.)
  7. Pandarinath C, O’Shea DJ, Collins J, Jozefowicz R, Stavisky SD, Kao JC, et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat Methods. 2018 (Dataset on Github.)
  8. Wilson GH, Stavisky SD, Willett FR, et al. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. Journal of Neural Engineering. 2020;17(6):066007. 
  9. Miller KJ. A library of human electrocorticographic data and analyses. Nat Hum Behav. 2019. (Dataset on Stanford website.)
  10. http://memory.psych.upenn.edu/RAM
  11. Miller KJ, Abel TJ, Hebb AO, Ojemann JG. Rapid online language mapping with electrocorticography. Journal of Neurosurgery: Pediatrics. 2011 (This dataset was first identified in Miller et al. 2019.)
  12. Peterson SM, Steine-Hanson Z, Davis N, Rao RPN, Brunton BW. Generalized neural decoders for transfer learning across participants and recording modalities. BioRxiv. 2020 (Paper, dataset, and code.)
  13. Angrick M, Ottenhoff M, Diener L, et al. Real-Time Synthesis of Imagined Speech Processes from Minimally Invasive Recordings of Neural Activity. Neuroscience; 2020. (Data here.)
  14. O'Doherty J, Cardoso M, Makin J, Sabes P. Nonhuman Primate Reaching with Multichannel Sensorimotor Cortex Electrophysiology. (Data here.)

fNIRS: individual datasets


💡 note: many of the datasets below also appear in the OpenfNIRS database. Because I found this database after describing the datasets, I'll leave the link here. 

Open access dataset for simultaneous EEG & NIRS BCI


Two BCI experiments (left vs. right hand motor imagery; mental arithmetic vs. resting state). The dataset was validated using baseline signal analysis methods; classification performance was evaluated for each modality and a combination of both modalities.

note: this dataset appears well documented compared to those below. It is also multimodal (EEG + fNIRS).

  • 29 subjects
  • 14 sources, 16 detectors, 36 physiological channels
  • Includes EEG data for comparison
  • Dataset is focused on BCI (Hybrid EEG + fNIRS)
  • Well-documented, easy-to-understand data format (see basic data structures of the BBCI Toolbox); fast startup
  • Well-respected research groups (Blankertz, Müller)
  • Older (2010 for EEG, 2016 for fNIRS)
  • Data is in .mat format & tutorials are in .m.

Open-Access fNIRS Dataset for Classification of Unilateral Finger- and Foot-Tapping


The concentration changes of oxygenated and reduced hemoglobin were measured while 30 volunteers repeated each of three types of overt movement (left-hand unilateral complex finger-tapping, right-hand unilateral complex finger-tapping, and foot-tapping) 25 times. The ternary support vector machine (SVM) classification accuracy, obtained using leave-one-out cross-validation, was estimated at 70.4% ± 18.4% on average.

data available on figshare

analysis available on github
  • 30 subjects
  • 8 sources, 8 detectors, 20 physiological channels
  • Dataset is focused on BCI (classification of unilateral finger vs. foot tapping)
  • Uses same BBCI analysis pipeline & similar data format as EEG+fNIRS BCI dataset
  • More recent (2019)
  • Data is in .mat format & tutorials are in .m. 
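The leave-one-out evaluation mentioned in the description above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline (which uses the BBCI toolbox in MATLAB): the data are synthetic, the two "features" per trial are made up, and a simple nearest-centroid classifier stands in for the SVM so the example runs with only the Python standard library. All function and label names are hypothetical.

```python
import math
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean (centroid) is closest to x."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each trial once, train on the rest, score the held-out trial."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def make_synthetic_trials(trials_per_class=25, seed=0):
    """Build three well-separated synthetic classes (NOT real fNIRS data)."""
    rng = random.Random(seed)
    centers = {
        "left_hand": (0.0, 0.0),
        "right_hand": (10.0, 0.0),
        "foot": (0.0, 10.0),
    }
    xs, ys = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(trials_per_class):
            xs.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            ys.append(label)
    return xs, ys


xs, ys = make_synthetic_trials()
accuracy = leave_one_out_accuracy(xs, ys)
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.1%}")
```

With real recordings, the feature vectors would come from the hemoglobin concentration changes of each trial, and the stand-in classifier could be swapped for an SVM from a library such as scikit-learn.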

Motor execution and imagery fNIRS data

Single subject classification of executed movements. May be poor quality?
  • 1 subject
  • documentation could be better
  • relies on SPM package for analysis
  • Data is in .mat format & tutorials are in .m. 

NirsAutoML, an automated classification platform

This project aims to enable the use of the sktime toolbox for the classification of fNIRS data. This tool is accompanied by a manual that presents information on how to use the tool and how to set up sktime. The tool and the manual were tested with potential users and recommendations were recorded for potential future improvements.
  • ? subjects
  • Unusual data format (part of a BSci degree project at Worcester Polytechnic Institute)
  • Good documentation but some dead github links
  • Probably more useful as a source of sktime-based tools than as a dataset

sktime (used by NirsAutoML)

sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. We currently support: forecasting, time series classification, and time series regression.

sktime provides dedicated time series algorithms and scikit-learn compatible tools for building, tuning, and evaluating composite models.

For deep learning methods, see our companion package: sktime-dl.

snirf-samples (sample data from SNIRF File Format)

  • Not a "real" dataset
  • SNIRF is probably the most commonly used fNIRS file format
  • Highly standardized

Open Access Multimodal fNIRS Resting State Dataset With and Without Synthetic Hemodynamic Responses

dataset available on NITRC
  • 14 participants (14 for 5 min resting state + 14 for 10 min resting)
  • Published "for the data science community [...] to validate novel methods"
  • 32 sources, 32 detectors, ? physiological channels
  • Data is not a "BCI" dataset
  • requires login to download (free but wget doesn't work)

fNIRS data recorded during observed and executed hand actions

Download link to .zip (from NITRC)
  • no accompanying paper
  • ? participants
  • ? sources, ? detectors, ? physiological channels
  • requires login to download (free but wget doesn't work)

An Online Database of Infant Functional Near InfraRed Spectroscopy Studies: A Community-Augmented Systematic Review

  • Database appears to be unsupported or missing?

Data standards that link to data


Neurodata Without Borders (nwb.org)


"Neurodata Without Borders: Neurophysiology (NWB:N) is a data standard for neurophysiology, providing neuroscientists with a common standard to share, archive, use, and build analysis tools for neurophysiology data. NWB:N is designed to store a variety of neurophysiology data, including data from intracellular and extracellular electrophysiology experiments, data from optical physiology experiments, and tracking and stimulus data."
  • Hosted by the Kavli Foundation

Pros
  1. Curation is recent
  2. Datasets are very high quality, many from flagship labs in the field (Rutishauser, Allen Institute, Buzsáki, Churchland,...)
  3. Matlab & Python ready

Cons
  1. Fewer datasets than other repositories

Potential starting points

  1. Extracellular Electrophysiology Tutorial
  2. A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task (Nature Scientific Data)

Associated repositories

Details about contributors are available on NWB GitHub Organization and the different GitHub repositories: NWB Schema, PyNWB, MatNWB, HDMF among many others.


BIDS & OpenNeuro


Free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data

First described in Nature Scientific Data

Pros
  1. 427 public datasets
  2. Many recording methods, mostly in human (MRI, MEG, EEG, iEEG, ECoG)
  3. Highly active

Cons
  1. Missing extracellular ephys data & other techniques
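Datasets on OpenNeuro are organized according to BIDS, which encodes metadata in file names as underscore-separated `key-value` entities followed by a suffix and an extension. The sketch below shows one way to pull those pieces apart when browsing a downloaded dataset. The function name and the example path are hypothetical, and dedicated BIDS tooling handles many cases this sketch does not.

```python
from pathlib import PurePosixPath


def parse_bids_filename(path):
    """Split a BIDS-style file name into entities, suffix, and extension."""
    name = PurePosixPath(path).name
    # Everything after the first dot is the extension (handles ".nii.gz").
    stem, _, extension = name.partition(".")
    parts = stem.split("_")

    entities = {}
    suffix = ""
    for index, part in enumerate(parts):
        key, sep, value = part.partition("-")
        if sep:
            # "sub-01" becomes {"sub": "01"}
            entities[key] = value
        elif index == len(parts) - 1:
            # The final part without a dash is the suffix, e.g. "ieeg".
            suffix = part
        else:
            raise ValueError(f"Unexpected part {part!r} in {name!r}")

    return {
        "entities": entities,
        "suffix": suffix,
        "extension": f".{extension}" if extension else "",
    }


# Hypothetical path, for illustration only.
example = "sub-01/ses-01/ieeg/sub-01_ses-01_task-reading_ieeg.edf"
print(parse_bids_filename(example))
```

Grouping files by the `sub` and `task` entities is usually enough to get a first overview of what a dataset contains.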


SNIRF


Find SNIRF info on the SfNIRS website and software on github.


OpenBCI Searchable Database 


google sheet

Find OpenBCI software on github.


Neuro Databases 


The DANDI Archive


dandiarchive.org
DANDI is a platform for publishing, sharing, and processing neurophysiology data funded by the BRAIN Initiative. The archive is available using the Data Portal. For instructions on how to interact with the archive, click here.

Pros
  1. Curation is recent
  2. Datasets are in NWB format
  3. 37 datasets in 5 species (956 subjects) as of September 21, 2020

Cons
  1. DANDI is currently in early access; some features may be limited

Potential starting points

The DANDI archive includes the recogmem dataset (see above: A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task, Nature Scientific Data).

An introduction to these data can be found using the Tutorial: Reading NWB data in Python and Matlab 
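For a quick first look at NWB files once they are downloaded, something like the following may be enough. It is a minimal sketch, assuming the PyNWB package is installed and the files sit in a local folder. The helper names are hypothetical, and the fields reported are just a few top-level attributes of an NWB file, not a full inventory.

```python
from pathlib import Path


def find_nwb_files(root):
    """Return every .nwb file under root, sorted for reproducible ordering."""
    return sorted(Path(root).rglob("*.nwb"))


def summarize_nwb(path):
    """Open one NWB file read-only and report a few top-level fields."""
    # Imported here so that file discovery works even without PyNWB installed.
    from pynwb import NWBHDF5IO

    with NWBHDF5IO(str(path), mode="r") as io:
        nwbfile = io.read()
        return {
            "identifier": nwbfile.identifier,
            "session_description": nwbfile.session_description,
            # Names of the raw acquisition objects stored in the file.
            "acquisition": sorted(nwbfile.acquisition.keys()),
            # Sorted-spike data, when present, lives in the units table.
            "has_units_table": nwbfile.units is not None,
        }
```

Calling `summarize_nwb(p)` for each `p` in `find_nwb_files("some_folder")` gives a rough map of a dataset before committing to a full analysis; the folder name here is a placeholder.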


The DANDI archive also includes an ECoG dataset from the E. Chang lab at UCSF, perhaps the leading experts in speech decoding BMIs.

short description: High-density 256-channel electrocorticography (ECoG) array implanted in human patients during treatment for epilepsy. The subjects are reading aloud consonant-vowel syllables from a list. The data were collected by Dr. Edward Chang and Dr. Kristofer Bouchard at the University of California, San Francisco, and curated by Dr. Kristofer Bouchard and Dr. Benjamin Dichter (NWB community liaison & founder of catalyst). Each file is a continuous recording session in Neurodata Without Borders (NWB) 2.0 format.

International Epilepsy Electrophysiology Database (IEEG.org)


IEEG.ORG is a collaborative initiative funded by the National Institute of Neurological Disorders and Stroke (NIH NINDS). This initiative seeks to advance research towards the understanding of epilepsy by providing a platform for sharing data, tools and expertise between researchers. The portal includes a large database of scientific data and tools to analyze these datasets. (United States National Institutes of Health Grant # 1 U24 NS063930-01)

  • EEG, LFP, μECoG + metadata, imaging, annotations on data
  • Humans and animal models of epilepsy
  • Non-healthy, several healthy

note: some maintenance appears to be occurring in 2022 (1 author only).

Pros
  1. Appears to have many datasets (819 public?)

Cons
  1. Website and data appear older (2010) and unintuitive to use
  2. Matlab only


Allen Brain Atlas

The Visual Coding – Neuropixels project uses high-density extracellular electrophysiology probes to record spikes from a wide variety of regions in the mouse brain. Our experiments are designed to study the activity of the visual cortex and thalamus in the context of passive visual stimulation, but these data can be used to address a wide variety of topics.

Pros
  1. Available directly through the AllenSDK, but also through NWB.
  2. Well documented

Cons
  1. Mouse only
  2. Heavy emphasis on Neuropixels technology; may not translate to other recording methods?

Tuberous Sclerosis Complex Autism Center of Excellence Network


The Tuberous Sclerosis Complex Autism Center of Excellence Network (TACERN) is a group of five premier children's hospitals located throughout the US.

These data do not appear to be readily available to the public.


Neuroimaging Tools & Resources Collaboratory


MR, EEG, MEG, PET/SPECT, Genomics, CT, Optical (including fNIRS), ECoG