Human Microbiome Compendium dataset

  • Richard J. Abdill (UNIVERSITY OF CHICAGO) (Creator)
  • Samantha P. Graham (Creator)
  • Vincent Rubinetti (Creator)
  • Frank W Albert (Creator)
  • Casey S. Greene (Creator)
  • Sean Davis (Creator)
  • Ran Blekhman (Creator)



The Human Microbiome Compendium is an ongoing project to build a large collection of human microbiome sequencing data processed with a uniform pipeline. Currently, the compendium contains 16S rRNA amplicon sequencing data for human gut microbiome samples retrieved from the Sequence Read Archive. Our website has more information about the project and links to related resources.

This data is freely available under a CC-BY license; if you use it in your work, please cite our preprint, "Integration of 168,000 samples reveals global patterns of the human gut microbiome" (doi: 10.1101/2023.10.11.560955).

If you are using this dataset in conjunction with your own results, note that starting in version 1.0.1, the nomenclature used in the taxonomic table diverges from the output generated by DADA2 and the SILVA database. See the v1.0.1 release notes directly below for details.

Version history

  • 1.0.1: The "asv_assignments" table was corrected to fix entries in which DADA2 inferred the wrong taxonomic level from the reference database (e.g. genus "Brassicibacter" was listed as a family, and genus "Gelria" was listed as an order). The problem is documented in issues attached to the repositories for DADA2, the DADA2 reference databases, and our MicroBioMap library. In short, v138 of the SILVA database did not record taxonomic names properly when a lineage was missing levels (e.g. a taxon has been assigned a proposed genus, but not a family). This was addressed in v138.1, which we originally used for generating this dataset. However, several dozen entries remain incorrectly annotated in v138.1. Our 1.0.1 release corrects these by filling in the nomenclature gaps with "(unclassified)" and moving the existing data to the correct level. 2881 ASV assignments were affected out of about 4.3 million (roughly 0.07%). The new file "taxa_corrections.tsv" is a copy of the "bad-taxa.csv" list generated by Michael McLaren, with notes added to reflect what we changed.
  • 1.0.0: Added file to the repository, and added a link to the preprint and title/author metadata for the Zenodo entry.
  • 0.2.1: "sample_metadata.tsv" was missing. (Note: This was accidentally tagged "0.2.0" in the version history.)
  • 0.2.0: Replaced the "country" column in sample_metadata.tsv with an "iso" column that uses the country code rather than the name.
  • 0.1.0: Prepping for public release.
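
For readers comparing their own DADA2 output against this table, the kind of correction described in the 1.0.1 notes can be sketched as follows. This is only an illustration and not the code used to produce the release: the rank names, the `correct_rank` helper, and the example lineage (including the placeholder phylum and class) are all assumptions. It moves a name that sits at too high a level down to its proper level and fills the gap with "(unclassified)".

```python
# Illustrative sketch only; not the pipeline used for the 1.0.1 release.
# Rank names and the helper below are hypothetical.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]
UNCLASSIFIED = "(unclassified)"


def correct_rank(row, name, true_rank):
    """Return a copy of `row` with `name` moved down to `true_rank`.

    Ranks between the level where `name` was wrongly listed and
    `true_rank` are filled with the "(unclassified)" placeholder.
    Rows that do not contain `name`, or that already list it at or
    below `true_rank`, are returned unchanged.
    """
    fixed = dict(row)
    listed = next((r for r in RANKS if fixed.get(r) == name), None)
    if listed is None:
        return fixed
    start, end = RANKS.index(listed), RANKS.index(true_rank)
    if start >= end:
        return fixed
    for rank in RANKS[start:end]:
        fixed[rank] = UNCLASSIFIED
    fixed[true_rank] = name
    return fixed


# Hypothetical lineage: the genus "Gelria" wrongly listed as an order.
example = {
    "kingdom": "Bacteria",
    "phylum": "ExamplePhylum",  # placeholder, not a real lineage
    "class": "ExampleClass",    # placeholder, not a real lineage
    "order": "Gelria",
    "family": None,
    "genus": None,
}
fixed = correct_rank(example, "Gelria", "genus")
print(fixed)
```

Because the helper returns a copy, the original row stays available for comparison with the corrected one.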
Date made available: Jan 5 2024
