NIH Human Microbiome Project 2

MBQC - baseline final data products

Here, you can find the main integrated data from the MBQC-baseline. Additional pages also provide the raw deposited sequences and raw bioinformatics products. Note that all data products included here and below have been blinded to anonymize the handling labs (abbreviated HLs) and bioinformatics labs (BLs) that participated in the MBQC-baseline.

  • Integrated OTU table. This includes all ~16,500 samples and OTUs that were deposited by any bioinformatics lab in the appropriate format, in addition to metadata describing each sample's originating biospecimen, bioinformatics lab, and handling lab.
  • Specimen list. The MBQC-baseline included 22 specimens (plus negative controls) of four types: fresh and freeze-dried human stool, chemostat aliquots, and fecal and oral artificial communities (as positive controls).
  • Sample set aliquots. From these originating specimens, aliquots were generated and assembled into standardized 96-sample sets for distribution to handling participants. This table lists the first-stage blinded identifiers, specimen, and aliquot information for all sample set contents.
  • Sample handling protocols. While labs could choose their own data generation protocols, as long as they resulted in demultiplexed Illumina 16S amplicon sequences, a detailed form systematically recorded protocol variables.
  • Bioinformatics protocols. Likewise, while labs could choose any bioinformatics protocol that resulted in a standardized OTU table, the MBQC-baseline systematically recorded protocol variables in this table.
  • Bioinformatics distribution blinding. A second internal blinding was used to hide the originating handling laboratory for each raw data file distributed prior to bioinformatics processing. This table links the handling lab ID and original specimen ID to this internally blinded random identifier (not used for final data integration, but useful with the blinded bioinformatics distribution below).
  • Mock community composition. This table lists the microbial strains and approximate quantities (by loop count) used in constructing the fecal- and oral-derived artificial communities.
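
The bioinformatics distribution blinding table described above is essentially a lookup from an internally blinded random identifier to a handling lab ID and original specimen ID. A minimal sketch of how such a lookup might be used is shown here. The column names (blinded_id, handling_lab, specimen) and every value in the toy table are invented for illustration and are assumptions, not the layout of the actual MBQC file; adjust them to match the real table.

```python
import csv
import io

# Toy stand-in for the blinding table. Column names and all values here are
# hypothetical; they do not come from the MBQC data products.
BLINDING_CSV = """blinded_id,handling_lab,specimen
R0001,HL-A,specimen_01
R0002,HL-B,specimen_01
R0003,HL-A,specimen_02
"""


def load_blinding_map(handle):
    """Map each internally blinded identifier to (handling lab ID, specimen ID)."""
    reader = csv.DictReader(handle)
    return {
        row["blinded_id"]: (row["handling_lab"], row["specimen"])
        for row in reader
    }


def unblind(file_ids, blinding_map):
    """Resolve blinded file identifiers; identifiers not in the table map to None."""
    return {fid: blinding_map.get(fid) for fid in file_ids}


blinding_map = load_blinding_map(io.StringIO(BLINDING_CSV))
resolved = unblind(["R0002", "R0003", "R9999"], blinding_map)
for fid, origin in resolved.items():
    print(fid, origin)
```

With a real file, the io.StringIO wrapper would be replaced by an open file handle. Returning None for unknown identifiers, rather than raising an error, makes it easy to spot raw files that are missing from the table.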