Metagenomics Sequence Analysis
If you're interested in joint analysis of 16S and
shotgun metagenomic datasets from the HMP, pairing up data from the same
microbiome samples can initially seem tricky. The following tables join 16S
dataset "SN" and "PSN" identifiers with metagenomic dataset "SRS" identifiers.
The diagram below indicates how these sample IDs are related
The DACC provides mapping files organized by 16S variable region (V),
containing metadata and linking together all levels of ids:
Sample flow of the clinical phases is described in the following schematic:
Go back to Data Browser
- (1) Samples were collected at 2 dedicated sampling centers (see Sample
collection for more detail), with each human participant being sampled
at 15 or 18 body sites over one to three visits.
- (2) All subject & sample metadata was submitted to EMMES, our
partner in metadata storage and management. EMMES assigned each human
subject a de-identified RSID (random subject identifier).
- (3) Every individual sample was assigned a PSN id, an EMMES-generated
Primary Sample Number.
- (4) DNA was extracted from the primary samples using methods described in
the Manual of
Procedures, and assigned a NAP id (Nucleic Acid Preparation) generated by
the sequencing centers. Extractions were divided into aliquots as necessary.
Each aliquot was assigned an SN id, an EMMES-generated Sample Number. For the
clinical pilot phase, every primary sample was sent to two sequencing centers
to assess reproducibility of technical replicates across multiple institutions.
Therefore there is a one-to-many relationship between PSN and SN ids. In the
clinical pilot phases, a single PSN sample generated a single SN sample, which
was sent to a single sequencing center, allowing for a one-to-one relationship
between PSN and SN.
- (5) Sequencing centers performed sequencing on each SN sample, sequencing
16S variable regions 3-5 (V35) from every samples within the study, and
variable regions 1-3 (V13) and variable regions 6-9 (V69) from subsets of
samples. If the SN samples did not provide enough material for sequencing, the
centers were sent the original PSN samples for re-extraction and sequencing.
- (6) All sequence data was submitted by the sequencing centers directly to
NBCI's Sequence Read Archives (SRA) data repository. Centers submitted data
using the RSID subject ids, and the NAP sample ids. Data was then organized at
NCBI into their standardized levels of experiment accession (SRX ids), run id
(SRR ids) and sample id (SRS ids).