Reads generated by Illumina whole-genome shotgun (WGS) sequencing were analyzed with MetaPhlAn, which infers a table of relative abundances at every taxonomic level (from phylum to species) for bacteria and archaea. The MetaPhlAn classifier compares each read against a pre-computed catalog of unique clade-specific marker genes to identify high-confidence matches. Because the catalog contains only ~4% of sequenced microbial genes, this approach is computationally efficient. Apart from standard quality control, no other metagenomic pre-processing steps (e.g., error detection, assembly, or gene annotation) are required. The classifier normalizes the total number of reads assigned to each clade by the nucleotide length of that clade's markers and reports the relative abundance of each taxonomic unit. Microbial reads belonging to clades with no sequenced genomes available are reported as an "unclassified" subclade of the closest ancestor that has sequence data.
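The length normalization described above can be sketched in a few lines of Python. This is a simplified illustration, assuming the number of reads hitting each clade's markers and the total nucleotide length of those markers are already known. The function name, inputs, and clade labels are hypothetical, and MetaPhlAn's actual estimator may differ in detail (for example, by normalizing and averaging per individual marker rather than pooling all markers of a clade).

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       marker_lengths: Dict[str, int]) -> Dict[str, float]:
    """Simplified sketch (hypothetical helper, not MetaPhlAn's code):
    divide each clade's read count by its total marker length, then
    rescale so the abundances sum to 100%."""
    # Reads per marker nucleotide; clades without marker length are skipped.
    density = {
        clade: read_counts[clade] / marker_lengths[clade]
        for clade in read_counts
        if marker_lengths.get(clade, 0) > 0
    }
    total = sum(density.values())
    if total == 0:
        # No reads matched any marker: every clade gets zero abundance.
        return {clade: 0.0 for clade in density}
    return {clade: 100.0 * d / total for clade, d in density.items()}


# Toy usage with made-up clades and numbers.
reads = {"clade_A": 300, "clade_B": 100}
lengths = {"clade_A": 3000, "clade_B": 500}
# clade_B ends up with about twice clade_A's share despite fewer reads.
print(relative_abundance(reads, lengths))
```

In this toy example clade_B receives fewer reads than clade_A but is assigned the higher abundance, because its markers are six times shorter; without length normalization, clades with long marker sets would be systematically overestimated.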
If you're interested in jointly analyzing 16S and shotgun metagenomic datasets from the HMP, pairing up data from the same microbiome samples can initially seem tricky. The HMP Sample Flow Schematic shows how these sample IDs are related experimentally and provides tables that join the 16S dataset "SN" and "PSN" identifiers to the metagenomic dataset "SRS" identifiers.
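Once such a mapping table is in hand, pairing the two data types is an ordinary key join. The sketch below assumes a CSV export with columns named "SN", "PSN", and "SRS", 16S profiles keyed by PSN, and shotgun profiles keyed by SRS; these column names, the placeholder identifiers, and both helper functions are hypothetical, and the real HMP tables may be laid out differently.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical excerpt of a PSN-to-SRS mapping table; the identifiers are
# placeholders, not real HMP sample IDs.
MAPPING_CSV = """SN,PSN,SRS
SN_0001,PSN_0001,SRS_0001
SN_0002,PSN_0002,SRS_0002
"""


def load_psn_to_srs(text: str) -> Dict[str, str]:
    """Build a PSN -> SRS lookup from the (assumed) CSV layout."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumes one SRS per PSN; a repeated PSN keeps only its last SRS.
    return {row["PSN"]: row["SRS"] for row in reader}


def pair_samples(profiles_16s: Dict[str, dict],
                 profiles_wgs: Dict[str, dict],
                 psn_to_srs: Dict[str, str]) -> List[Tuple[str, str, dict, dict]]:
    """Return (PSN, SRS, 16S profile, WGS profile) for samples present in both."""
    pairs = []
    for psn, prof16 in profiles_16s.items():
        srs = psn_to_srs.get(psn)
        # Keep only samples that have both a mapping and a shotgun profile.
        if srs is not None and srs in profiles_wgs:
            pairs.append((psn, srs, prof16, profiles_wgs[srs]))
    return pairs


# Toy usage: PSN_0003 has no mapping, so it is dropped from the pairs.
mapping = load_psn_to_srs(MAPPING_CSV)
pairs = pair_samples({"PSN_0001": {"taxon_x": 0.7}, "PSN_0003": {"taxon_y": 0.2}},
                     {"SRS_0001": {"taxon_x": 0.6}},
                     mapping)
print(pairs)
```

If a single PSN can correspond to several SRS runs in the real tables, the lookup should map each PSN to a list of SRS identifiers instead of a single string.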
Protocols and Tools