Method: Bioinformatics programs were evaluated for the following stages of data analysis: 1) sequence quality filtering; 2) paired-end sequence merging; 3) operational taxonomic unit (OTU) calling; 4) taxonomic classification; and 5) diversity and richness estimation.
Result: The human oral microbiome has been well characterized, and full-length 16S rRNA gene sequences of the most abundant species are available as references. Based on our evaluation of currently available software and pipelines, we propose a two-stage, open-ended, reference-based OTU calling pipeline: 1) reference-based OTU calling against the Human Oral Microbiome Database (HOMD) 16S rRNA references, with taxonomy inferred from the HOMD taxonomy; and 2) de novo OTU calling of the reads not mapped in stage 1, with taxonomy inferred from non-HOMD references such as the Greengenes or SILVA databases.
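The two-stage logic can be illustrated with a minimal Python sketch. It is not the evaluated pipeline: the function names are hypothetical, the ungapped identity score is a toy stand-in for a real aligner, and the 0.97 default threshold is an assumed value rather than one stated here.

```python
from typing import Dict, List, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Toy ungapped identity: matching positions over the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def best_hit(seq: str, refs: Dict[str, str], threshold: float) -> Optional[str]:
    """Return the id of the most similar reference, or None if below threshold."""
    best_id, best_score = None, -1.0
    for ref_id, ref_seq in refs.items():
        score = identity(seq, ref_seq)
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id if best_score >= threshold else None


def two_stage_otus(
    reads: Dict[str, str],
    primary_refs: Dict[str, str],
    threshold: float = 0.97,  # assumed cutoff, not taken from the text
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]], Dict[str, str]]:
    """Stage 1: reference-based calling. Stage 2: greedy de novo clustering."""
    ref_otus: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for read_id, seq in reads.items():
        hit = best_hit(seq, primary_refs, threshold)
        if hit is None:
            unmapped.append(read_id)
        else:
            ref_otus.setdefault(hit, []).append(read_id)

    # Stage 2: each unmapped read joins an existing seed or starts a new OTU.
    seeds: Dict[str, str] = {}
    denovo_otus: Dict[str, List[str]] = {}
    for read_id in unmapped:
        seq = reads[read_id]
        hit = best_hit(seq, seeds, threshold)
        if hit is None:
            hit = f"denovo_{len(seeds) + 1}"
            seeds[hit] = seq
        denovo_otus.setdefault(hit, []).append(read_id)
    return ref_otus, denovo_otus, seeds


def classify_seeds(
    seeds: Dict[str, str], secondary_refs: Dict[str, str], threshold: float
) -> Dict[str, Optional[str]]:
    """Label each de novo seed with its best secondary reference, if any."""
    return {otu: best_hit(seq, secondary_refs, threshold) for otu, seq in seeds.items()}
```

In this sketch, `primary_refs` would hold the HOMD sequences and `secondary_refs` a non-HOMD database; a de novo seed that `classify_seeds` leaves unlabeled (`None`) corresponds to a candidate novel taxon.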
Conclusion: The proposed two-stage approach to NGS 16S rRNA data is comprehensive both in mapping reads to known human oral taxa and in discovering novel taxa in oral microbial samples.