DNA barcodes combined with multi-locus data of representative taxa can generate reliable higher-level phylogenies

Published Date: May 22, 2021

Taxa are frequently labeled incertae sedis when their placement above the species level, such as their subgeneric, generic or subtribal placement, is debated. This is a pervasive problem in groups with complex systematics because suitable synapomorphies are difficult to identify. In this study, we propose combining DNA barcodes with a multi-locus backbone phylogeny in order to assign taxa to genera or other higher-level categories. This sampling strategy generates molecular matrices containing large amounts of missing data that are not distributed randomly: barcodes are sampled for all representatives, whereas additional markers are sampled for only a small percentage of them. We investigate the effects of the degree and randomness of missing data on phylogenetic accuracy using simulations of up to 100 markers on 1000-tip trees, as well as a real case: the subtribe Polyommatina (Lepidoptera: Lycaenidae), a large group including numerous species with unresolved taxonomy. Our simulation tests show that when a strategic and representative selection of species for higher-level categories has been made for multi-gene sequencing (approximately one per simulated genus), adding this multi-gene backbone DNA data for as few as 5-10% of the specimens in the total dataset can produce high-quality phylogenies, comparable to those resulting from 100% multi-gene sampling. In contrast, trees based exclusively on barcodes performed poorly. We applied this approach to a 1365-specimen dataset of Polyommatina (including ca. 80% of described species), with nearly 8% of representative species included in the multi-gene backbone and the remaining 92% represented only by mitochondrial COI barcodes. The resulting phylogeny highlighted potential misplacements, unrecognized major clades, and placements for incertae sedis taxa. We use this information to make systematic rearrangements within Polyommatina and to describe two new genera.
Finally, we propose a systematic workflow to assess higher-level taxonomy in hyperdiverse groups. This research identifies an additional, enhanced value of DNA barcodes for improvements in higher-level systematics using large datasets.
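To make the structure of the non-random missing data concrete, the following is a minimal sketch, not the authors' simulation pipeline. It assumes hypothetical equal-sized genera of 20 tips, treats marker 0 as the barcode sampled for every tip, and gives full multi-locus data to one randomly chosen representative per genus. The function names (`build_presence_matrix`, `missing_fraction`, `backbone_fraction`) are illustrative only. With 1000 tips and 100 markers, this yields a 5% backbone and about 94% missing cells, concentrated in the non-backbone rows rather than scattered at random.

```python
import random


def build_presence_matrix(n_tips, n_markers, tips_per_genus, seed=0):
    """Return rows of booleans; row[i][j] is True if tip i has marker j.

    Assumption (hypothetical): marker 0 stands in for the barcode and is
    present for every tip; all other markers are present only for one
    randomly chosen representative per (equal-sized) genus.
    """
    rng = random.Random(seed)
    # Start with barcode-only rows for every tip.
    matrix = [[j == 0 for j in range(n_markers)] for _ in range(n_tips)]
    for start in range(0, n_tips, tips_per_genus):
        genus = list(range(start, min(start + tips_per_genus, n_tips)))
        rep = rng.choice(genus)
        # The representative joins the multi-locus backbone.
        matrix[rep] = [True] * n_markers
    return matrix


def missing_fraction(matrix):
    """Fraction of matrix cells with no sequence data."""
    cells = sum(len(row) for row in matrix)
    missing = sum(not present for row in matrix for present in row)
    return missing / cells


def backbone_fraction(matrix):
    """Fraction of tips sampled for every marker."""
    return sum(all(row) for row in matrix) / len(matrix)


m = build_presence_matrix(n_tips=1000, n_markers=100, tips_per_genus=20)
print(backbone_fraction(m))  # 0.05
print(missing_fraction(m))   # 0.9405
```

Here 950 barcode-only tips each lack 99 of 100 markers, giving 94,050 missing cells out of 100,000. A matrix with the same amount of missing data distributed randomly would instead leave gaps in every row, which is the contrast the simulations in the study explore.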

Gerard Talavera, Vladimir Lukhtanov, Naomi E Pierce, Roger Vila