Population diversity of ORFan genes in Escherichia coli.

Title: Population diversity of ORFan genes in Escherichia coli.
Publication Type: Journal Article
Year of Publication: 2012
Authors: Yu, G; Stoltzfus, A
Journal: Genome Biol Evol
Date Published: 2012
Keywords: Escherichia coli; Evolution, Molecular; Genes, Bacterial; Genetic Speciation; Genetic Variation; Genome, Bacterial; Open Reading Frames; Phylogeny; Pseudogenes; Shigella

The origin and evolution of "ORFans" (suspected genes without known relatives) remain unclear. Here, we take advantage of a unique opportunity to examine the population diversity of thousands of ORFans, based on a collection of 35 complete genomes of isolates of Escherichia coli and Shigella (which is included phylogenetically within E. coli). As expected from previous studies, ORFans are shorter and AT-richer in sequence than non-ORFans. We find that ORFans are often very narrowly distributed: the most common pattern is for an ORFan to be found in only one genome. We compared within-species population diversity of ORFan genes with that of two control groups of non-ORFan genes. Patterns of population variation suggest that most ORFans are not artifacts, but encode real genes whose protein-coding capacity is conserved, reflecting selection against nonsynonymous mutations. Nevertheless, nonsynonymous nucleotide diversity is higher than for non-ORFans, whereas synonymous diversity is roughly the same. In particular, there is a several-fold excess of ORFans in the highest decile of diversity relative to controls, which might be due to weaker purifying selection, positive selection, or a subclass of ORFans that are decaying.

Alternate Journal: Genome Biol Evol
PubMed ID: 23034216
PubMed Central ID: PMC3514957
Grant List: GM081511 / GM / NIGMS NIH HHS / United States