Accurate Prediction of Motifs in Prokaryotic Genomes

Professor Guojun Li

Shandong University / University of Georgia

We developed an algorithm capable of accurately extracting cis-regulatory motifs at their correct lengths, without placing any restrictions on the input. It is reliable enough to recognize even very weak motif signals hidden among the several thousand promoters of a whole prokaryotic genome. This computational ability is attributed to a careful combination of several novel strategies for motif strengthening, motif positioning, and motif seeding, together with an innovative evaluation of statistical significance. The algorithm has been implemented as a computer program, BOBRO, and applied to both simulated and biological data. We compared BOBRO with five prominent existing motif finders and found that it consistently and substantially outperformed the best of them across all of the complicated data scenarios. Moreover, the program has been used successfully to identify global motifs, such as CRP, Fur, FNR, TyrR, Lrp, and IHF, in a sequence data set consisting of 2390 promoters from the complete genome of E. coli K12. To the best of our knowledge, this is the first motif-finding tool with this capability that imposes no requirements on either the motifs to be identified or the test sequence data. It can be used not only to analyze a newly sequenced prokaryotic genome for which nothing is known about transcriptional regulation, but also as a complementary tool for comparative genomics analyses. Furthermore, BOBRO can extract multiple different motifs simultaneously without any extra effort.


Guojun Li is a Senior Research Scientist in the Department of Biochemistry and Molecular Biology, University of Georgia, and a full professor in the Department of Mathematics, Shandong University, China. He received his Ph.D. degree from the Institute of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing, China, in 1996. He was trained in mathematics and computer science before 2003. His representative work in mathematics and computer science includes proofs of several conjectures in graph theory and solutions to several hard problems in computer science. Since 2004 he has focused on research areas related to bioinformatics. His favorite piece of representative work in bioinformatics is the development of an algorithm capable of recognizing cis-regulatory motifs at their correct lengths in a test data set consisting of 2390 promoter sequences collected from E. coli K12, which will be presented in our colloquium. His current research interests lie in developing algorithms to extract motifs at genome scale and in developing bi-clustering methods for the analysis of gene expression data, in order to reconstruct gene regulatory networks and classify functional genes. He is also interested in protein structure prediction by threading and, in particular, in large-scale biological data mining. He has published more than 70 papers in international journals in mathematics, computer science, and biology.