“Taking a hint from the text comparison methods used to detect plagiarism in books, college papers and computer programs, University of California, Berkeley, researchers have developed an improved method for comparing whole genome sequences. With nearly a thousand genomes partly or fully sequenced, scientists are jumping on comparative genomics as a way to construct evolutionary trees, trace disease susceptibility in populations, and even track down people’s ancestry.

To date, the most common techniques have relied on comparing a limited number of highly conserved genes – no more than a couple dozen – in organisms that have all these genes in common. The new method can be used to compare even distantly related organisms or organisms with genomes of vastly different sizes and diversity, and can compare the entire genome, not just a selected small fraction of the gene-containing portion known to code for proteins, which in the human genome is only 1 percent of the DNA.

The technique produces groupings of organisms largely consistent with current groupings, but with some interesting discrepancies, according to Sung-Hou Kim, professor of chemistry at UC Berkeley and faculty researcher at Lawrence Berkeley National Laboratory. However, the relative positions of the groups in the family tree – that is, how recently these groups evolved – are quite different from those based on conventional gene alignment methods. The computational results have surprised scientists in being able to classify some bacteria and viruses that until now were enigmatic. The technique, which employs feature frequency profiles (FFP), is described in a paper to appear this week in the early online edition of the journal Proceedings of the National Academy of Sciences.”
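
For the curious, here is a rough idea of what a "feature frequency profile" comparison might look like in code. The excerpt doesn't spell out the details, so this is only a toy sketch under my own assumptions: the features are overlapping substrings of length k, each sequence is reduced to the relative frequencies of those substrings, and two profiles are compared with the Jensen-Shannon divergence. The function names and the choice of k=3 are mine for illustration and are not taken from the researchers' software.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(sequence, k=3):
    """Return the relative frequency of each overlapping length-k substring.

    Assumption: a "feature" is a k-mer; k=3 is an arbitrary toy default.
    """
    seq = sequence.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than the feature length k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two profiles.

    0 means the profiles are identical; 1 means they share no features.
    """
    keys = set(p) | set(q)
    # Midpoint distribution between the two profiles.
    m = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}

    def kl(a):
        # Kullback-Leibler divergence of profile a from the midpoint m.
        return sum(pa * log2(pa / m[key]) for key, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    a = feature_frequency_profile("ACGTACGTACGTTTGA")
    b = feature_frequency_profile("ACGTACGAACGTTTGA")
    c = feature_frequency_profile("GGGGCCCCGGGGCCCC")
    print("a vs b:", jensen_shannon_divergence(a, b))
    print("a vs c:", jensen_shannon_divergence(a, c))
```

Because each sequence is boiled down to a frequency table, no alignment is needed, which is presumably why this kind of approach can handle genomes of very different sizes. A matrix of pairwise distances like these could then be fed to a standard tree-building method.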

(via UC Berkeley News. Thanks Josh!)