Headed by researchers from the Wellcome Trust Sanger Institute and the Broad Institute, the 1000 Genomes Project has announced results from several pilot studies in a paper published in Nature on Oct. 28. The report described the use of advanced technology for sequencing more than 1000 human genomes from 27 populations worldwide.
Participants say that the project will sequence approximately 2500 human genomes and should be completed by 2012. The complete findings will be made publicly available, free of charge.
Typically, sequencing a genome involves separating DNA into fragments (each several hundred base pairs long) and running the fragments through a machine, which synthesizes new copies and, by doing so, reads them. The fragments are then pieced back together into a genome using various mathematical methods.
The 1000 Genomes Project, however, aims to make use of technological advancements that have improved sequencing accuracy while cutting costs.
Because of the large number of genomes that the project aims to sequence, error is inevitable. The researchers devote less time to each individual genome and only “read” each one about four times. Repeating the process fewer times cuts costs, but increases the chance of errors.
The University’s role in the project consisted of developing ways to minimize error through advanced statistical approaches.
“The researchers at Cornell have experience working with DNA sequences in this context, where there are statistical uncertainties, so we contributed by developing robust statistical methods,” said Prof. Andrew Clark, molecular biology and genetics, a co-principal investigator and a member of the project steering committee.
The University’s role also included conducting “analysis of the population genetics of the samples, such as the distribution of frequencies of variants, quantifying population differentiation and [identifying] the age of mutations, [which] are likely to play a role in genetic disease risk.”
Prof. Alon Keinan, biological statistics and computational biology, is taking part in the project as well, and is trying “to understand how genetic drift has shaped our genome and bring this understanding to medical genetic studies,” he said.
The paper published in Nature outlined three pilot studies that tested different approaches to cataloging genetic variation.
The first pilot study sequenced the genomes of 179 individuals from European, African and East Asian populations. The study used “low coverage,” which means an average of four reads for each base pair. The low coverage approach, despite its potential to produce errors, proved to be an effective way to discover genetic variants that are common among individuals, according to Clark.
In contrast to the first, the second study involved sequencing two families – two mothers, two fathers, and two daughters – using “high coverage” (reading each sequence 20 to 60 times). Using high coverage for the entire project, however, proved too costly.
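To give a rough sense of why the number of reads matters, the short Python sketch below compares coverage of 4, 20 and 60 reads. It is only an illustration under simplifying assumptions that do not come from the project itself: the number of reads landing on a base is treated as Poisson-distributed, each read at a site where a person carries one copy of a variant is assumed to show the variant half the time, and sequencing mistakes are ignored. The function names are hypothetical.

```python
import math


def prob_site_unread(mean_depth: float) -> float:
    """Probability that a base is covered by zero reads,
    assuming read depth follows a Poisson distribution (an assumption)."""
    return math.exp(-mean_depth)


def prob_variant_allele_missed(depth: int) -> float:
    """Probability that none of `depth` reads shows the variant at a site
    where the person carries one variant copy and one reference copy,
    assuming each read samples either copy with probability 1/2."""
    return 0.5 ** depth


# Compare the low coverage used in the first pilot (about 4 reads)
# with the high coverage range used in the second (20 to 60 reads).
for depth in (4, 20, 60):
    unread = prob_site_unread(depth)
    missed = prob_variant_allele_missed(depth)
    print(f"depth {depth:>2}: unread base {unread:.2e}, variant copy missed {missed:.2e}")
```

Under these assumptions, a variant carried on one copy goes unseen in about 6 percent of cases at four reads, and almost never at 20 reads or more. A variant that is common in a population is carried by many of the sampled individuals, which may help explain why low coverage across many people can still find it; the statistical methods actually used by the project are more sophisticated than this sketch.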
The third study was more specific, and focused on the sequencing of exons — the protein-coding functional parts of genes — of 700 individuals to augment the data for functionally important parts of the genome. The exons were read using high coverage, which researchers believe is the most effective way to read exons.
When completed, the project will provide new, valuable insight into the world of human genomics. Researchers are hopeful that their findings will change what is known about human diseases and improve the ways many of them are treated.
“We are sequencing over 2000 people from all across the world with one of the goals being to find genetic variances that are pretty rare,” Keinan said. “By using these rare findings, we have more of a chance to figure out what’s going on with common diseases and how we can cure or prevent them.”
Original Author: Maria Minsker