Research from the labs of Profs. Andrew Clark and Daniel Barbash in the Department of Molecular Biology and Genetics at Cornell paints a dynamic picture of the evolution of tandem repetitive DNA, a poorly understood part of our genomes.
DNA is the code inside each of our cells that carefully orchestrates the cell's function and behavior. According to Clark, in large genome sequencing projects (where the entire code is read), a critical step lies in trying to assemble this code in the correct order.
However, certain large regions of the genome are very challenging to assemble because their code is highly repetitive. This makes it hard for assembly algorithms to place them accurately, so our understanding of these regions is limited. But these regions have important roles in maintaining DNA stability and hold clues to understanding a variety of genetic disorders, most notably Huntington’s disease.
“We know that they are important biologically and we also know they are fast evolving from one species to another. From humans to chimpanzees, there is almost a complete rewriting of these regions,” Clark said.
The approach they took to understand these regions did not involve assembly; rather, it focused on small fragments of the repetitive code. According to Clark, they examined the relative abundance of short pieces called k-mers, meaning all the possible subsequences of DNA code of a given length k, and observed how their abundances differed.
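To make the k-mer idea concrete, here is a minimal sketch in Python. It is not the researchers' actual pipeline: the function names and the two toy sequences are invented for illustration, and it ignores practical details such as sequencing errors and the reverse-complement strand. It simply slides a window of length k along a sequence, counts each subsequence, and converts the counts to relative abundances.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) in a DNA sequence."""
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def relative_abundance(counts: Counter) -> dict:
    """Convert raw k-mer counts into fractions of all k-mers observed."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy sequences: the second carries two extra copies
# of the tandem repeat "AAT". Real data would be sequencing reads.
ancestor = "AATAATAATAATGCGC"
descendant = "AATAATAATAATAATAATGCGC"

before = relative_abundance(count_kmers(ancestor, 3))
after = relative_abundance(count_kmers(descendant, 3))

# Report how the share of each 3-mer shifted between the two sequences.
for kmer in sorted(set(before) | set(after)):
    change = after.get(kmer, 0.0) - before.get(kmer, 0.0)
    print(f"{kmer}: {before.get(kmer, 0.0):.3f} -> {after.get(kmer, 0.0):.3f} ({change:+.3f})")
```

In this toy case, the repeat-derived 3-mers (AAT, ATA, TAA) gain in relative abundance while the others shrink, which is the kind of shift a comparison of k-mer abundances can pick up without assembling the region.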
Next, they wanted to see whether they could actually observe these k-mers changing over multiple generations. They analyzed different generations of individual Chlamydomonas reinhardtii, a unicellular alga, whose genomes had already been sequenced, to study the rates and patterns of k-mer mutations.
Clark talked about how the labs’ findings contribute to the field of genome research.
“It’s part of a series of many papers accumulating information about the way these repetitive regions evolve,” Clark said, recounting the other organisms and systems his group analyzed.
The idea behind studying multiple systems was to see whether the way these repetitive DNA regions acquire changes over evolutionary time is universal. In other words, the researchers were trying to find out if k-mers were evolving in a similar pattern across different species. “And so the answer is yes! They are behaving more or less the same way, quantitatively,” Clark said.
According to Clark, the entire project was done by re-analysing genomic data generated for other projects by other groups. In this era of rapidly growing genomics technologies, not only are high-throughput sequencing experiments becoming cheaper, but most of the resulting data is also publicly and freely accessible.
“It really does level the playing field quite a bit. For people who do analysis, the standard is trying to make the sequence data as available as possible is an important goal and in many cases we are achieving that and putting the data completely in the public domain and anybody can get the data and repeat it,” Clark said.
“This is a goal we are striving to achieve in my lab and many others.”