Washington, DC - A team of researchers from the National Library of Medicine (NLM), part of the National Institutes of Health, and collaborating academic research institutions developed a method to measure a type of gene mutation involved in the evolution of cancer. This type of mutation, called “repeat instability,” may be useful in early cancer diagnosis. Findings were published this week in the Proceedings of the National Academy of Sciences.

Cancer is primarily caused by mutations in certain genes. The most thoroughly studied cancer-associated mutations involve the substitution of one nucleotide of DNA for another in genes known as oncogenes and tumor suppressors.

In this study, the researchers identified a different type of mutation active in cancer, one that expands or contracts repetitive segments in the DNA of various genes and in the protein sequences they encode. These changes are collectively called “repeat instability.”

The researchers developed a computational methodology to quantify variations in the repeat content of gene and protein sequences. They analyzed sequence data from 325 patients with a variety of cancers, including breast, prostate, bladder, and lung, as well as individual patients with metastases. Using computational biology techniques, the researchers compared the sequences from the cancer tissue with those from healthy tissue adjacent to the cancer site and from blood, which served as the control.
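As a rough illustration of what quantifying repeat content can mean in practice, the Python sketch below scores a sequence by the fraction of its positions that fall inside short tandem repeats, then compares a tumor sequence with a matched control. It is a simplified stand-in, not the method used in the study: the function names, the unit-length and copy-number thresholds, and the toy sequences are all assumptions made for this example.

```python
import re


def repeat_fraction(seq: str, max_unit: int = 6, min_copies: int = 3) -> float:
    """Fraction of positions covered by tandem repeats.

    A tandem repeat here is a unit of 1..max_unit characters that occurs
    at least min_copies times in a row (thresholds are illustrative).
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for unit_len in range(1, max_unit + 1):
        # Lookahead lets us find overlapping runs: a unit of unit_len
        # characters followed by at least (min_copies - 1) more copies.
        pattern = re.compile(
            r"(?=((.{%d})\2{%d,}))" % (unit_len, min_copies - 1)
        )
        for match in pattern.finditer(seq):
            for i in range(match.start(1), match.end(1)):
                covered[i] = True
    return sum(covered) / len(seq)


def repeat_shift(tumor_seq: str, control_seq: str) -> float:
    """Hypothetical score: change in repeat content, tumor minus control."""
    return repeat_fraction(tumor_seq) - repeat_fraction(control_seq)


if __name__ == "__main__":
    # Toy sequences, not patient data: the tumor copy carries two extra
    # CAG units relative to the control.
    control = "ATGCAGCAGCAGTTC"
    tumor = "ATGCAGCAGCAGCAGCAGTTC"
    print(f"control repeat fraction: {repeat_fraction(control):.3f}")
    print(f"tumor repeat fraction:   {repeat_fraction(tumor):.3f}")
    print(f"shift (tumor - control): {repeat_shift(tumor, control):+.3f}")
```

A positive shift indicates repeat expansion in the tumor sample relative to the control, and a negative shift indicates contraction. A real analysis would work on aligned sequencing data across many genes and would need statistical controls that this sketch omits.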

“This study shows that repetitive sequences, which are ‘hotspots’ of DNA evolution, emerge early in tumor evolution but fade away in later phases, particularly during the transition to metastatic states, though they leave clear marks in the genome,” said Eugene Koonin, Ph.D., a co-author of the study and head of NLM’s Evolutionary Genomics Research Group.

“The study found that non-cancerous tissue adjacent to tumors had patterns of repetitive sequences that were similar to those detected in tumors,” said NLM’s Erez Persi, Ph.D., lead author on the paper. “This fact, and the reduction in repeat sequences seen once the cancers metastasized, suggests the potential for using repeat sequences in the early diagnosis of cancer.”

In addition to NIH researchers, the study involved researchers at Tel Aviv University, Israel; the University of Trento, Italy; and Weill Cornell Medicine, New York City.

“Collaborative research such as this, where our computational biologists and data scientists work together with others on discoveries, could make a difference in life-threatening diseases such as cancer,” said NLM Director Patricia Flatley Brennan, R.N., Ph.D.

This press release describes a basic research finding. Basic research increases our understanding of human behavior and biology, which is foundational to advancing new and better ways to prevent, diagnose, and treat disease. Science is an unpredictable and incremental process: each research advance builds on past discoveries, often in unexpected ways. Most clinical advances would not be possible without the knowledge gained from fundamental basic research.

NLM, part of the NIH, is a leader in research in biomedical informatics and data science, and the world’s largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. It creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information.