Researchers from the Institute of Neurosciences of the University of Barcelona (UBneuro) have published a new study in the Proceedings of the 20th Machine Learning in Computational Biology meeting (MLCB 2025), one of the leading international conferences at the intersection of machine learning and computational biology. The work applies advanced artificial intelligence approaches to better understand why Huntington’s disease can begin at very different ages in different patients.
Huntington’s disease is a hereditary neurodegenerative disorder caused by an expansion of a DNA sequence known as a CAG repeat in the HTT gene. Although the length of this expansion is known to influence the age at which symptoms appear, it does not fully explain the large variability observed among patients. This suggests that additional genetic factors play an important role in determining when the disease begins.
In this study, the researchers used non-linear machine learning models, such as tree-based models and graph neural networks (GNNs), to identify genetic modifiers: genes that can delay or accelerate disease onset depending on a patient’s genetic background. Unlike traditional statistical approaches, these models can capture complex interactions between genes and reveal effects that depend on the length of the CAG expansion itself.
To make the analysis more efficient and interpretable, the researchers also developed a method to compress genetic information using gene-specific neural networks, reducing computational cost without losing predictive power. In addition, they incorporated predicted gene expression changes generated by a state-of-the-art genomic language model, allowing them to link regulatory DNA variants to changes in gene activity in brain regions affected by the disease.
By analyzing genetic data from more than 9,000 Huntington’s disease patients, the team identified both previously known modifiers related to DNA repair and new candidate genes involved in processes such as transcription regulation and cellular metabolism. Importantly, the results show that different biological mechanisms may influence disease onset in patients with shorter versus longer CAG expansions, highlighting for the first time the context-dependent nature of these genetic effects.
“What this work shows is that the genetic factors modifying Huntington’s disease are not universal, but highly context-dependent. By using non-linear and multimodal machine learning, we can uncover interactions that were essentially invisible to traditional approaches,” remarks Jordi Abante, principal investigator of the study.
Overall, this work provides a new framework for studying complex genetic diseases and demonstrates how machine learning can help uncover biologically meaningful patterns that are difficult to detect with conventional methods. The authors suggest that this approach could also be applied to other neurodegenerative and inherited disorders, opening new avenues for research and, in the future, more personalized therapeutic strategies.