Population-scale repeat expansions elucidate disease risk and brain atrophy | Nature

AI Legal Analyst
April 8, 2026, 5:36 PM

## Summary
Here we performed a population-scale survey of pathogenic repeat expansions by analysing repeat length in 37 disease-associated STR loci in a diverse set of 1,020,833 samples using short-read whole-exome and whole-genome sequencing data. We replicated expected repeat–trait associations, identified a gradual increase in disease risk and penetrance with increasing repeat length for a range of repeat-associated diseases, and identified a significant loss of brain volume in carriers of pathogenic expansions before disease diagnosis.

Table 1: List of 42 disease-associated repeat loci genotyped from WES or WGS samples.

We next calculated the frequency of individuals who carry repeats above the reported premutation (also referred to in the literature as reduced-penetrance or pre-risk alleles) and pathogenic thresholds at each of these loci in the full cohort, as well as within five groups defined by genetic ancestry (Fig. 1a (middle left)).

Prevalence of pathogenic repeat expansions
We generated the distribution of repeat length for each of the 37 loci with reliable calls from either WES or WGS data or both (Table 1 and Supplementary Table 1) and calculated the frequency of repeat carriers in premutation and pathogenic ranges (Fig. 1b and Supplementary Figs. 1–37).
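The carrier-frequency calculation described in the summary can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `LocusThresholds` container, the `carrier_frequencies` function, the toy repeat counts and the cut-off values are all hypothetical. It also assumes that each sample is represented by the repeat count of its longer allele and that "above the threshold" means at or above the stated cut-off.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class LocusThresholds:
    """Repeat-count cut-offs for one locus (hypothetical container)."""
    premutation: int  # smallest repeat count counted as premutation
    pathogenic: int   # smallest repeat count counted as pathogenic


def carrier_frequencies(longer_allele: Sequence[int],
                        thresholds: LocusThresholds) -> Dict[str, float]:
    """Return the fraction of samples in the premutation and pathogenic ranges.

    `longer_allele` holds one repeat count per sample (the longer of the two
    alleles). A sample in the pathogenic range is not also counted as
    premutation.
    """
    if not longer_allele:
        raise ValueError("at least one sample is required")
    n = len(longer_allele)
    pathogenic = sum(r >= thresholds.pathogenic for r in longer_allele)
    premutation = sum(
        thresholds.premutation <= r < thresholds.pathogenic
        for r in longer_allele
    )
    return {"premutation": premutation / n, "pathogenic": pathogenic / n}


# Toy data: ten samples at a single locus, with illustrative cut-offs only.
toy_lengths = [17, 19, 22, 37, 41, 18, 20, 36, 25, 15]
toy_thresholds = LocusThresholds(premutation=36, pathogenic=40)
print(carrier_frequencies(toy_lengths, toy_thresholds))
```

Running the same function once per ancestry group would give the per-group frequencies that the summary compares; how samples are assigned to groups is outside this sketch.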

## Article Content
Subjects
DNA sequencing
Genetic predisposition to disease
Microsatellite instability
Neurodegeneration
Structural variation
Abstract
Pathogenic expansions of short tandem repeats (STRs) cause over 70 neurological diseases [1,2,3]. Here we performed a population-scale survey of pathogenic repeat expansions by analysing repeat length in 37 disease-associated STR loci in a diverse set of 1,020,833 samples using short-read sequencing whole-exome and whole-genome data. Consistent with previous findings, we found that the frequency of pathogenic repeats is higher than the prevalence of corresponding diseases for most loci [4,5]. Associations of repeat length with 7,671 binary traits captured known locus–trait associations, including HTT and Huntington’s disease, DMPK and myotonic disorders and C9orf72 and motor neuron disease, among others. Finally, we found that, even before disease diagnosis, repeat expansions in several loci strongly associate with increased levels of neurofilament light chain (NfL) and a loss of brain volume in specific disease-associated regions. For example, carriers of HTT expansions exhibited a 22.1% loss of putamen volume, and carriers of CACNA1A expansions showed a 24.6% loss of cerebellar volume. These observations suggest that both decreased brain volumes and increased NfL levels occur earlier than disease diagnosis. This study demonstrates the use of characterizing repeat expansions from short-read sequencing data in diverse population-scale cohorts and its application to epidemiology and clinical biomarker development.
Main
STRs are an important class of genetic variants that can lead to many neurodegenerative and neuromuscular diseases, including Huntington’s disease (HD), motor neuron disease (MND), spinocerebellar ataxias (SCAs), myotonic dystrophies 1 and 2 (DM1 and DM2) and spinal and bulbar muscular atrophy (SBMA) [6,7,8,9]. The increase in expansion size that occurs during transmission across generations or somatically, along with changes in repeat motif composition, has been shown to affect disease risk and penetrance, severity, progression and age of onset [10,11,12,13]. Since the discovery of the first association between pathogenic repeats in FMR1 and fragile-X syndrome in 1991, more than 70 neurological diseases have been associated with repeat expansions [1,2,14]. Repeat-associated diseases can manifest through various mechanisms, including toxicity through accumulation of RNA (DM1), misfolded proteins (SCA2), post-translational modifications (SCA1) and transcriptional repression due to hypermethylation (fragile-X syndrome) [15,16]. Although each of these diseases is rare, their collective societal impact is disproportionately high owing to the high cost associated with treatment, caregiving and loss of income, highlighting the need to understand them better to enable and support drug discovery [17].
Although prevalence estimates among specific countries and populations are available for some repeat-expansion diseases (for example, HD and SCAs) [18,19], the frequency and penetrance estimates of most STRs suffer from biases stemming from ascertainment, rarity, awareness of disease, access to robust healthcare and the willingness of patients to participate in genetic testing [4,5,17]. In fact, two of the largest population studies to use PCR to estimate repeat lengths in DMPK (among over 50,000 individuals) and HTT (among over 7,000 individuals) estimated the frequency of repeat expansion carriers to be higher than previously reported [4,20]. These observations illustrate the need for population-scale studies that take a genotype-first approach to examine the frequency of expanded repeats, risk associated with increased repeat length and the variability of the motifs in the repeat loci. Until recently, studies focusing on repeat-expansion-associated diseases have been mostly disease specific or locus specific, with sample sizes constrained by the rarity of disease and the prohibitive cost of assays to estimate repeat length [21,22,23]. However, population-scale analyses that can call repeats at multiple loci simultaneously and provide insights into the prevalence of potentially pathogenic expansions are now emerging due to the wide availability of biobank-scale whole-exome sequencing (WES) and whole-genome sequencing (WGS) data as well as dedicated computational methods [3,5,24,25,26,27].
Here we describe a population-scale survey of repeat expansions in 1,020,833 samples from 7 diverse cohorts. We estimated repeat size in 37 disease-associated loci, calculated carrier frequencies of pathogenic repeats and compared them across ancestry groups and sequencing platforms. We show that the frequencies of pathogenic expansions in many loci exceed the reported prevalence of the corresponding diseases, vary among ancestry groups and correspond to differences in disease prevalence between subpopulations reported in epidemiological

---

## Expert Analysis

### Merits
- STRs are an important class of genetic variants that can lead to many neurodegenerative and neuromuscular diseases, including Huntington’s disease (HD), motor neuron disease (MND), spinocerebellar ataxias (SCAs), myotonic dystrophies 1 and 2 (DM1 and DM2) and spinal and bulbar muscular atrophy (SBMA) [6,7,8,9].
- Although prevalence estimates among specific countries and populations are available for some repeat-expansion diseases (for example, HD and SCAs) [18,19], the frequency and penetrance estimates of most STRs suffer from biases stemming from ascertainment, rarity, awareness of disease, access to robust healthcare and the willingness of patients to participate in genetic testing [4,5,17].
- We replicated expected repeat–trait associations, identified a gradual increase in disease risk and penetrance with increasing repeat length for a range of repeat-associated diseases, and identified a significant loss of brain volume in carriers of pathogenic expansions before disease diagnosis.
- The black dotted line represents the x = y line. b, Ancestry differences (x axis) of premutation (yellow) and pathogenic (red) repeat carrier frequencies (y axis) for 14 loci with statistically significant enrichments/depletions (dark colours and asterisks) supported by at least five samples and P < 1 × 10⁻⁶ from two-tailed Fisher’s exact tests (P values were not adjusted for multiple testing).
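
The enrichment test named in the last item above (a two-tailed Fisher’s exact test with a cut-off of P < 1 × 10⁻⁶ and at least five supporting samples) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names and the example counts are hypothetical, and "at least five samples" is assumed here to mean at least five carriers across the two groups being compared.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


def is_enriched_or_depleted(carriers_in: int, n_in: int,
                            carriers_out: int, n_out: int,
                            alpha: float = 1e-6,
                            min_carriers: int = 5) -> bool:
    """Flag a locus whose carrier frequency differs between one ancestry
    group (`_in`) and all other samples (`_out`)."""
    if carriers_in + carriers_out < min_carriers:  # assumed reading of the rule
        return False
    p_value = fisher_exact_two_tailed(carriers_in, n_in - carriers_in,
                                      carriers_out, n_out - carriers_out)
    return p_value < alpha


# Hypothetical counts: 30 carriers among 2,000 samples in one group
# versus 5 carriers among 8,000 samples in the rest of the cohort.
print(is_enriched_or_depleted(30, 2000, 5, 8000))
```

The exact binomial coefficients keep the sketch dependency-free, but they become slow for counts at the scale of the full cohort, where a statistics library would be the practical choice.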

### Areas for Consideration
- The increase in expansion size that occurs during transmission across generations or somatically, along with changes in repeat motif composition, has been shown to affect disease risk and penetrance, severity, progression and age of onset [10,11,12,13].
- These observations illustrate the need for population-scale studies that take a genotype-first approach to examine the frequency of expanded repeats, risk associated with increased repeat length and the variability of the motifs in the repeat loci.
- We replicated expected repeat–trait associations, identified a gradual increase in disease risk and penetrance with increasing repeat length for a range of repeat-associated diseases, and identified a significant loss of brain volume in carriers of pathogenic expansions before disease diagnosis.

### Implications
- STRs are an important class of genetic variants that can lead to many neurodegenerative and neuromuscular diseases, including Huntington’s disease (HD), motor neuron disease (MND), spinocerebellar ataxias (SCAs), myotonic dystrophies 1 and 2 (DM1 and DM2) and spinal and bulbar muscular atrophy (SBMA) [6,7,8,9].
- The increase in expansion size that occurs during transmission across generations or somatically, along with changes in repeat motif composition, has been shown to affect disease risk and penetrance, severity, progression and age of onset [10,11,12,13].
- Although each of these diseases is rare, their collective societal impact is disproportionately high owing to the high cost associated with treatment, caregiving and loss of income, highlighting the need to understand them better to enable and support drug discovery [17].
- Furthermore, in contrast to ExpansionHunter, GangSTR could not model complex repeat motifs such as ‘GCN’ (Supplementary Information 2).

### Expert Commentary
This article covers repeat expansions, disease-associated loci and pathogenic repeat thresholds. Notable strengths include its discussion of repeat expansions. Areas of concern are also raised. Readability: Flesch-Kincaid grade 0.0. Word count: 2301.