TY - JOUR
T1 - Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B
AU - Plender, Elizabeth G.
AU - Prodanov, Timofey
AU - Hsieh, Ping Hsun
AU - Nizamis, Evangelos
AU - Harvey, William T.
AU - Sulovari, Arvis
AU - Munson, Katherine M.
AU - Kaufman, Eli J.
AU - O'Neal, Wanda K.
AU - Valdmanis, Paul N.
AU - Marschall, Tobias
AU - Bloom, Jesse D.
AU - Eichler, Evan E.
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2024/8/8
Y1 - 2024/8/8
AB - The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761–5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291–7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249–6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
UR - http://www.scopus.com/inward/record.url?scp=85198558475&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85198558475&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2024.06.007
DO - 10.1016/j.ajhg.2024.06.007
M3 - Article
C2 - 38991590
AN - SCOPUS:85198558475
SN - 0002-9297
VL - 111
SP - 1700
EP - 1716
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 8
ER -