Proglycan Nomenclature 2023
Teaser
In the beginning was the word … but there were no words for N-glycans. Look at the entry in a
relevant database for the structure that we will herein baptize A4A4F6 and ask yourself how you
could label your Eppendorf vial with the options suggested there. Of these, the good old IUPAC code
is probably still the most human-friendly one, and the following is already a simplified version:
Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
N-glycans are just one group in the huge universe of carbohydrates, but in medical biotechnology
and biopharmaceutics they command most of our attention. We therefore assume that a
comprehensive and logical naming system could be useful, and we apply Occam's razor to
N-glycan nomenclature.
Prolog
The number of existing abbreviation systems for N-glycans cannot rival that of sand grains
on sea shores. However, it seems to us that none of the currently used naming systems is capable
of representing (almost) all N-glycan structures occurring in mammals, insects and plants
unambiguously and with a decent number of characters. The big exceptions are machine codes such
as GlycoCT [2], which are not comprehensible to the human eye unless translated into a visual
representation [1]. So, quite often researchers give up and just show structure cartoons. These,
however, are unsuitable for oral or written transmission or for labeling of vials.
The system introduced here has proven useful for communication with partners for many
years. The terms MMXF3 and MUXF3 (with or without the superscript number) enjoy
widespread use in the allergy diagnosis community. Our recent experience with 40 isomeric
N-glycans all composed of 5 hexoses, 4 HexNAcs and 1 fucose [3,4] reinforced our conviction that the
“proglycan” system is highly useful. Therefore, we shall not surrender in our fight against the inertia
exerted by beaten tracks.
For frequently occurring “default” structures, various naming systems are in use. However, in our
eyes, none of these is apt to unambiguously describe a large segment of all possible structures in a
systematic, easily understandable manner. A collection of such systems will be shown in the last
chapter of the present treatise.
The beginning
Proglycan code   IUPAC code                          CFG code (Vienna style)

GnGn             GlcNAc-2Man
                            6
                             Man-4GlcNAc-4GlcNAc     (cartoon not recovered)
                            3
                 GlcNAc-2Man

MGn                      Man
                            6
                             Man-4GlcNAc-4GlcNAc     (cartoon not recovered)
                            3
                 GlcNAc-2Man

GnM              GlcNAc-2Man
                            6
                             Man-4GlcNAc-4GlcNAc     (cartoon not recovered)
                            3
                         Man

MM                       Man
                            6
                             Man-4GlcNAc-4GlcNAc     (cartoon not recovered)
                            3
                         Man
Concealing the origin of an idea is widespread practice but we shall ignore this habit. During a brief
stay in our lab in Vienna around 1990, Harry Schachter from Toronto taught us the term “GnGn” for
the acceptor substrate of fucosyl transferases [5,6]. Gn stands for GlcNAc and the two Gn-s symbolize
the two terminal residues of the biantennary N-glycan. The rest is unambiguously clear. No further
definitions are required.
This substrate often fell prey to hexosaminidases, leaving two isomers with just one GlcNAc, which
can easily be named according to the now terminal mannose residue(s), represented by an “M”. If
both GlcNAc residues are removed, we end up with “MM”. However, if only one mannose gets
exposed, we need a rule about the reading order.
RULE 2: Monosaccharides are depicted by one capital letter. Modifications are specified by a
subsequent small letter.
G … glucose      Gn … N-acetylglucosamine
M … mannose      Na … N-acetylneuraminic acid
A … galactose    Ng … N-glycolylneuraminic acid
F … fucose       X  … xylose
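RULE 2 lends itself to a mechanical reading of simple terms. The following Python sketch is an illustration only and not part of the nomenclature: it assumes that superscripts are typed as plain digits (as in this text), the names `RESIDUES`, `TOKEN` and `tokenize` are hypothetical, and brackets or hyphenated chains are deliberately not handled.

```python
import re

# Residue abbreviations as listed under RULE 2.
RESIDUES = {
    "G": "glucose",
    "Gn": "N-acetylglucosamine",
    "M": "mannose",
    "Na": "N-acetylneuraminic acid",
    "A": "galactose",
    "Ng": "N-glycolylneuraminic acid",
    "F": "fucose",
    "X": "xylose",
}

# One capital letter, an optional small letter, an optional linkage digit.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d?)")


def tokenize(term):
    """Split a simple term such as 'GnGn' or 'A4A4F6' into (residue, linkage) pairs."""
    tokens = []
    pos = 0
    while pos < len(term):
        match = TOKEN.match(term, pos)
        if match is None or match.group(1) not in RESIDUES:
            raise ValueError(f"cannot read {term!r} at position {pos}")
        tokens.append((RESIDUES[match.group(1)], match.group(2)))
        pos = match.end()
    return tokens
```

For instance, `tokenize("MGn")` yields mannose followed by N-acetylglucosamine, each with an empty linkage field, while an unsupported term raises a `ValueError` instead of being guessed at.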
Examples (structure cartoons not recovered): GnA4, A4A4, A3Gn, A4A3, MA4, A3A4, GnA3-4, A3-4A4
Core-Fucosylation
The core fucose constitutes a third terminal residue, and hence we introduce a third structure
term, “F” for fucose. We could simply write, e.g., A4A4F. In mammals, the core fucose is always
α1,6-linked. If you only work with mammalian samples, you may content yourself with this
simplification. However, as insect cells and plants have some relevance, certainly in biotechnology,
we must consider that there the fucose can sit, or always sits, in the α1,3-position. Therefore, for the
sake of clarity, superscripts should be used to define the type of core fucose.
Examples (structure cartoons not recovered): GnGnF6, A4A4F6, A3GnF6, MA4F6, MMF3, MMF3F6
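To see how much information such a short term carries, a simple biantennary code can be expanded back into the IUPAC-style string shown in the Teaser. The sketch below is a hedged illustration under stated assumptions: the first arm term is taken to be the 6-arm and the second the 3-arm (as read from the first table), superscripts are typed as plain digits, and only the terms M, Gn, A3, A4, F3 and F6 are supported. The names `ARM_CHAINS`, `TERM` and `to_iupac` are hypothetical.

```python
import re

# Antenna chains as used in the text: the superscript of A is the linkage of
# Gal to GlcNAc, and GlcNAc is beta1-2-linked to the arm mannose.
ARM_CHAINS = {
    "M": "Man",
    "Gn": "GlcNAc(b1-2)Man",
    "A3": "Gal(b1-3)GlcNAc(b1-2)Man",
    "A4": "Gal(b1-4)GlcNAc(b1-2)Man",
}

TERM = re.compile(r"Gn|A[34]|M|F[36]")


def to_iupac(code):
    """Expand a simple biantennary code (two arm terms plus optional F3/F6)."""
    terms = TERM.findall(code)
    if "".join(terms) != code or len(terms) < 2:
        raise ValueError(f"unsupported code: {code!r}")
    arm6, arm3, *core = terms  # assumption: 6-arm is written first
    if arm6 not in ARM_CHAINS or arm3 not in ARM_CHAINS:
        raise ValueError(f"unsupported arm term in {code!r}")
    if any(term not in ("F3", "F6") for term in core):
        raise ValueError(f"unsupported extension term in {code!r}")
    fucoses = "".join(f"[Fuc(a1-{term[1]})]" for term in sorted(core))
    return (
        f"{ARM_CHAINS[arm3]}(a1-3)[{ARM_CHAINS[arm6]}(a1-6)]"
        f"Man(b1-4)GlcNAc(b1-4){fucoses}GlcNAc"
    )
```

Under these assumptions, `to_iupac("A4A4F6")` reproduces the 100-character IUPAC string of the Teaser from a six-character label.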
Fucose on antennae
Lewis fucoses introduce branching of the antenna. IUPAC nomenclature uses square brackets to
identify a branch. We do something similar. However, the two residues, or chains, that are linked to
the root of the branch are both put in round brackets. The substitution points are defined by
superscripts. So, LeX fucosylation of an A4 antenna turns this term into (A4F3). A LeA structure
would be (F4A3) because we read the structure counter-clockwise. Nothing wrong with that. However,
we want to be elegant and as concise as possible. Do we lose any information by omitting the
superscripts? Obviously not.
Advanced coding:
The terms (AF) and (FA) perfectly describe the terminal structures. In the near future, we may
replace these terms with “macros”, i.e. Lx for the Lewis X terminus and La for the Lewis A determinant.
Another difficulty is posed by the blood group H α1,2-fucose, which is linked to galactose, which in
turn can be linked β1,3 or β1,4 to GlcNAc. So just putting “F” as the terminal sugar would leave
uncertainty. Therefore, using linear code [7], we write F2-A4. Again, we can save one character
by omitting the A to arrive at F2-4 or F2-3. Why not just F4 or F3? Because we must not ignore the
galactose. “F4” would be a Fuc(α1-4)GlcNAc substructure, which does not exist.
RULE 3: If more than one terminal residue occurs on one antenna, these residues are put in
brackets. Round brackets are used for all branching except that arising from
GlcNAc-transferase IV and V (see “Multi-antennary glycans” below).
RULE 4: Substituents of the β-galactose are linked to this residue by a hyphen. The terms -A4
or -A3 are abbreviated to the superscripts -4 or -3.
(FA)Gn A4(AF)
LaGn A4Lx
A3(FA)F6 A4F2-3
A3LxF6 A4Lh3
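The proposed Lewis macros amount to a plain text substitution between the bracket form and the macro form. The following minimal sketch covers only the two macros defined in the text (Lx for (AF), La for (FA)); the names `MACROS`, `contract` and `expand` are hypothetical, and superscripted brackets such as (A4F3) are left untouched.

```python
# Bracket terms and their macros, as defined under "Advanced coding".
MACROS = {"(AF)": "Lx", "(FA)": "La"}


def contract(code):
    """Replace Lewis bracket terms with their macros."""
    for bracket, macro in MACROS.items():
        code = code.replace(bracket, macro)
    return code


def expand(code):
    """Replace Lewis macros with the corresponding bracket terms."""
    for bracket, macro in MACROS.items():
        code = code.replace(macro, bracket)
    return code
```

With these definitions, `contract("A4(AF)")` gives `A4Lx`, and `expand` reverses the step.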
Bisecting GlcNAc
Bisected N-glycans are a peculiar feature of IgG and especially brain glycans. To annotate this
residue, we almost complete our circumnavigation of the N-glycan. Hence, the term “bi” is found at
the very end of the abbreviation.
A4GnF6bi
GnGnbi
(AGnFbi)
RULE 5: Bisecting GlcNAc is indicated by “bi” and is always listed as the last extension term.
Sialic acids
Sialic acids are (usually) not linked directly to GlcNAc, and therefore the same approach as for
α-Gal and blood group H fucose is chosen. Thus, Neu5Ac(α2-6)Gal(β1-4)GlcNAc(β1-2)Man(α1- shrinks
to Na6-4.
Na3-3, Na3-4 and the probably non-existent Na6-3 denote the other options for a sialylated antenna.
N-glycolylneuraminic acid, extremely rare in humans but common in most animals, is abbreviated as
“Ng”.
The green code is a suggested simplification for cases where exact linkages are not known or do not
matter.
A4Na6-4 Na6-4Na6-4
Na3-4Ng3-3F6 Na6-4Na6-4F6
< NaNaF >
Na6-4(Na6-A4F3)F6
Na6-4Na6-LxF6
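The linkage-free short form can be derived from a full term by dropping the linkage superscripts. The sketch below assumes, as a reading of the examples rather than a stated rule, that a superscript is a digit optionally followed by further hyphen-digit pairs (as in Na6-4); the names `LINKAGE` and `simplify` are hypothetical, and terms that hyphenate letters inside brackets are not covered.

```python
import re

# A linkage superscript typed as plain text: a digit, optionally followed by
# hyphen-digit pairs, e.g. "4", "6-4" or "3-3".
LINKAGE = re.compile(r"\d(?:-\d)*")


def simplify(code):
    """Drop all linkage superscripts from a full proglycan term."""
    return LINKAGE.sub("", code)
```

For example, `simplify("Na6-4Na6-4F6")` returns `NaNaF`.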
Multi-antennary glycans
A branch leading to three antennae can occur on either arm of an N-glycan. The two antennae
ascending from the same mannose are set in square brackets. The square bracket alone tells us
immediately that this arm is further branched. In this way, the two basic types of
triantennary glycans are readily told apart.
The proglycan nomenclature reaches its limits here because the terms become lengthy and difficult
to read, but then, what is the alternative? Even the strange tetra-sialylated triantennary glycan in
bovine fetuin can be depicted. Note that the term [(Na6Na3-3)Na6-4] contains, inside the square
brackets, round brackets signifying additional branching. As no other branching point is given, the
root residue is GlcNAc.
Full code                  Green code    Remark
[A4A3]A4F6                 [AA]AF
A4[A4A4]                   A[AA]
Na6-4[Na6-4Na6-4]          Na[NaNa]
[Ng3-4Ng3-3]Ng3-4F6        [NgNg]Ng
Na3-4[Na3-4Na6-4]          Na[NaNa]      major structure of fetuin
Na3-4[(Na6Na3-3)Na6-4]                   specialty of fetuin
RULE 6: Two mannose-rooted antennae are put in square brackets. The order within brackets
follows the “counter-clockwise” RULE 1.
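Because square brackets are reserved for mannose-rooted branching, the number of antennae and the branched arm can be read off a term mechanically. The sketch below is an illustration under assumptions: superscripts are typed as plain digits, the first two top-level terms are taken to be the 6-arm and the 3-arm in that order, a terminal mannose counts as one antenna, and only the residue letters of RULE 2 are recognized. All names are hypothetical.

```python
import re

TOP_LEVEL = re.compile(
    r"\[[^\]]*\]"                        # square-bracketed pair of antennae
    r"|\([^)]*\)"                        # round-bracketed branched antenna
    r"|(?:Gn|Na|Ng|[GMAFX])\d?(?:-\d)*"  # plain term such as A4 or Na6-4
)


def arms(code):
    """Return the first two top-level terms (assumed: 6-arm, then 3-arm)."""
    terms = TOP_LEVEL.findall(code)
    if len(terms) < 2:
        raise ValueError(f"fewer than two arm terms in {code!r}")
    return terms[0], terms[1]


def count_antennae(code):
    """Each square bracket holds two antennae, every other arm term one."""
    return sum(2 if arm.startswith("[") else 1 for arm in arms(code))


def branched_arm(code):
    """Name the arm that carries the square bracket, if any."""
    arm6, arm3 = arms(code)
    if arm6.startswith("[") and arm3.startswith("["):
        return "both"
    if arm6.startswith("["):
        return "6-arm"
    if arm3.startswith("["):
        return "3-arm"
    return None
```

Under these assumptions, `[A4A3]A4F6` and `A4[A4A4]` both count three antennae but differ in `branched_arm`, which is exactly the distinction between the two basic triantennary types.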
LacNAc repeats
The primary LacNAc disaccharide Galβ1-4GlcNAcβ1- that is linked to a mannose can be further
elongated by the addition of GlcNAc (in β1-3 linkage to Gal), which usually is followed by the quick
addition of Gal to arrive at another Galβ1-4GlcNAcβ1- , or LacNAc unit.
Galβ-4GlcNAcβ-2Manα-                                          A4
GlcNAcβ-3Galβ-4GlcNAcβ-2Manα-                                 Gn3-4
Galβ-4GlcNAcβ-3Galβ-4GlcNAcβ-2Manα-                           Ln4
Galβ-4GlcNAcβ-3Galβ-4GlcNAcβ-3Galβ-4GlcNAcβ-2Manα-            Ln-Ln4
The large and complex structure N3.7.2B in recombinant human erythropoietin (EPO) [9]
will thus be written as:
[Na3-Ln4Na3-4]Na3-4
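The pattern behind the LacNAc terms can be spelled out in a few lines. This sketch generates the antenna term and the matching IUPAC-style chain for a given number of additional LacNAc units; it reproduces the A4, Ln4 and Ln-Ln4 rows of the list above, while anything beyond two repeats is an extrapolation on our part. The function name `lacnac_antenna` is hypothetical.

```python
def lacnac_antenna(repeats):
    """Return (proglycan term, IUPAC-style chain) for a galactosylated antenna
    carrying the given number of additional LacNAc units."""
    if repeats < 0:
        raise ValueError("repeats must not be negative")
    # Each additional unit is Gal(beta1-4)GlcNAc linked beta1-3 to the next Gal.
    chain = "Galβ-4GlcNAcβ-3" * repeats + "Galβ-4GlcNAcβ-2Manα-"
    term = "A4" if repeats == 0 else "-".join(["Ln"] * repeats) + "4"
    return term, chain
```

`lacnac_antenna(1)` returns `Ln4` together with the chain of two LacNAc units.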
MM MU
(M6M3)M2
M3M
aka Man6
(M6M3)M MM2-2-2
aka Man5 another Man5
(M2-6M3)M2 (M6M3)M2-2
(M2-6M2-3)G (M6M3)G-G-G
(M6M3)Gn M3A4
Man5Gn Man4A*
* In Man4A, neither the linkage of the terminal mannose nor that of the galactose is exactly
determined. In some instances, this term may nevertheless be justified.
MMXF3 MUXF3
MMXF MUXF
(FA)(FA)XF3
MGnX
LaLaXF3
MMF3F6 M(AnF)F3F6
MMFF
Specialties
The human brain contains sizable amounts of glycans carrying the “HNK-1” epitope (named after
human natural killer cells), which contains sulfated glucuronic acid [11]. Annotating a structure like
this requires some form of linear code and the addition of abbreviations for non-sugar substituents,
in this case sulfate. Note that the hyphen binds the “su” to “Ga”, which in turn is hyphenated to the
“4” (or “3”), which stands for the regular antennary galactose.
The urinary protein uromodulin, aka Tamm-Horsfall protein, contains glycans with sulfated
GalNAc and the Sda determinant, which harbors a branch on the galactose residue [12]. Based on
the rules for Lewis determinants, we use a round bracket and a superscript hyphen with the linkage
of the galactose residue.
Another peculiar structure is that with a Lewis X determinant in the bisecting position. With the
rules established so far, even such an exotic item can be named.
To facilitate deciphering of these terms, the abbreviations are also given with colors for the
6-arm, the 3-arm and the extension terms.
Ga3-4GnF6bi and su3-Ga3-4GnF6bi
M3Gn(AF)-bi, or M3GnLx-bi (a brain glycan)
(An4Na3)-4su4-An4 (in uromodulin)
Mosses contain structures with methyl groups [13]. Non-vertebrates contain numerous “unusual”
and remarkable structural features such as methylation, sulfation, zwitterionic non-sugar
substituents and, again, glucuronic acid, often combined with unusual architectures such as a
substituted core fucose, to name just one example [14,15].
me6-MMXF3 (a moss glycan)     (me6me3)-MGnXF3 (a moss glycan)
Another box of plethora (sic!) is opened by the highly unusual and diverse N-glycans of
microalgae [16-18]. It would be possible to describe these structures as well, but for the
time being this does not appear to be a pressing need.
essentially neglects the overinterpretation problem and suggests a particular structure, where
actually a range of isomers is possible as recently emphasized for the H5N4F1 composition [3].
Systems compared (table rows not recovered): proglycan, antibody, Oxford Fuc first, Oxford Fuc last, Elaborate Oxford

Code     Ref.   Number of   Galactose   Sialic acid   Type of       Fucose    Arm
                antennae    linkage     linkage       sialic acid   linkage   location
A2S2F    c      √           x           x             x             x         n.a.
Commercial sources:
a) ludger.com/product-catalogue/standards-controls
b) aspariaglycomics.com/product-category/glycan-standard/
c) agilent.com/en/product/biopharma-hplc-analysis/glycan-analysis
d) Thermo Fisher Scientific BioPharma Finder 3.0
For the expression of ideas about structure, two systems are prominent:
1) The “antibody glycan code” counting the number of galactose residues for biantennary
glycans only – sufficient for antibodies.
2a) The “Oxford code”, counting the number of antennae and galactose residues, with fucose
first
2b) The “Oxford code”, counting the number of antennae and galactose residues, with fucose
last
3a/b) The elaborate “Oxford code”. Ad hoc additions are used to specify structures exactly,
e.g., when α-Gal or N-glycolylneuraminic acid occurs [23] or when a large
number of isomers is to be named [24].
The “antibody glycan code” is a perfectly fine convention in the field of – surprise - antibodies, in
particular recombinant IgGs, where G2F actually stands for the isomer A4A4F6 and not for any of
the 40-50 possible other isobars [3].
The “Oxford code” and its variants likewise have a definite raison d'être, especially when rather
large and only partially defined structures are to be named. E.g., the term 3A2SF comprises a number
of related structures that are most often not told apart by the analytical results [30,31].
When particular, exactly known N-glycan structures are to be described, the elaborate “Oxford
code” is an option, but the proglycan system appears to be a big step ahead. The following table
tries to give an overview of abbreviation systems. By no means does it claim to be comprehensive.
Readers' contributions to updating and completing this table are highly encouraged.
Finally, we dare to devise a decision triangle that contrasts the Oxford and proglycan systems, with
plain composition as the starting point and safe haven.
proglycan
[1] Xyl – [2] substituents of reducing GlcNAc (3 before 6) – [3] bisecting GlcNAc
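The order of the extension terms given above (xylose, then core fucose with 3 before 6, then bisecting GlcNAc) can be enforced when a term is assembled from its parts. The following sketch is a hedged illustration: the function name `assemble` and its parameters are hypothetical, the arm terms are passed in as ready-made strings with the 6-arm assumed to come first, and superscripts are typed as plain digits.

```python
def assemble(arm6, arm3, xylose=False, core_fucose=(), bisecting=False):
    """Build a term from two arm terms plus extension terms in the fixed order
    X, F3, F6, bi."""
    positions = sorted(set(core_fucose))
    if any(position not in (3, 6) for position in positions):
        raise ValueError("core fucose can only be given as 3 and/or 6")
    code = arm6 + arm3
    if xylose:
        code += "X"
    for position in positions:  # sorted, so 3 comes before 6
        code += f"F{position}"
    if bisecting:
        code += "bi"
    return code
```

Calling `assemble("M", "M", xylose=True, core_fucose=(3,))` yields `MMXF3`, and the order of the arguments in `core_fucose` does not matter.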
Non-sugar substituents
su sulfate
ac acetyl
me methyl
po phosphate
pc phosphocholine
pe phosphoethanolamine
References:
1. Mehta AY, Cummings RD (2020) GlycoGlyph: a glycan visualizing, drawing and naming application.
Bioinformatics 36 (11):3613-3614. doi:10.1093/bioinformatics/btaa190
2. Herget S, Ranzinger R, Maass K, Lieth CW (2008) GlycoCT-a unifying sequence format for
carbohydrates. Carbohydr Res 343 (12):2162-2171. doi:10.1016/j.carres.2008.03.011
3. Helm J, Grunwald-Gruber C, Thader A, Urteil J, Fuhrer J, Stenitzer D, Maresch D, Neumann L, Pabst M,
Altmann F (2021) Bisecting Lewis X in Hybrid-Type N-Glycans of Human Brain Revealed by Deep
Structural Glycomics. Anal Chem 93 (45):15175-15182. doi:10.1021/acs.analchem.1c03793
4. Helm J, Hirtler L, Altmann F (2022) Towards Mapping of the Human Brain N-Glycome with
Standardized Graphitic Carbon Chromatography. Biomolecules 12 (1). doi:10.3390/biom12010085
5. Gleeson PA, Schachter H (1983) Control of glycoprotein synthesis. J Biol Chem 258 (10):6162-6173
6. Staudacher E, Altmann F, Glossl J, Marz L, Schachter H, Kamerling JP, Hard K, Vliegenthart JF (1991)
GDP-fucose: beta-N-acetylglucosamine (Fuc to (Fuc alpha 1----6GlcNAc)-Asn-peptide)alpha 1----3-
fucosyltransferase activity in honeybee (Apis mellifica) venom glands. The difucosylation of asparagine-
bound N-acetylglucosamine. Eur J Biochem 199 (3):745-751. doi:10.1111/j.1432-1033.1991.tb16179.x
7. Banin E, Neuberger Y, Altshuler Y, Halevi A, Inbar O, Nir DD, A. (2002) A Novel Linear Code®
Nomenclature for Complex Carbohydrates. Trends in Glycoscience and Glycotechnology 14:127-137
8. Harvey DJ, Merry AH, Royle L, Campbell MP, Dwek RA, Rudd PM (2009) Proposal for a standard
system for drawing structural diagrams of N- and O-linked carbohydrates and related compounds.
Proteomics 9 (15):3796-3801. doi:10.1002/pmic.200900096
9. Hokke CH, Bergwerff AA, Van Dedem GW, Kamerling JP, Vliegenthart JF (1995) Structural analysis of
the sialylated N- and O-linked carbohydrate chains of recombinant human erythropoietin expressed in
Chinese hamster ovary cells. Sialylation patterns and branch location of dimeric N-acetyllactosamine
units. Eur J Biochem 228 (3):981-1008. doi:10.1111/j.1432-1033.1995.tb20350.x
10. Pabst M, Grass J, Toegel S, Liebminger E, Strasser R, Altmann F (2012) Isomeric analysis of
oligomannosidic N-glycans and their dolichol-linked precursors. Glycobiology 22 (3):389-399.
doi:10.1093/glycob/cwr138
11. Voshol H, van Zuylen CW, Orberger G, Vliegenthart JF, Schachner M (1996) Structure of the HNK-1
carbohydrate epitope on bovine peripheral myelin glycoprotein P0. J Biol Chem 271 (38):22957-22960.
doi:10.1074/jbc.271.38.22957
12. van Rooijen JJ, Kamerling JP, Vliegenthart JF (1998) Sulfated di-, tri- and tetraantennary N-glycans in
human Tamm-Horsfall glycoprotein. Eur J Biochem 256 (2):471-487. doi:10.1046/j.1432-
1327.1998.2560471.x
13. Stenitzer D, Mocsai R, Zechmeister H, Reski R, Decker EL, Altmann F (2022) O-methylated N-glycans
Distinguish Mosses from Vascular Plants. Biomolecules 12 (1). doi:10.3390/biom12010136
14. Paschinger K, Wilson IBH (2019) Comparisons of N-glycans across invertebrate phyla. Parasitology
146 (14):1733-1742. doi:10.1017/S0031182019000398
15. Hykollari A, Paschinger K, Wilson IBH (2022) Negative-mode mass spectrometry in the analysis of
invertebrate, fungal, and protist N-glycans. Mass Spectrom Rev 41 (6):945-963. doi:10.1002/mas.21693
16. Mocsai R, Blaukopf M, Svehla E, Kosma P, Altmann F (2020) The N-glycans of Chlorella sorokiniana
and a related strain contain arabinose but have strikingly different structures. Glycobiology 30 (8):663-
676. doi:10.1093/glycob/cwaa012
17. Mocsai R, Figl R, Sutzl L, Fluch S, Altmann F (2020) A first view on the unsuspected intragenus
diversity of N-glycans in Chlorella microalgae. Plant J 103 (1):184-196. doi:10.1111/tpj.14718
18. Mocsai R, Kaehlig H, Blaukopf M, Stadlmann J, Kosma P, Altmann F (2021) The Structural Difference
of Isobaric N-Glycans of Two Microalgae Samples Reveals Taxonomic Distance. Front Plant Sci
12:643249. doi:10.3389/fpls.2021.643249
19. Blochl C, Wang D, Madunic K, Lageveen-Kammeijer GSM, Huber CG, Wuhrer M, Zhang T (2021)
Integrated N- and O-Glycomics of Acute Myeloid Leukemia (AML) Cell Lines. Cells 10 (11).
doi:10.3390/cells10113058
20. Gaunitz S, Tjernberg LO, Schedin-Weiss S (2021) The N-glycan profile in cortex and hippocampus is
altered in Alzheimer disease. J Neurochem 159 (2):292-304. doi:10.1111/jnc.15202
21. Lee J, Ha S, Kim M, Kim SW, Yun J, Ozcan S, Hwang H, Ji IJ, Yin D, Webster MJ, Shannon Weickert C,
Kim JH, Yoo JS, Grimm R, Bahn S, Shin HS, An HJ (2020) Spatial and temporal diversity of glycome
expression in mammalian brain. Proc Natl Acad Sci U S A 117 (46):28743-28753.
doi:10.1073/pnas.2014207117
22. Barboza M, Solakyildirim K, Knotts TA, Luke J, Gareau MG, Raybould HE, Lebrilla CB (2021) Region-
Specific Cell Membrane N-Glycome of Functional Mouse Brain Areas Revealed by nanoLC-MS Analysis.
Mol Cell Proteomics 20:100130. doi:10.1016/j.mcpro.2021.100130
23. Fussl F, Trappe A, Carillo S, Jakes C, Bones J (2020) Comparative Elucidation of Cetuximab
Heterogeneity on the Intact Protein Level by Cation Exchange Chromatography and Capillary
Electrophoresis Coupled to Mass Spectrometry. Anal Chem 92 (7):5431-5438.
doi:10.1021/acs.analchem.0c00185
24. Abrahams JL, Campbell MP, Packer NH (2018) Building a PGC-LC-MS N-glycan retention library and
elution mapping resource. Glycoconj J 35 (1):15-29. doi:10.1007/s10719-017-9793-4
25. Echeverria B, Etxebarria J, Ruiz N, Hernandez A, Calvo J, Haberger M, Reusch D, Reichardt NC (2015)
Chemo-Enzymatic Synthesis of (13)C Labeled Complex N-Glycans As Internal Standards for the Absolute
Glycan Quantification by Mass Spectrometry. Anal Chem 87 (22):11460-11467.
doi:10.1021/acs.analchem.5b03135
26. Planinc A, Bones J, Dejaegher B, Van Antwerpen P, Delporte C (2016) Glycan characterization of
biopharmaceuticals: Updates and perspectives. Anal Chim Acta 921:13-27.
doi:10.1016/j.aca.2016.03.049
27. Mittermayr S, Bones J, Doherty M, Guttman A, Rudd PM (2011) Multiplexed analytical glycomics:
rapid and confident IgG N-glycan structural elucidation. J Proteome Res 10 (8):3820-3829.
doi:10.1021/pr200371s
28. Guile GR, Rudd PM, Wing DR, Prime SB, Dwek RA (1996) A rapid high-resolution high-performance
liquid chromatographic method for separating glycan mixtures and analyzing oligosaccharide profiles.
Anal Biochem 240 (2):210-226. doi:10.1006/abio.1996.0351
29. Doherty M, Bones J, McLoughlin N, Telford JE, Harmon B, DeFelippis MR, Rudd PM (2013) An
automated robotic platform for rapid profiling oligosaccharide analysis of monoclonal antibodies
directly from cell culture. Anal Biochem 442 (1):10-18. doi:10.1016/j.ab.2013.07.005
30. Clerc F, Reiding KR, Jansen BC, Kammeijer GS, Bondt A, Wuhrer M (2016) Human plasma protein N-
glycosylation. Glycoconj J 33 (3):309-343. doi:10.1007/s10719-015-9626-2
31. Williams SE, Noel M, Lehoux S, Cetinbas M, Xavier RJ, Sadreyev RI, Scolnick EM, Smoller JW,
Cummings RD, Mealer RG (2022) Mammalian brain glycoproteins exhibit diminished glycan complexity
compared to other tissues. Nat Commun 13 (1):275. doi:10.1038/s41467-021-27781-9