What is a Character?
© 23 Feb 2012 Luther Tychonievich
Licensed under Creative Commons: CC BY-NC-ND 3.0
On characters, letters, and glyphs.


One question that font designers face, but that typographers are generally free to ignore, is the distinction between letters, characters, symbols, and glyphs. (I’m using a lot of less-common Unicode in this post. My apologies to those whose fonts do not cover it.) Letters are the most general (the platonic ideal, if you will), while glyphs are the most specific. Thus ℝ, ℜ, ℛ, R, r, and an r set in a different typeface are all the same letter, but each is a distinct glyph. Typically R and r are considered distinct characters, but the same r set in two typefaces is one character.

But the boundaries get messy. Is & the two characters et or a single character? How about German’s ß (which might be written ss), Hebrew’s צ (which, at the end of a word, looks like ץ), or Arabic’s distinct solo, opening, middle, and ending forms, which cause ه, when repeated thrice, to look like ههه?
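For what it’s worth, here is how Unicode happens to answer those questions, as a small Python sketch using the standard `unicodedata` module. The `samples` table and the `describe` helper are names invented for this illustration; the characters are written as escapes so the listing survives fonts that lack them.

```python
import unicodedata

# Each sample is a single code point, whatever it looks like on screen.
samples = {
    "ampersand": "\u0026",    # &
    "sharp s": "\u00df",      # ß
    "tsadi": "\u05e6",        # Hebrew tsadi, ordinary form
    "final tsadi": "\u05e5",  # Hebrew tsadi, word-final form
    "heh": "\u0647",          # Arabic heh
}

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for label, ch in samples.items():
    print(label, "->", describe(ch))

# ß is one character, yet case conversion treats it as a stand-in for two.
sharp_s_upper = "\u00df".upper()
sharp_s_folded = "\u00df".casefold()
print(sharp_s_upper, sharp_s_folded)
```

So & and ß are each stored as one character, even though ß turns into two letters when uppercased or case-folded.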

This all matters for computers because of encoding. The de facto standard way of storing text digitally is as a sequence of numbers, each representing a “character”, combined with special mode-change numbers that switch between bold and italic and so on, and a set of rules for mapping those characters to glyphs. The Hebrew group at the Unicode Consortium decided that צ and ץ should be separate characters, so I can type both צץ and ץצ if I want to. The Arabic group decided there’s only one ه and that it should be drawn with different glyphs in different contexts, like ه هه ههه; there is no easy way to get these glyphs outside of the word context.
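To make the “sequence of numbers” concrete, here is a minimal Python sketch that lists the numbers behind the Hebrew and Arabic examples. The helper name `code_points` is made up for this post, and UTF-8 is just one common choice of byte encoding.

```python
hebrew = "\u05e6\u05e5"        # tsadi followed by final tsadi
arabic = "\u0647\u0647\u0647"  # heh typed three times

def code_points(text):
    """List the number stored for each character, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

hebrew_points = code_points(hebrew)  # two different numbers
arabic_points = code_points(arabic)  # the same number three times
print(hebrew_points)
print(arabic_points)

# On disk the numbers become bytes; in UTF-8 each heh takes two bytes.
arabic_bytes = arabic.encode("utf-8")
print(arabic_bytes)
```

The file stores the same number for every ه; choosing the solo, opening, middle, or ending glyph is left to whatever draws the text.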

It gets even stranger in mathematical texts. For example, normally we’d think of ℝ, ℜ, ℛ, and R as being the same character in different typefaces, but in mathematics each has a unique meaning: ℝ is the set of real numbers, ℜ extracts the real part of a complex number, ℛ denotes a Riemann integral, and R is a variable. The first three of these symbols were included in Unicode as distinct characters; the last was left as an R in an italic typeface.
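A quick way to see that the first three are separate characters, yet are still recorded as variants of R, is to ask Python’s `unicodedata` module for their names and their compatibility-normalized (NFKC) forms. This is only a sketch; the variable names are mine.

```python
import unicodedata

# Double-struck, black-letter, and script capital R, written as escapes.
symbols = ["\u211d", "\u211c", "\u211b"]

# Each symbol has its own code point and its own official name.
names = [unicodedata.name(s) for s in symbols]

# Compatibility normalization folds each one back to a plain letter.
folded = [unicodedata.normalize("NFKC", s) for s in symbols]

for s, name, plain in zip(symbols, names, folded):
    print(f"U+{ord(s):04X} {name} -> {plain}")
```

Whether that folding is desirable depends on the text: it helps a search engine and ruins a formula.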

There is no single right way to group glyphs into characters or to split letters into characters. Instead, the decision is mostly one of interface (keyboards) and encoding (files).

