Unlocking Programming: Composite Datatypes
© 6 Sep 2011 Luther Tychonievich
Licensed under Creative Commons: CC BY-NC-ND 3.0
other posts

Unlocking Programming

Part of a series of posts explaining programming for the lay-person.

 

This is part of a series of posts; see the introduction to this series. This post builds on the idea of types.

Defining new types

Most programming languages come with a small set of built-in or primitive types. Most often these are 8-, 16-, 32-, and 64-bit binary integers; a boolean type with values true and false; and 32- and 64-bit floating-point, or non-integral, numbers. Usually all this provides is some sort of numbers, typically with a limited range of expression. More recently, many languages also include built-in string and unicode types for text and textual glyphs.

The reason so few basic types are provided is because you can make anything from these parts. Want complex numbers? They’re just two real numbers handled a particular way. Want text? Just number all the glyphs in your language and treat each word as an ordered list of numbers. If you want to represent people in your program, chances are you have a list of things you need to know about the people and that you can eventually break each of those things down into numbers.

Re-using Others’ Definitions

Most languages come with what’s called a standard library which includes a large number of pre-made types as well as many subroutines that cover a lot of the common tasks programmers find themselves doing. Typically the entire language can be described in a few pages, but the standard library could often fill multiple books. Fortunately, you don’t need to know all that material: you can look it up when needed or rewrite it yourself.

Contiguity: Arrays and Records

Most languages provide three basic ways of assembling new types: arrays, records, and pointers. Some languages also offer sum-types, enumerated types, nullable types, polymorphic versions type X, and on and on…. Programming language theorists really like playing with types. Arrays and records I describe here; pointers I’ll cover tomorrow.

Arrays and records are both very common in part because of how computer memory works. Memory is laid out as a huge number of little numbered bins called bytes, where each byte can hold an eight-bit number, 0 to 255 That’s 00000000 to 11111111 in base 2 or 00 to ff in base 16. Base 16, a.k.a. hexadecimal, is widely used in computing and typically written with a preceding 0x, as 0xf81a = 165×15 + 162×8 + 16×1 + 10 = 63514. . Computers can de-reference a number by looking up the bin it numbers and grabbing the value out of it (and, usually, a few of its neighboring bins).

Because of this layout, it is very easy for the computer to store things contiguously in memory. If I say a complex number is two eight-byte real numbers, the computer just puts the bytes of one number right after the bytes of the other. A complex number now takes up sixteen bytes and we’re done.

Records (also called structs) work by simply listing the subtypes that make up a type; for example, “‍a flub is two booleans, a 64-bit number, and four 16-bit integers‍”. Typically languages require you to name each of these pieces “‍a flub is alphonso, a boolean; maud, a boolean; dana, a…‍” and access them only by name (“‍my flub’s maud‍”) so that the language can figure out the bins you’re asking for in advance. “‍my flub’s maud‍” is in bin number “‍my flub‍” + 1

Arrays also put a lot of values in adjacent bins in memory, but they only store one kind of thing and the length isn’t part of the type. Thus “‍an array of integers‍” has anywhere from 0 to all of memory filled with integers. If x is an array of integers then x[17] is (in most languages) the 18th integer, because almost all modern languages start counting at 0. x[17], or “‍x’s 18th thing‍”, is in bin number “‍x‍” + (17 × bin used by each integer)

An Aside: Units and Algebraic Structure

There are two notions of type that exist in the real world but have not (yet) made much headway in computing. One is units: no mainstream programming language I know allows you to say “‍3 inches + 8.8 meters‍” or to define new units like “‍a league is (1 hour × walking speed)‍”. The other is algebraic structures, like groups, fields, and rings, though that is probably a little nuanced for this tutorial.

My goal in this aside is to note that programming languages generally support only a few primitive types and composites built thereon. There are plenty of other things a type could be, but composites have proven sufficient for enough programming so far that they are generally the only types languages include.




Looking for comments…



Loading user comment form…