Unlocking Programming: Programming Languages
© 5 Mar 2012 Luther Tychonievich
Licensed under Creative Commons: CC BY-NC-ND 3.0

Unlocking Programming

Four meanings of “‍a language.‍”

 

People often speak of “‍programming languages‍”. When they do so, they mean (at least) four different things which are (for some reason) almost always shipped together.

Syntax

The first and most obvious characteristic of a language is its syntax. Syntax is just the particular mapping between low-level programming constructs and textual representations. For example, what in Scheme is written as (print (+ 2 3)) is written in C as print(2 + 3) instead. Some people argue with passion about nuances of syntax; indeed, because annoying syntax annoys with every line of code, I’ve been known to vent syntactically myself. Still, at the end of the day it’s just some simple rules to memorize.

Translating between syntaxes is mostly trivial. Sometimes there are constructs in one language that have no parallel in another, but for the most part syntax is interchangeable.
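To make that concrete, here is one way the same little absolute-value routine might look in each syntax (the function name is mine, not part of either language):

(define (absolute x)          ; Scheme
  (if (< x 0) (- x) x))

int absolute(int x) {         /* C */
    return x < 0 ? -x : x;
}

Same constructs, same behavior; only the spelling differs.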

Semantics

The next layer is the semantics of a language. These are less-obvious but all-pervasive definitions answering “what do we mean by X?” When I have a named value (“let x be 17”), can I change it later (is “set x to 3” allowed)? Can I change it to a different type (is “set x to Blue” allowed)? If we put an expression into a subroutine invocation, does the expression get evaluated up front or only when needed?
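Languages answer these questions differently. C, for example, answers “yes” to the first and “no” to the second: named values may change, but their types are fixed. A minimal sketch:

int x = 17;     /* let x be 17 */
x = 3;          /* set x to 3: allowed; named values are mutable */
x = "Blue";     /* set x to Blue: the compiler objects; x is an int forever */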

That last question, about when expressions are evaluated, has a lot of nuances: for example, if we have

how to compute “swizzle x and y”
if x ≥ 0 { the answer is 7 } otherwise { the answer is x + (x × y) }

then what happens when we invoke “‍swizzle 3 and 4 ÷ 0‍”? If we evaluate the expressions first we get an error (4 ÷ 0 doesn’t make sense); if we delay the evaluation we get 7 instead, never needing y. But if we delay the evaluations and invoke

let z be 2; swizzle (increase z by 1 and return −3) and z

is the answer −3 + (−3 × 2), −3 + (−3 × 3), or −3 + (−3 × 4)?
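C evaluates arguments eagerly, but delayed evaluation can be simulated by passing “thunks”, little functions that produce a value only when called. The sketch below (names and details are mine) shows both behaviors for the divide-by-zero case above:

#include <stdio.h>

/* Eager semantics: both arguments are evaluated before the body runs. */
int swizzle_eager(int x, int y) {
    if (x >= 0) return 7;
    return x + x * y;
}

/* Delayed semantics, simulated with thunks: each argument is a
   function that produces its value only when actually called. */
int swizzle_lazy(int (*x)(void), int (*y)(void)) {
    if (x() >= 0) return 7;     /* y is never evaluated on this path */
    return x() + x() * y();     /* C leaves the order of these calls unspecified */
}

static int zero = 0;            /* defeats compile-time constant folding */
static int three(void)         { return 3; }
static int four_div_zero(void) { return 4 / zero; }  /* crashes if ever run */

int main(void) {
    /* swizzle_eager(3, 4 / zero) would crash before the call even began;
       the delayed version returns 7 without ever touching 4 ÷ 0. */
    printf("%d\n", swizzle_lazy(three, four_div_zero));
    return 0;
}

Note that the delayed version may evaluate an argument zero, one, or several times; with a side-effecting argument like “increase z by 1 and return −3”, how many times and in what order those evaluations happen is exactly what makes all three answers above defensible.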

I could drone on at much greater length about semantic choices and how they impact programming, but the point is this: semantics aren’t visible when looking at a program, yet they have significant impacts on how the program behaves.

Differing semantics make languages difficult to interchange. However, thanks to the Church–Turing thesis, you can always find a workaround. Just as some Japanese words must be translated as longer phrases in English, so some simple statements in one language require more work to express in another. Some are so different they require emulation rather than translation: thousands of lines of code to mimic a single behavior. (This prevalence of inbuilt emulators leads to such quips as Greenspun’s tenth rule.)

Library

Each programming language ships with some kind of standard library, a (sometimes extensive) set of subroutines and abstract data type implementations that a programmer can use to simplify the task of writing new programs. For most languages, these are augmented by several “‍third-party‍” libraries written by other people to fill in the gaps left by the standard library. I hear software engineers discuss the pros and cons of libraries far more often than I hear them discuss syntax or semantics.
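For a small taste of what a library buys you, consider sorting in C: one call to the standard library’s qsort replaces a sorting routine the programmer would otherwise have to write and debug themselves (the comparison function and data here are mine):

#include <stdio.h>
#include <stdlib.h>

/* Comparison callback in the form qsort requires. */
static int by_value(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    int data[] = {42, 7, 19, 3};
    qsort(data, 4, sizeof data[0], by_value);   /* one call, no hand-written sort */
    for (int i = 0; i < 4; i++)
        printf("%d ", data[i]);                 /* prints: 3 7 19 42 */
    printf("\n");
    return 0;
}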

Frankly, there is not much to say here in generalities. There is no reason that any library couldn’t be ported to any language. However, doing so requires work, and that means relatively few languages have large, well-supported libraries.

Runtime

The Church-Turing thesis tells us that any problem may be solved in any language. Everything that thesis ignores can be summed up in the runtime environment of the language. Sure, once we have some information in the program we can compute whatever other information we want; but how do we get the information in the first place, and where can we send it once we’ve computed it? How long does it take to do the computation? Can I do things in parallel, in serial, or both? How is memory handled, and what happens when I use too much of it?

At one extreme on this spectrum we have systems programming languages like C, D, and Ada. The basic answer they give is “whatever the operating system and hardware allow.” While they provide abstractions that cover this level in most instances, with a systems language you can send any signal to any wire on your chip. Systems languages have the most scope for doing good and ill: I can use them to interface with devices yet undreamed of, but if I send the wrong signals to the wrong wires I can destroy data and cause permanent physical damage to hardware.
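What that power looks like in C is disarmingly plain; the address below is hypothetical (it would come from a particular chip’s documentation), but on real hardware a write like this drives real wires:

#include <stdint.h>

/* Memory-mapped I/O sketch: the address is made up for illustration. */
#define DEVICE_REGISTER ((volatile uint32_t *)0x40021000)

void signal_device(void) {
    *DEVICE_REGISTER = 0x1;   /* nothing checks whether this is wise */
}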

At the other extreme are the scripting languages like JavaScript, Matlab, and Visual Basic. These have a very limited set of interfaces built into their standard library (which usually interface with other programs written in C), and beyond these there is no scope for any other input or output. Scripting languages are limited, but they are also safe: with few and well-guarded gates, I’m free to do any stupid thing I want without worrying about harming more than my own computation. Of course most have not-so-well-guarded gates, leading to all kinds of viruses and bugs…

There is a trend for languages with systems-level runtimes to be compiled into “machine code” and hence run efficiently, while script-level runtimes are associated with “interpreted” languages, which are really input to other programs that “run” them; these more often run inefficiently but are succinct to write. As far as I know, there is no reason this distinction needs to exist.

The only way around the limitations of a runtime is through language extensions, modules written in a lower-level runtime language that expose new functionality to the higher-level runtime target language.
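Matlab’s MEX interface is one example: a small C module like the sketch below (the doubling behavior is my invention for illustration) compiles into something Matlab scripts can invoke like any other function:

#include "mex.h"

/* A hypothetical Matlab extension written in C: doubles a scalar.
   Once compiled with mex, Matlab code can call it directly. */
void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[]) {
    if (nrhs != 1 || !mxIsDouble(prhs[0]))
        mexErrMsgTxt("expected one double argument");
    plhs[0] = mxCreateDoubleScalar(2.0 * mxGetScalar(prhs[0]));
}

The same pattern appears everywhere: the scripting language keeps its safe, limited runtime, and the C module supplies the one new gate it needed.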



