Cognitive Shortcuts

Why using “big data” to solve problems frightens me.

In my experience, almost all native-born citizens of my country correctly use a huge number of grammatical rules that almost none of them can explain, name, describe, or even recognize as true when they hear one described. To take just one example, almost all of them will correctly intone “would you like ketchup or mustard?” and “would you like that for here or to go?” differently, but almost none of them know that we have different kinds of “or” phrases, how those two differ, what the other common kinds are, or even that English has semantically important tones.

I believe this lack of awareness of knowledge we plainly possess and use comes from how the skill is acquired. English grammar is learned in the US of A almost exclusively through what it is currently in vogue to call “big data.” Big data learning is arguably the simplest, lowest form of learning there is: we show you a huge number of examples and you do what the majority of them did. When you are done you have a mental rule for selecting behavior that does not have a “why” attached to it. It just is. It’s just “the right thing.” It’s “how you’ve always done it.” Anything else “feels wrong.” And most of the time you don’t even know it is a rule at all; it is completely subconscious.
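To make “do what the majority of them did” concrete, here is a minimal sketch in Python. Everything in it is my own illustration: the function name learn_by_majority, the made-up observation counts, and the placeholder labels “tone A” and “tone B” (I am not claiming those are the real intonation patterns).

```python
from collections import Counter, defaultdict


def learn_by_majority(examples):
    """Return {situation: most frequently observed behavior}."""
    counts = defaultdict(Counter)
    for situation, behavior in examples:
        counts[situation][behavior] += 1
    # Keep only the winning behavior. The tallies, and any hint of
    # "why", are thrown away.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Hypothetical observations; "tone A" and "tone B" are placeholder labels.
examples = (
    [("ketchup or mustard", "tone A")] * 97
    + [("ketchup or mustard", "tone B")] * 3
    + [("for here or to go", "tone B")] * 95
    + [("for here or to go", "tone A")] * 5
)

rules = learn_by_majority(examples)
print(rules["ketchup or mustard"])
print(rules.get("soup or salad"))  # a situation never seen in the examples
```

The learned table answers “what do I do here?” for every situation it has seen and nothing else. It holds no explanation, and for a situation it never saw it has no rule at all.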

There is also a very different kind of knowledge. It is the kind that is taught; the kind that works off of a hierarchy of simple, structured rules. And it is also a kind we are very big on in some aspects of learning here in the US of A. We start colors that way, for example, teaching every toddler out there to see and say “‍green‍” and “‍blue‍” and “‍pink‍”… but then we stop, letting them pick up “‍taupe‍” and “‍cerise‍” on their own and never even bothering to mention or acknowledge “‍83.2% 583nm, 6.7% 701nm, and 10.1% 643nm‍” at all. And integers: we are really big on integers, on counting and arithmetic and using those tools to deduce that 11 is about half of 23 instead of just looking at 11 things and 23 things and saying “‍about half as many‍”, which we could do with big-data-style training if we cared to do so. Arguably, school is all about this other kind of knowledge.

Until about ten years ago, we could say the following with reasonable confidence:

Animals only learn the big-data way.
Computers only learn the school-knowledge way.
Humans are the only things that can do both.

I’m not offering this as any kind of judgment on the two kinds of learning, on animals, on computers, or on humans. It’s just what seemed to be true of the universe. And only seemed; perhaps animals can learn hierarchical knowledge and I simply never heard of them doing so.

Now, I liked this limitation on computers. It meant in practice this very useful thing: every decision every computer made was based on a set of rules designed and given to the computer by a human. You couldn’t set a computer to do a task you didn’t understand well enough to do yourself. You could give it tasks you weren’t skilled/fast/strong enough to do, but not ones you couldn’t understand.

That limitation was, in my mind, good on two fronts. First, it was a great driver in building what we understand. Want to have something that looks like painted jade in your CG movie? Then first you had to understand the optical physics of painted jade so you could tell your computer to replicate that physics. Second, it was a useful throttle on trusting computers too much. I trusted a computer to drive my car only if I trusted a human to have correctly described all of the rules that affect driving; in other words, I didn’t trust it. No one did; we couldn’t define the rules. Only a task we had solved could be given to a computer.

Then came big data and deep learning and associated technologies. Now, these are not enough to make a computer like a brain. Brains are built to learn, and learn at a level and constancy that far outstrips anything any current or rumored technology can dream of; but these technologies do allow a limited, targeted acquisition of grammar-rule-like knowledge: practically actionable understanding of what to do in various circumstances with no underlying structure at all.

So now I can have a computer that has a large bank of practical knowledge and a large but incomplete set of structured rules. Isn’t that what humans have too? Do I now trust my computer to do whatever humans do?

Not even close.

Humans can realize. They can take that soup of unstructured practical experience and that unfinished framework of theoretical knowledge and metacognate on them, using one to fill in holes in the other and creating brand-new theoretical knowledge out of thin air. There is no boundary, nothing out-of-scope.

A computer does exactly what it was programmed to do. It “learns” only what we told it to learn, and only the way we asked it to do so. It builds practical rules out of the specified set of experiences (and no others) and combines them with the given set of organized knowledge as programmed (and in no other way). It might look for new data, but only in the ways requested, meaning only in ways the programmer thought of in advance, and it can then use whatever it finds only in the way the programmer told it to. A computer will never say “that tree branch could have fallen in the road instead of beside it, and I don’t know what to do if it did; what is the right decision there?” Well, of course it can, but only if some programmer told it to do so by giving it a pre-specified set of questions it can ask. The point is there has to be that set, in advance and fixed.

Now, some people say that real intelligence will be achieved by adding to that fixed set of questions this single easy-to-state-in-English question: “what other questions should I ask?” But I have never seen even a tenuous beginning of an outline of a way to pose that question to a computer. The closest we have ever come is “randomly generate a new program and see if it is ‘better’” (that is, mimic macro evolution), but that model has two huge flaws. First, like macro evolution itself, the set of possible changes is so huge and the set of useful changes so tiny that we’ve never witnessed a non-trivial survival-enhancing mutation without human intervention. Second, unlike biology, where we can assert natural selection to determine which mutant is “better”, in a computer “better” has to be the result of a programmatic fitness function. Those turn out to be quite hard to write and very limiting on the form of the resulting programs.
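A toy version of “randomly change it and see if it is ‘better’” shows where the programmer sneaks in. This is only a sketch under my own assumptions: it mutates a short string rather than a real program, and the names fitness, mutate, and evolve, the target “jade”, and the step count are all invented for illustration.

```python
import random
import string


def fitness(candidate, target):
    """Hand-written score: how many positions match the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(candidate, alphabet, rng):
    """Replace one randomly chosen character with a random letter."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(alphabet) + candidate[i + 1:]


def evolve(start, target, alphabet, steps, rng):
    """Keep a random mutant only when the fitness function scores it higher."""
    best = start
    for _ in range(steps):
        child = mutate(best, alphabet, rng)
        if fitness(child, target) > fitness(best, target):
            best = child
    return best


rng = random.Random(0)  # fixed seed so repeated runs behave the same
start = "aaaa"
target = "jade"
best = evolve(start, target, string.ascii_lowercase, 2000, rng)
print(best, fitness(best, target))
```

Notice that the only notion of “better” the search has is the fitness function I typed in, and here that function already contains the answer. The search can never end up anywhere the function does not reward.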

That said, I do trust computers a lot. They take the error that is inevitably connected to creativity (mostly to creativity’s flip-side, boredom) out of the equation. I trust them to do exactly what we tell them to do. Except, when what we told them to do was “learn whatever it is this data suggests” and they say “OK” and you say “great, what did you learn?” and they say “whatever it is that that data suggested”… I’m not sure what I trust them to do?

Do I trust big-data-trained programs? They make usually-good decisions (which they can’t explain) based on lots of experience and no instruction, limited by the (usually undisclosed) structure imposed by the programmer. What’s not to trust?