Is Machine Learning a Trick, or is Learning Trickier than We Think?

James Somers' recent Atlantic profile of Douglas Hofstadter, "The Man Who Would Teach Machines to Think," highlights the distinction between machine learning and human learning, but is this distinction as stark as Hofstadter and others seem to believe?


As Somers points out, artificial intelligence's (AI) founding father, Hofstadter, and the field's current giants disagree on the best way to pursue AI. However, they do agree that the way computers "learn" to perform some of big data's biggest tricks (machine translation, facial recognition, etc.) is nothing like the way human minds actually learn. Dave Ferrucci, for instance, one of the developers behind IBM's Watson, tells Somers: “Did we sit down when we built Watson and try to model human cognition? ... Absolutely not. We just tried to create a machine that could win at Jeopardy.”

While Hofstadter has struggled for decades to understand and reproduce human cognition, Somers reports, AI has made its biggest gains on "tricks" that substitute mountains of data for models of actual learning. Effective machine translation, for instance, didn't come from teaching computers the complex linguistic rules of different languages. Rather, Somers explains, it came from matching strings of text against millions of translations already done by humans and then keeping whichever settings produced the most consistently successful translations.
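The flavor of this approach can be captured in a toy sketch (this is an illustration of the matching idea, not a real machine-translation system; the tiny corpus and function names are invented for the example): memorize phrase pairs from human translations, then emit whichever target phrase humans chose most often for a given source phrase.

```python
from collections import Counter

def build_phrase_table(parallel_corpus):
    """Count how often each human translation was used for each source phrase."""
    table = {}
    for source, target in parallel_corpus:
        table.setdefault(source, Counter())[target] += 1
    return table

def translate(phrase, table):
    # No grammar, no understanding: just return the statistically
    # most common human translation seen for this exact phrase.
    if phrase in table:
        return table[phrase].most_common(1)[0][0]
    return phrase  # unseen phrase: give up and echo it back

# A miniature stand-in for the "millions of translations already done by humans"
corpus = [
    ("bonjour", "hello"), ("bonjour", "hello"), ("bonjour", "good day"),
    ("merci", "thanks"), ("merci", "thank you"), ("merci", "thanks"),
]
table = build_phrase_table(corpus)
print(translate("bonjour", table))  # hello
print(translate("merci", table))    # thanks
```

Nothing here resembles learning French or English; the program succeeds exactly to the extent that humans have already done the translating.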

As Somers puts it, "That’s what makes the machine-learning approach such a spectacular boon: it vacuums out the first-order problem, and replaces the task of understanding with nuts-and-bolts engineering."

For Hofstadter, however, this marks contemporary AI as little more than a parlor trick. “I don’t want to be involved in passing off some fancy program’s behavior for intelligence when I know that it has nothing to do with intelligence," he tells Somers. "And I don’t know why more people aren’t that way.”

Instead of mining big data to mimic learning, then, Hofstadter has set up the Fluid Analogies Research Group (FARG). Here, the article reports, Hofstadter works with graduate students who spend five to nine years "turning a mental process caught and catalogued in Hofstadter’s house into a running computer program."

When Hofstadter attacks the problem of developing a program that can solve word jumbles, he doesn't simply instruct the computer to match scrambled words to real words with the same letters--a programming task that only took Somers four minutes. Hofstadter spends two years thinking about how his mind approaches the task of unscrambling a word jumble and then programs a computer to do that. As Somers reports:
 'I [Hofstadter] could feel the letters shifting around in my head, by themselves,' he told me, 'just kind of jumping around forming little groups, coming apart, forming new groups—flickering clusters. It wasn’t me manipulating anything. It was just them doing things. They would be trying things themselves.' The architecture Hofstadter developed to model this automatic letter-play was based on the actions inside a biological cell.
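For contrast, the "four-minute" approach Somers describes really is this simple (a minimal sketch: the lookup-by-sorted-letters trick is the standard way to match anagrams, and the short word list stands in for a real dictionary file):

```python
from collections import defaultdict

def build_index(words):
    """Group dictionary words by their sorted letters."""
    index = defaultdict(list)
    for word in words:
        # Words that are anagrams of each other share the same sorted letters.
        index["".join(sorted(word))].append(word)
    return index

def unscramble(jumble, index):
    # Sort the jumble's letters and look up every dictionary word that matches.
    return index.get("".join(sorted(jumble)), [])

dictionary = ["listen", "silent", "enlist", "stone", "tones", "notes"]
index = build_index(dictionary)
print(unscramble("tnlsei", index))  # ['listen', 'silent', 'enlist']
```

No flickering clusters, no letters trying things themselves: the entire "cognition" is a sort and a dictionary lookup, which is exactly why Hofstadter finds it uninteresting and why it works.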

Somers' piece does a great job of explaining the rift between Hofstadter and contemporary AI developers, but I can't help but wonder if this rift isn't somewhat artificial, unproductive, and unnecessary. That is, how do we know that our minds don't actually work a little bit like Watson and other big data tools? Consciously, we might feel that when we attempt to unscramble a word the letters are rearranging themselves, but how do we know that subconsciously our brains aren't comparing the letters to words in a mental dictionary? From my experience, I know there are times when the solution to a word jumble jumps to mind as if from nowhere, and I don't think it's too far-fetched to believe something like this might be going on behind the scenes. At the very least, it's worth considering.

Another example to examine here is the way children learn to read. Phonics, no doubt, plays a role. Consciously knowing the rules for matching letters with sounds and then building those sounds into words helps, but there's also a lot of whole-word learning that just comes from exposure to mountains of data--which we call experience (or "time on task"). For some readers, no matter how much we try to break down the process into discrete cognitive steps, literacy only comes after exposure to lots and lots and lots of words (big data).

Language acquisition is similar. Why is it that classroom learning and its focus on grammatical rules can only take a foreign language learner so far? Why is it that immersion (exposure to heaps of language data and trial-and-error experimentation) is still the best way to master a foreign tongue?

Much of learning still happens subconsciously through long-term experience with the task at hand, and it's possible that what our brains are doing is comparable to the big-data tricks Hofstadter derides, or some interaction between these tricks and the process we consciously label cognition. Perhaps a method for true machine learning (or deep learning) won't come from abandoning big-data "tricks" for Hofstadter's focus on cognitive process, but rather from a combination of these two strategies.
